297 Commits

Author SHA1 Message Date
avogar
68bb07d166 Better naming 2022-05-13 18:39:19 +00:00
avogar
b17fec659a Improve performance and memory usage for select of subset of columns for some formats 2022-05-13 13:51:28 +00:00
Kruglov Pavel
77e55c344c
Merge pull request #36667 from Avogar/mysqldump-format
Add MySQLDump input format
2022-05-04 19:49:48 +02:00
Dmitry Novik
5ba7a55c18
Merge pull request #36650 from bigo-sg/hive_text_parallel_parsing
Parallel parsing of hive text format
2022-05-03 15:56:28 +02:00
Kruglov Pavel
d613f7eab0
Merge branch 'master' into mysqldump-format 2022-05-02 13:31:57 +02:00
Jakub Kuklis
a1f2dd6d34 Adding two settings in place of one, improvements to the test clarity 2022-04-29 10:01:51 +02:00
Jakub Kuklis
507ba1042c Adding a setting to enable Google wrappers special treatment 2022-04-29 10:01:51 +02:00
avogar
d295de1689 Fix comments and test 2022-04-28 14:59:35 +00:00
Kruglov Pavel
4d08587559
Merge branch 'master' into mysqldump-format 2022-04-28 15:58:18 +02:00
avogar
33d845dade Add MySQLDump input format 2022-04-26 10:42:56 +00:00
taiyang-li
b7cc344d62 remove useless codes 2022-04-26 14:42:43 +08:00
taiyang-li
99dee35b6e parallel parsing of hive text format 2022-04-26 14:33:10 +08:00
avogar
1c065f8c7a Some refactoring around schema inference with globs 2022-04-13 17:02:48 +00:00
avogar
d2017a63b1 Merge branch 'master' of github.com:ClickHouse/ClickHouse into improve-schema-inference 2022-04-07 11:36:40 +00:00
Kruglov Pavel
ec2213493f
Merge branch 'master' into allow-read-bools-as-numbers 2022-04-06 14:53:02 +02:00
Kruglov Pavel
9141066de3
Merge branch 'master' into improve-schema-inference 2022-04-06 13:51:07 +02:00
Maksim Kita
371cdc956a Added input format settings for parsing invalid IPv4, IPv6 addresses as default values 2022-03-30 12:54:19 +02:00
avogar
3fc36627b3 Allow to infer and parse bools as numbers in JSON input formats 2022-03-29 17:37:31 +00:00
Kruglov Pavel
d45143ffe0
Merge branch 'master' into improve-schema-inference 2022-03-25 12:05:40 +01:00
avogar
557edbd172 Add some improvements and fixes in schema inference 2022-03-24 12:54:12 +00:00
Antonio Andelic
0c23cd7b94 Add support for case insensitive column matching in arrow 2022-03-22 10:55:10 +00:00
Antonio Andelic
f75b054255 Allow case insensitive column matching 2022-03-21 07:47:37 +00:00
Antonio Andelic
607f785e48 Revert "Merge pull request #35145 from bigo-sg/lower-column-name"
This reverts commit ebf72bf61d, reversing
changes made to f1b812bdc1.
2022-03-17 12:31:43 +00:00
shuchaome
46cb4483a6 Optimise by lowering schema on the beginning. Add a functional test. 2022-03-11 14:34:46 +08:00
shuchaome
56795b831d add setting to lower column case when reading parquet/orc file 2022-03-09 16:07:02 +08:00
Maksim Kita
b1a956c5f1 clang-tidy check performance-move-const-arg fix 2022-03-02 18:15:27 +00:00
avogar
77b42bb9ff Support UUID in MsgPack format 2022-02-07 17:11:44 +03:00
Kruglov Pavel
7873b4475f
Merge branch 'master' into autodetect-format 2022-01-25 10:56:52 +03:00
avogar
a6740d2f9a Detect format and schema for stdin in clickhouse-local 2022-01-25 10:25:37 +03:00
avogar
1f49acc164 Better naming 2022-01-24 16:28:36 +03:00
Kruglov Pavel
a7df9cd53a
Merge branch 'master' into formats-with-suffixes 2022-01-14 21:03:49 +03:00
avogar
253035a5df Fix 2022-01-14 19:17:06 +03:00
Kruglov Pavel
d2e9f37bee
Merge branch 'master' into format-by-extention 2022-01-14 18:36:23 +03:00
avogar
89a181bd19 Make better 2022-01-14 18:16:18 +03:00
avogar
817a314263 Fix tests and style 2022-01-14 17:46:24 +03:00
Kruglov Pavel
5a908e8edd
Merge branch 'master' into formats-with-suffixes 2022-01-14 16:45:20 +03:00
Kseniia Sumarokova
5da673c3a5
Merge pull request #31104 from bigo-sg/hive_table
Implement hive table engine
2022-01-14 09:39:17 +03:00
avogar
2d7b1bfa5e Detect format in S3/HDFS/URL table engines 2022-01-13 16:14:18 +03:00
Kruglov Pavel
305d58a762
Merge pull request #33524 from Avogar/stacktrace-in-client
Don't print exception twice in client in case of exception in parallel parsing
2022-01-13 15:50:42 +03:00
avogar
8390e9ad60 Detect format by file name in file/hdfs/s3/url table functions 2022-01-12 18:29:31 +03:00
taiyang-li
66813a3aa9 merge master 2022-01-12 16:56:29 +08:00
avogar
0ae0aa712b Don't print exception twice in client in case of exception in parallel parsing 2022-01-11 18:37:07 +03:00
zhongyuankai
99279c1443 INTO OUTFILE / FROM INFILE: autodetect FORMAT by file extension 2022-01-11 21:26:14 +08:00
zhongyuankai
878e44eb97 auto format by file extension 2022-01-08 21:47:14 +08:00
taiyang-li
1e102bc1b2 merge master 2022-01-01 09:01:06 +08:00
alesapin
16c36d72b1
Merge pull request #33296 from ClickHouse/fix_clang_tidy_3
Fix clang tidy 3
2021-12-29 22:43:42 +03:00
avogar
97788b9c21 Allow to create new files on insert for File/S3/HDFS engines 2021-12-29 21:19:13 +03:00
Kruglov Pavel
489a30859f
Merge pull request #32455 from Avogar/schema-inference
Automatic schema inference for input formats
2021-12-29 21:03:48 +03:00
alesapin
34145c47da Fix clang tidy 2021-12-29 18:36:42 +03:00
avogar
8112a71233 Implement schema inference for most input formats 2021-12-29 12:18:56 +03:00
taiyang-li
9036b18c2f merge master 2021-12-27 15:12:48 +08:00
Raúl Marín
cb22091b33 Merge remote-tracking branch 'blessed/master' into kill_scalar_github 2021-12-23 13:59:33 +01:00
Raúl Marín
9bb88c26d8 Add existing progress to the record of the output format progress 2021-12-22 23:14:23 +01:00
taiyang-li
2597925724 merge master 2021-12-21 15:55:39 +08:00
avogar
ba6a513db0 Fix tuple output in CSV format 2021-12-20 19:27:09 +03:00
kreuzerkrieg
f06c37d206 Stop reading incomplete stripes and skip rows. 2021-12-19 18:41:32 +02:00
李扬
8675086104
Merge branch 'master' into hive_table 2021-12-12 09:01:46 -06:00
Kseniia Sumarokova
926fd568c7
Merge pull request #32113 from FrankChen021/url_http_header
Set Content-Type in HTTP packets issued from URL engine
2021-12-07 08:52:36 +03:00
Kseniia Sumarokova
eab6f0ba49
Update FormatFactory.cpp 2021-12-06 23:35:29 +03:00
frank chen
c319b6fa32 Fix style
Signed-off-by: frank chen <frank.chen021@outlook.com>
2021-12-03 22:09:04 +08:00
frank chen
898db5b468 Resolve review comments
Signed-off-by: frank chen <frank.chen021@outlook.com>
2021-12-03 19:47:05 +08:00
taiyang-li
9ec8272186 refactor hive text input format 2021-12-02 16:14:25 +08:00
mergify[bot]
8d5460b469
Merge branch 'master' into feature-support-bool-type 2021-11-29 11:50:18 +00:00
MaxWk
f17d5b02e4 use bool representation 2021-11-19 14:30:22 +08:00
MaxWk
d42a454837 support some bool format 2021-11-11 16:01:32 +08:00
kssenii
ec11179f91 Merge branch 'master' of github.com:ClickHouse/ClickHouse into seekable-read-buffers 2021-11-03 14:33:31 +03:00
kssenii
45ea820297 Reduce memory usage for some formats 2021-11-03 14:30:03 +03:00
Kruglov Pavel
327a34e9da
Merge pull request #30497 from Avogar/null-deserialization
Add custom null representation support for TSV/CSV input formats, fix Nullable(String) deserializing in some formats
2021-11-03 11:30:25 +03:00
Kruglov Pavel
901ebcede6
Merge pull request #30351 from arenadata/ADQM-335
output_format_avro_rows_in_file
2021-11-02 12:25:27 +03:00
Kruglov Pavel
1f8535c02b
Merge branch 'master' into null-deserialization 2021-11-02 12:15:21 +03:00
avogar
d1ef96a5ef Add test, avoid unnecessary allocations, use PeekableReadBuffer only in corner case 2021-10-27 17:29:15 +03:00
avogar
d5c5a3213b Add custom null representation support for TSV/CSV input formats, fix bugs in deserializing NULLs in some cases 2021-10-21 16:52:27 +03:00
Kruglov Pavel
5052ec3ab0
Merge branch 'master' into tsv-csv 2021-10-19 12:03:52 +03:00
Kruglov Pavel
1e2ceeb2e7
Merge pull request #29291 from Avogar/capnproto
Add CapnProto output format, refactor CapnProto input format
2021-10-19 11:54:55 +03:00
Ilya Golshtein
d90302aa3b output_format_avro_rows_in_file 2021-10-18 19:01:06 +03:00
Kruglov Pavel
dbc2f3408e
Merge branch 'master' into tsv-csv 2021-10-18 14:38:22 +03:00
Azat Khuzhin
50231460af Use forward declaration for Buffer<> in generic headers
- changes in ReadHelpers.h -- recompiles 1000 modules
- changes in FormatFactory.h -- recompiles 100 modules
2021-10-16 12:03:24 +03:00
avogar
ce22f534c4 Add CapnProto output format, refactor CapnProto input format 2021-10-14 16:43:22 +03:00
avogar
324dfd4f81 Refactor and improve TSV, CSV and JSONCompactEachRow formats, fix some bugs in formats 2021-10-14 13:32:49 +03:00
Nikolai Kochetov
ab28c6c855 Remove BlockInputStream interfaces. 2021-10-14 13:25:43 +03:00
Nikolai Kochetov
2957971ee3 Remove some last streams. 2021-10-13 21:22:02 +03:00
Nikolai Kochetov
ec18340351 Remove streams from formats. 2021-10-11 19:11:50 +03:00
Nikolai Kochetov
c6bce1a4cf Update Native. 2021-10-08 20:21:19 +03:00
Anton Popov
ea4fd19e28
Merge pull request #29087 from CurtizJ/asyn-inserts-follow-up
Minor enhancements in async inserts
2021-09-21 13:38:52 +03:00
PHO
3c4b1ea9c5 New setting: output_format_csv_null_representation
This is the same as output_format_tsv_null_representation but is for CSV output.
2021-09-17 17:58:23 +09:00
Anton Popov
99175f7acc minor enhancements in async inserts 2021-09-16 20:55:34 +03:00
mergify[bot]
401b2f3b8f
Merge branch 'master' into json-as-string 2021-08-26 15:03:59 +00:00
Alexey Milovidov
8adaef7c8e Make text format for Decimal tuneable 2021-08-16 11:03:23 +03:00
alexey-milovidov
36ab47769b
Merge pull request #27609 from Algunenano/refactor_mysql_format
Refactor mysql format check
2021-08-13 03:02:49 +03:00
mergify[bot]
38d97ec52a
Merge branch 'master' into json-as-string 2021-08-12 17:18:38 +00:00
Nikita Mikhaylov
8c06abee73
Merge pull request #25902 from Avogar/arrow-nested
Refactor ArrowColumnToCHColumn, support inserting Nested as Array(Struct) in Arrow/ORC/Parquet
2021-08-12 20:02:01 +03:00
Raúl Marín
a451bf6eac Remove unused code 2021-08-12 11:30:01 +02:00
Raúl Marín
f6788fc660 Mysql handler: Move format check to the handler 2021-08-12 11:29:50 +02:00
Raúl Marín
65bb4ff744 Unify mysql output format checks 2021-08-09 14:29:35 +02:00
Raúl Marín
b1ff4ca81a Fix 01176_mysql_client_interactive and work with mariadb client 2021-08-06 18:03:27 +02:00
mergify[bot]
3201d90105
Merge branch 'master' into json-as-string 2021-08-05 14:18:35 +00:00
Pavel Kruglov
e4c5d7e3b1 Support inserting nested as Array of structs, add some refactoring 2021-08-05 14:10:27 +03:00
Vitaly Baranov
4f1926550b
Merge pull request #26429 from vitlibar/remove-mysql-wire-context
Remove MySQLWireContext
2021-07-19 12:21:24 +03:00
Alexey Milovidov
c648e8356b Remove even more code 2021-07-17 21:58:51 +03:00
Vitaly Baranov
0f8b196682 Remove MySQLWireContext. 2021-07-16 22:21:20 +03:00
Ilya Golshtein
16532658c2 Avro string for ClickHouse string 2021-07-13 20:03:00 +03:00
Alexander Tokmakov
1a470fb777 fix sequence_id in MySQL protocol 2021-07-07 20:03:28 +03:00
Pavel Kruglov
92e6df7b89 Allow data in square brackets in JSONAsString format 2021-06-23 16:17:34 +03:00
Pavel Kruglov
c8b37977da Fix bugs, support dictionary for Arrow format 2021-06-15 16:15:27 +03:00
Nikolai Kochetov
dbaa6ffc62 Rename ContextConstPtr to ContextPtr. 2021-06-01 15:20:52 +03:00
Alexander Kuzmenkov
35459a0228
Update FormatFactory.cpp 2021-04-22 21:48:06 +03:00
Ivan
495c6e03aa
Replace all Context references with std::weak_ptr (#22297)
* Replace all Context references with std::weak_ptr

* Fix shared context captured by value

* Fix build

* Fix Context with named sessions

* Fix copy context

* Fix gcc build

* Merge with master and fix build

* Fix gcc-9 build
2021-04-11 02:33:54 +03:00
Alexander Kuzmenkov
e44b3822e3
Merge pull request #21850 from fastio/handle_errors_for_kafka_engine
Handle errors for Kafka engine
2021-04-09 22:59:40 +03:00
alexey-milovidov
5d672d4529 Update FormatFactory.cpp 2021-04-06 22:23:16 +03:00
Nikita Mikhailov
37f48d13b4 add test 2021-04-06 22:23:16 +03:00
Peng Jian
26b5482b4d remove the flag in the parser 2021-03-31 22:25:51 +08:00
Peng Jian
909d5ad2b5 Handle errors for Kafka engine 2021-03-31 17:15:57 +08:00
Alexey Milovidov
905793a7e4 Disable excessive squashing of blocks for StorageMemory #13052 2021-02-07 04:57:17 +03:00
kssenii
daab2c91bb Better 2021-01-21 21:15:11 +00:00
kssenii
c1702f34ee Add factories info into system.query_log 2021-01-21 15:46:37 +00:00
Nikita Mikhailov
b94a654715 build fix 2020-12-30 16:55:31 +03:00
Nikita Mikhailov
60b4a36c4a arcadia fix + live view fix + cleanup 2020-12-30 07:50:58 +03:00
Nikita Mikhailov
c5f92e5096 better formatfactory 2020-12-30 06:07:30 +03:00
Nikita Mikhailov
2dde73f700 better 2020-12-28 19:52:54 +03:00
Nikita Mikhailov
c3288c3fbf Merge branch 'master' of github.com:ClickHouse/ClickHouse into parallel-parsing-input-format 2020-12-28 15:09:37 +03:00
Nikita Mikhailov
dcfbe782c6 Merge branch 'master' of github.com:ClickHouse/ClickHouse into parallel-parsing-input-format 2020-12-23 05:20:22 +03:00
nikitamikhaylov
27f647f93d done 2020-12-23 01:01:05 +03:00
Alexey Milovidov
3ab2a167e8 Merge branch 'master' into arrays-as-nested-csv 2020-12-22 00:05:50 +03:00
Alexey Milovidov
558e9d270f Whitespace 2020-12-21 10:41:05 +03:00
Alexey Milovidov
ae9b67fefa Support to parse Arrays in CSV as nested CSV in a String 2020-12-20 13:26:08 +03:00
Nikita Mikhailov
fbf2ac35e8 fix tsan 2020-12-17 18:14:09 +03:00
nikitamikhaylov
12e624fd9a fix tests 2020-12-15 00:56:48 +03:00
nikitamikhaylov
f7ac8bf542 rebase and fix tests 2020-12-15 00:56:48 +03:00
nikitamikhaylov
3c683b1d86 better 2020-12-15 00:56:47 +03:00
nikitamikhaylov
48c76613bf better 2020-12-15 00:56:47 +03:00
nikitamikhaylov
1bdfc63ef3 delete PrepareAndEndUpReadBuffer 2020-12-15 00:56:47 +03:00
nikitamikhaylov
4ff1be6e25 better 2020-12-15 00:56:47 +03:00
nikitamikhaylov
a1010d708f disable PrettySpaceMonoBlock + writePrefix 2020-12-15 00:56:47 +03:00
nikitamikhaylov
8ff072c702 better 2020-12-15 00:56:47 +03:00
nikitamikhaylov
57705f5b73 delete and fix strange code 2020-12-15 00:56:47 +03:00
Nikita Mikhaylov
9922324787 it works 2020-12-15 00:56:47 +03:00
Nikita Mikhaylov
3bc1affd21 remove CSV restriction 2020-12-15 00:56:47 +03:00
Nikita Mikhaylov
9f127a46c7 first try 2020-12-15 00:56:47 +03:00
Nikita Mikhaylov
0a508c7b8a save 2020-12-15 00:56:46 +03:00
Nikita Mikhaylov
f40f3ced2a fix JSONEachRowArray 2020-12-15 00:56:46 +03:00
Nikita Mikhaylov
e0addac6fc save changes 2020-12-15 00:56:46 +03:00
Alexander Kuzmenkov
24293ccb30 Merge remote-tracking branch 'origin/master' into HEAD 2020-11-19 15:28:37 +03:00
Alexander Kuzmenkov
f2b3f5f8b6 Allow formatting named tuples as JSON objects 2020-11-18 13:38:30 +03:00
Alexander Kuzmenkov
8cde88440b Write rows as JSON array in JSONEachRow output format 2020-11-17 22:50:47 +03:00
Nikita Mikhaylov
33bada767c
Merge branch 'master' into parsing-constraints 2020-11-12 23:25:39 +03:00
tavplubix
67099f28ac
Merge pull request #16591 from ClickHouse/aku/create-file
Support `SETTINGS` clause for File engine
2020-11-09 14:15:42 +03:00
Alexander Kuzmenkov
3c60f6cec2 make a separate settings collection + some cleanup 2020-11-07 11:53:39 +03:00
nikitamikhaylov
2febfd43e5 rewrite format line as string 2020-11-06 21:55:13 +03:00
nikitamikhaylov
dabb23b668 done 2020-11-06 21:55:13 +03:00
Alexander Kuzmenkov
99ee127620 Support SETTINGS clause for File engine
Accept the usual user settings related to file formats.

Most of the diff are the mechanistic code changes required to allow
providing the required FormatSettings to the format factory. The File
engine then extracts these settings from the `CREATE` query, and specifies
them when creating the format parser.
2020-11-02 10:50:38 +03:00