Commit Graph

181 Commits

Author SHA1 Message Date
taiyang-li
2ef316801c Merge branch 'master' into use_minmax_index 2022-04-07 10:53:25 +08:00
Kruglov Pavel
ec2213493f Merge branch 'master' into allow-read-bools-as-numbers 2022-04-06 14:53:02 +02:00
Kruglov Pavel
9141066de3 Merge branch 'master' into improve-schema-inference 2022-04-06 13:51:07 +02:00
taiyang-li
acb9f1632e support skip splits in orc and parquet 2022-04-06 16:40:22 +08:00
Maksim Kita
371cdc956a Added input format settings for parsing invalid IPv4, IPv6 addresses as default values 2022-03-30 12:54:19 +02:00
avogar
3fc36627b3 Allow to infer and parse bools as numbers in JSON input formats 2022-03-29 17:37:31 +00:00
Kruglov Pavel
d45143ffe0 Merge branch 'master' into improve-schema-inference 2022-03-25 12:05:40 +01:00
avogar
abc020a502 Clean up 2022-03-24 13:08:58 +00:00
avogar
557edbd172 Add some improvements and fixes in schema inference 2022-03-24 12:54:12 +00:00
Antonio Andelic
0c23cd7b94 Add support for case insensitive column matching in arrow 2022-03-22 10:55:10 +00:00
Antonio Andelic
f75b054255 Allow case insensitive column matching 2022-03-21 07:47:37 +00:00
Antonio Andelic
607f785e48 Revert "Merge pull request #35145 from bigo-sg/lower-column-name"
This reverts commit ebf72bf61d, reversing
changes made to f1b812bdc1.
2022-03-17 12:31:43 +00:00
Antonio Andelic
ebf72bf61d Merge pull request #35145 from bigo-sg/lower-column-name
add setting to lower column case when reading parquet/orc file
2022-03-14 11:25:03 +01:00
shuchaome
46cb4483a6 Optimise by lowering schema on the beginning. Add a functional test. 2022-03-11 14:34:46 +08:00
shuchaome
56795b831d add setting to lower column case when reading parquet/orc file 2022-03-09 16:07:02 +08:00
cwkyaoyao
72194bbaf3 Add date_time_input_format = best_effort_us 2022-03-02 16:00:06 +08:00
avogar
77b42bb9ff Support UUID in MsgPack format 2022-02-07 17:11:44 +03:00
taiyang-li
1e102bc1b2 merge master 2022-01-01 09:01:06 +08:00
avogar
8112a71233 Implement schema inference for most input formats 2021-12-29 12:18:56 +03:00
taiyang-li
9036b18c2f merge master 2021-12-27 15:12:48 +08:00
taiyang-li
2597925724 merge master 2021-12-21 15:55:39 +08:00
avogar
ba6a513db0 Fix tuple output in CSV format 2021-12-20 19:27:09 +03:00
kreuzerkrieg
f06c37d206 Stop reading incomplete stripes and skip rows. 2021-12-19 18:41:32 +02:00
taiyang-li
ca3f7425a4 fix code 2021-12-14 17:37:31 +08:00
李扬
8675086104 Merge branch 'master' into hive_table 2021-12-12 09:01:46 -06:00
taiyang-li
9ec8272186 refactor hive text input format 2021-12-02 16:14:25 +08:00
mergify[bot]
8d5460b469 Merge branch 'master' into feature-support-bool-type 2021-11-29 11:50:18 +00:00
taiyang-li
72f60cceb9 Merge branch 'master' into hive_table 2021-11-25 17:33:26 +08:00
Kseniia Sumarokova
93cf66df12 Merge pull request #30936 from kssenii/seekable-read-buffers
Reduce memory usage for some formats when reading with s3/url/hdfs
2021-11-25 11:19:24 +03:00
lgbo
996d7125c0 Merge branch 'master' into hive_table 2021-11-23 10:19:02 +08:00
MaxWk
f17d5b02e4 use bool representation 2021-11-19 14:30:22 +08:00
avogar
73d1918410 tmp 2021-11-16 17:10:30 +03:00
MaxWk
d42a454837 support some bool format 2021-11-11 16:01:32 +08:00
taiyang-li
deef4d4dbe add options read_bool_as_uint8 when parse csv 2021-11-11 11:49:54 +08:00
taiyang-li
36ca0b296b implement hive table engine 2021-11-05 19:55:30 +08:00
kssenii
ec11179f91 Merge branch 'master' of github.com:ClickHouse/ClickHouse into seekable-read-buffers 2021-11-03 14:33:31 +03:00
kssenii
45ea820297 Reduce memory usage for some formats 2021-11-03 14:30:03 +03:00
Kruglov Pavel
327a34e9da Merge pull request #30497 from Avogar/null-deserialization
Add custom null representation support for TSV/CSV input formats, fix Nullable(String) deserializing in some formats
2021-11-03 11:30:25 +03:00
avogar
42ab57f0e5 Set output_format_avro_rows_in_file default to 1 2021-11-02 14:06:10 +03:00
Kruglov Pavel
901ebcede6 Merge pull request #30351 from arenadata/ADQM-335
output_format_avro_rows_in_file
2021-11-02 12:25:27 +03:00
Kruglov Pavel
1f8535c02b Merge branch 'master' into null-deserialization 2021-11-02 12:15:21 +03:00
avogar
d1ef96a5ef Add test, avoid unnecessary allocations, use PeekableReadBuffer only in corner case 2021-10-27 17:29:15 +03:00
Ilya Golshtein
551a1065c1 output_format_avro_rows_in_file default is 1000000 2021-10-21 14:19:25 +03:00
Ilya Golshtein
82f33151e7 output_format_avro_rows_in_file fixes per code review 2021-10-21 02:53:39 +03:00
Kruglov Pavel
5052ec3ab0 Merge branch 'master' into tsv-csv 2021-10-19 12:03:52 +03:00
Ilya Golshtein
d90302aa3b output_format_avro_rows_in_file 2021-10-18 19:01:06 +03:00
Kruglov Pavel
1cd938fbba Fix typo 2021-10-14 16:43:22 +03:00
avogar
ce22f534c4 Add CapnProto output format, refactor CapnProto input format 2021-10-14 16:43:22 +03:00
avogar
324dfd4f81 Refactor and improve TSV, CSV and JSONCompactEachRow formats, fix some bugs in formats 2021-10-14 13:32:49 +03:00
Alexey Milovidov
fe6b7c77c7 Rename "common" to "base" 2021-10-02 10:13:14 +03:00
PHO
3c4b1ea9c5 New setting: output_format_csv_null_representation
This is the same as output_format_tsv_null_representation but is for CSV output.
2021-09-17 17:58:23 +09:00
Alexey Milovidov
8adaef7c8e Make text format for Decimal tuneable 2021-08-16 11:03:23 +03:00
Pavel Kruglov
e4c5d7e3b1 Support inserting nested as Array of structs, add some refactoring 2021-08-05 14:10:27 +03:00
Vitaly Baranov
4f1926550b Merge pull request #26429 from vitlibar/remove-mysql-wire-context
Remove MySQLWireContext
2021-07-19 12:21:24 +03:00
Vitaly Baranov
0f8b196682 Remove MySQLWireContext. 2021-07-16 22:21:20 +03:00
Ilya Golshtein
16532658c2 Avro string for ClickHouse string 2021-07-13 20:03:00 +03:00
Pavel Kruglov
c8b37977da Fix bugs, support dictionary for Arrow format 2021-06-15 16:15:27 +03:00
Maksim Kita
5ba6c7b731 FormatSettings null_as_default default value fix 2021-04-03 00:05:40 +03:00
Vitaly Baranov
18e036d19b Improved serialization for data types combined of Arrays and Tuples.
Improved matching enum data types to protobuf enum type.
Fixed serialization of the Map data type.
Omitted values are now set by default.
2021-02-17 20:50:09 +03:00
Alexey Milovidov
ae9b67fefa Support to parse Arrays in CSV as nested CSV in a String 2020-12-20 13:26:08 +03:00
Alexander Kuzmenkov
24293ccb30 Merge remote-tracking branch 'origin/master' into HEAD 2020-11-19 15:28:37 +03:00
Alexander Kuzmenkov
f2b3f5f8b6 Allow formatting named tuples as JSON objects 2020-11-18 13:38:30 +03:00
Alexander Kuzmenkov
8cde88440b Write rows as JSON array in JSONEachRow output format 2020-11-17 22:50:47 +03:00
Alexander Kuzmenkov
3c60f6cec2 make a separate settings collection + some cleanup 2020-11-07 11:53:39 +03:00
Alexander Kuzmenkov
99ee127620 Support SETTINGS clause for File engine
Accept the usual user settings related to file formats.

Most of the diff are the mechanistic code changes required to allow
providing the required FormatSettings to the format factory. The File
engine then extracts these settings from the `CREATE` query, and specifies
them when creating the format parser.
2020-11-02 10:50:38 +03:00
vivarum
1d9df13b47 Merge branch 'master' into enable-parsing-of-input-enum-values-by-id-10682 2020-10-16 11:07:01 +03:00
Maksim Kita
adaae8a12c Added OutputFormat setting date_time_output_format 2020-10-13 13:59:43 +03:00
Vasily Kozhukhovskiy
dfc13ca8e7 Enable parsing enum values by their ids for CSV, TSV and JSON input formats
* for CSV and TSV input formats input_format_csv_enum_as_number,
    input_format_tsv_enum_as_number settings should be enabled in order to
    treat input value as enum id
2020-10-06 18:37:54 +03:00
feng lv
4f000388a7 add setting output_format_pretty_row_numbers 2020-09-29 20:30:36 +08:00
Artem Zuikov
51ba12c2c3 Try speedup build (#14809) 2020-09-15 12:55:57 +03:00
Kruglov Pavel
0e9612d9ff Add null_representation setting in TSV 2020-09-08 15:29:22 +03:00
Maxim Sabyanin
40f7ec71d3 add setting output_format_pretty_grid_charset
This setting allows choosing the charset for printing grids (either UTF-8 or
ASCII).
2020-07-10 22:25:49 +03:00
Andrew Onyshchuk
9054862dde Avro: allow missing fields 2020-06-27 21:23:21 -05:00
Alexey Milovidov
2895cfb480 Limit value width in Pretty formats 2020-05-31 22:22:59 +03:00
FawnD2
02e12215e7 Apply reducing memory usage optimization for seekable files to ORC format 2020-05-04 03:52:28 +03:00
FawnD2
112758b99d Merge branch 'master' into arrow-io-format 2020-05-04 00:53:17 +03:00
FawnD2
7cc7a87f9f Simplify interfaces 2020-05-03 21:12:14 +03:00
FawnD2
a590826fbb Format settings for Arrow 2020-05-03 15:26:39 +03:00
Nikolai Kochetov
2f06180c5e Revert changes for CSVRowOutputFormat. 2020-04-27 18:21:53 +03:00
Vitaliy Zakaznikov
369b4d53ef Adding support for output_format_enable_streaming format setting. 2020-04-26 15:44:11 +02:00
Ivan Lezhankin
06446b4f08 dbms/ → src/ 2020-04-03 18:14:31 +03:00