Commit Graph

304 Commits

Author SHA1 Message Date
kevinyhzou
911f8ad8dc use whitespace or tab as field delimiter 2023-06-12 11:57:52 +08:00
kevinyhzou
48e1b21aab Add feature to support read csv by space & tab delimiter 2023-06-08 20:34:30 +08:00
Nikita Mikhaylov
e87348010d
Rework loading and removing of data parts for MergeTree tables. (#49474)
Co-authored-by: Sergei Trifonov <sergei@clickhouse.com>
2023-06-06 14:42:56 +02:00
Alexey Gerasimchuk
9958731c27
Merge branch 'master' into ADQM-830 2023-06-05 07:46:47 +10:00
Michael Kolupaev
b51064a508 Get rid of SeekableReadBufferFactory, add SeekableReadBuffer::readBigAt() instead 2023-06-01 18:48:30 -07:00
Alexey Gerasimchuck
75791d7a63 Added input_format_csv_trim_whitespaces parameter 2023-05-25 07:51:32 +00:00
Michael Kolupaev
6fd5d8e8ba Add setting output_format_parquet_compliant_nested_types to produce more compatible Parquet files 2023-05-19 18:39:50 +00:00
Alexey Milovidov
f6144ee32b
Revert "Make Pretty formats even prettier." 2023-05-13 02:45:07 +03:00
Alexey Milovidov
ef16077c72
Merge branch 'master' into pretty-time-squashing 2023-05-06 18:20:49 +03:00
Alexey Milovidov
90b0de5677 Make Pretty prettier 2023-05-05 06:36:53 +02:00
Michael Kolupaev
3bd1489f18 Propagate input_format_parquet_preserve_order to parallelizeOutputAfterReading() 2023-05-05 04:20:27 +00:00
Michael Kolupaev
eb3b774ad0 Better control over Parquet row group size 2023-05-04 14:59:55 -07:00
Nikita Mikhaylov
954e3b724c
Speedup outdated parts loading (#49317) 2023-05-03 18:56:45 +02:00
Michael Kolupaev
87be78e6de Better 2023-04-17 04:58:32 +00:00
Michael Kolupaev
e133633359 Parallel decoding with one row group per thread 2023-04-17 04:58:32 +00:00
Michael Kolupaev
683077890f Highly questionable refactoring (getInputMultistream() nonsense) 2023-04-17 04:58:32 +00:00
Michael Kolupaev
2d4fe85513 Something 2023-04-17 04:58:32 +00:00
Alexey Milovidov
bb6b775884 Merge branch 'master' into fuzzer-of-data-formats 2023-03-15 12:42:00 +01:00
Alexey Milovidov
f331b9b398 Fix errors and add tests 2023-03-13 23:49:28 +01:00
Alexey Milovidov
14647525f8 Merge branch 'fix-bson-bug' of github.com:Avogar/ClickHouse into fuzzer-of-data-formats 2023-03-13 22:45:00 +01:00
avogar
4213ec609f Proper fix for bug in parquet, revert reverted #45878 2023-03-13 18:22:09 +00:00
Alexey Milovidov
f33b651686 Add fuzzer for data formats 2023-03-13 04:51:50 +01:00
avogar
5a18acde90 Revert #45878 and add a test 2023-03-11 21:15:14 +00:00
Kruglov Pavel
fe973f3d6f
Merge branch 'master' into native-types-conversions 2023-03-09 13:03:25 +01:00
Kruglov Pavel
69a1309ade
Merge branch 'master' into native-types-conversions 2023-03-07 20:06:17 +01:00
avogar
5ab5902f38 Allow control compression in Parquet/ORC/Arrow output formats, support more compression for input formats 2023-03-01 21:27:46 +00:00
avogar
ab899bf2f3 Allow types conversion in Native input format 2023-02-27 19:28:19 +00:00
Kruglov Pavel
443dedddca
Merge branch 'master' into use-parquet-2 2023-02-27 14:31:43 +01:00
avogar
54622566df Add setting to change parquet version 2023-02-23 16:14:10 +00:00
Kruglov Pavel
9866ecfe8b
Merge branch 'master' into null-as-default-all-formats 2023-02-20 20:49:30 +01:00
Geoff Genz
be8bf3a6a3
Merge branch 'master' into http_client_version 2023-02-13 08:43:59 -07:00
avogar
d1efd02480 Extend setting input_format_null_as_default for more formats 2023-02-10 16:41:09 +00:00
Geoff Genz
99c3ff53c5 Merge remote-tracking branch 'origin/master' into http_client_version
# Conflicts:
#	src/Interpreters/Context.cpp
#	src/Interpreters/Context.h
2023-02-10 04:35:53 -07:00
Geoff Genz
7ed8ed0284 Add support for client_protocol_version sent with HTTP 2023-02-10 03:47:06 -07:00
Kruglov Pavel
4e2918cee3
Merge branch 'master' into parquet-fixed-binary 2023-02-08 12:31:13 +01:00
liuneng
17fc22a21e add parquet max_block_size setting 2023-02-01 18:29:20 +08:00
Alexey Milovidov
04078dbed3 Remove trash 2023-01-29 22:43:36 +01:00
Azat Khuzhin
1a8437f2c9 Add ability to ignore unknown keys in JSON object for named tuples
This can be useful in case your input JSON is complex, while you need
only few fields in it.

This behaviour is controlled by the
input_format_json_ignore_unknown_keys_in_named_tuple setting name, that
is turned OFF by default.

This will, almost, allow to parse gharchive dataset without jq. "almost"
because of two things:
- Tuple cannot be Nullable, so such keys with Tuple type in ClickHouse
  cannot be `null` in JSON
- You cannot use dot.dot notation to extract columns for file() engine,
  only tupleElement()

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-27 10:01:08 +01:00
Kruglov Pavel
23c12ac8ee
Merge branch 'master' into parquet-fixed-binary 2023-01-24 16:51:05 +01:00
Kruglov Pavel
cd1cd904a7
Merge branch 'master' into tsv-csv-detect-header 2023-01-23 23:49:56 +01:00
Alexander Tokmakov
70d1adfe4b
Better formatting for exception messages (#45449)
* save format string for NetException

* format exceptions

* format exceptions 2

* format exceptions 3

* format exceptions 4

* format exceptions 5

* format exceptions 6

* fix

* format exceptions 7

* format exceptions 8

* Update MergeTreeIndexGin.cpp

* Update AggregateFunctionMap.cpp

* Update AggregateFunctionMap.cpp

* fix
2023-01-24 00:13:58 +03:00
avogar
5bf4704e7a Support FixedSizeBinary type in Parquet/Arrow 2023-01-16 21:01:31 +00:00
Kruglov Pavel
e9d6590926
Merge branch 'master' into tsv-csv-detect-header 2023-01-16 17:50:24 +01:00
avogar
87b934c472 Insert default values in case of missing tuple elements in JSONEachRow 2023-01-12 16:36:44 +00:00
Kruglov Pavel
05a11ff4a4
Merge branch 'master' into tsv-csv-detect-header 2023-01-12 12:35:18 +01:00
Alexey Milovidov
0d39d26a34 Don't fix parallel formatting 2023-01-09 06:15:20 +01:00
avogar
7fcdb08ec6 Detect header in CSV/TSV/CustomSeparated files automatically 2023-01-05 22:57:25 +00:00
Kruglov Pavel
0a43976977
Merge branch 'master' into validate-types 2023-01-02 16:10:14 +01:00
Kruglov Pavel
4982d132fb
Merge branch 'master' into validate-types 2022-12-30 17:52:13 +01:00
Kruglov Pavel
894726bd8f
Merge branch 'master' into improve-streaming-engines 2022-12-29 22:59:45 +01:00