taiyang-li
a02c49e16f
allow skipping null values when serializing tuple to json objects
2023-10-24 11:47:46 +08:00
Raúl Marín
e500dc22e4
Respect default format when using http_write_exception_in_output_format
2023-10-17 14:14:58 +02:00
Michael Kolupaev
ce7eca0615
DWARF input format ( #55450 )
...
* Add ReadBufferFromFileBase::isRegularLocalFile()
* DWARF input format
* Review comments
* Changed things around ENABLE_EMBEDDED_COMPILER build setting
* Added 'ranges' column
* no-msan no-ubsan
2023-10-16 17:00:07 -07:00
Azat Khuzhin
2cbb069b68
Add ability to ignore data after semicolon in Values format
...
This is required for client, to handle comments in multiquery mode.
v0: separate context for input format
v2: cannot use separate context since params and stuff are changed in global context
v3: do not send this setting to the server (breaks queries for readonly profiles)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-10-12 14:55:26 +02:00
Kruglov Pavel
bea80ab5b7
Merge branch 'master' into json-object-as-tuple-inference
2023-09-26 15:23:08 +02:00
avogar
42ca897f2d
Better schema inference for JSON formats
2023-09-25 15:42:59 +00:00
Kruglov Pavel
2c407ab3c0
Merge branch 'master' into json-object-as-tuple-inference
2023-09-15 16:29:48 +02:00
Kruglov Pavel
dbd24b240c
Merge branch 'master' into http-valid-json-on-exception
2023-09-15 14:55:31 +02:00
avogar
1480c8ad30
Place setting into separate struct
2023-09-13 13:19:05 +00:00
avogar
2d8f33bfa2
Fix parsing error in WithNames formats while reading subset of columns with disabled input_format_with_names_use_header
2023-09-11 14:55:37 +00:00
avogar
ba307c7466
Allow to infer named Tuples from JSON objects under a setting in JSON formats
2023-09-07 19:41:19 +00:00
irenjj
51aa89eed8
Add a setting to automatically escape special characters in Markdown.
2023-08-28 00:10:33 +08:00
Michael Kolupaev
2f4d433e69
Parquet filter pushdown
2023-08-21 14:15:52 -07:00
Michael Kolupaev
6009e1b293
Merge pull request #53324 from bigo-sg/ch_gluten_2583
...
Implement native orc input format without arrow to improve performance
2023-08-21 13:44:57 -07:00
Kruglov Pavel
c68456a20a
Merge pull request #52692 from Avogar/variable-number-of-volumns-more-formats
...
Allow variable number of columns in more formats, make it work with schema inference
2023-08-21 13:28:35 +02:00
taiyang-li
f723e8d43a
change as request
2023-08-21 12:09:02 +08:00
avogar
bca91548ad
Add setting input_format_parquet_local_file_min_bytes_for_seek
2023-08-17 12:28:01 +00:00
avogar
7e863a2726
Address comments
2023-08-11 13:17:49 +00:00
avogar
3ad7e57059
Optimize reading small row groups by batching them together in Parquet
2023-08-11 13:17:45 +00:00
Kruglov Pavel
6600f87f86
Merge branch 'master' into http-valid-json-on-exception
2023-08-10 13:53:32 +02:00
Kruglov Pavel
33a39900ad
Merge branch 'master' into variable-number-of-volumns-more-formats
2023-08-09 19:51:17 +02:00
Anton Popov
ff137773e7
Merge branch 'master' into formats-with-subcolumns
2023-08-02 15:24:56 +02:00
avogar
fa905ebd27
Clean up
2023-08-01 10:14:09 +00:00
avogar
a71cd56a90
Output valid JSON/XML on exception during HTTP query execution
2023-08-01 10:06:56 +00:00
Kruglov Pavel
3e1c409e60
Merge branch 'master' into structure-to-schema
2023-07-28 11:32:16 +02:00
avogar
6d77d52dfe
Allow variable number of columns in TSV/CustomSeparated/JSONCompactEachRow, make schema inference work with variable number of columns
2023-07-27 18:02:29 +00:00
Kruglov Pavel
0d34e97dbe
Merge branch 'master' into formats-with-subcolumns
2023-07-26 13:30:35 +02:00
Michael Kolupaev
8184a289e5
Partially reimplement Parquet encoder to make it faster and parallelizable
2023-07-25 10:16:28 +00:00
Kruglov Pavel
f0026af189
Revert "Revert "Improve CSVInputFormat to check and set default value to column if deserialize failed""
2023-07-19 14:51:11 +02:00
Kruglov Pavel
7b3564f96a
Revert "Improve CSVInputFormat to check and set default value to column if deserialize failed"
2023-07-19 14:44:59 +02:00
robot-ch-test-poll4
63d0616a22
Merge pull request #51716 from KevinyhZou/bug_fix_csv_field_type_not_match
...
Improve CSVInputFormat to check and set default value to column if deserialize failed
2023-07-19 14:41:05 +02:00
kevinyhzou
95424177d5
review fix
2023-07-19 18:26:54 +08:00
avogar
67f340b501
Merge branch 'master' of github.com:ClickHouse/ClickHouse into structure-to-schema
2023-07-18 13:52:15 +00:00
Kruglov Pavel
1dd05319b5
Merge branch 'master' into formats-with-subcolumns
2023-07-17 19:13:42 +02:00
kevinyhzou
355faa4251
ci fix
2023-07-17 20:08:32 +08:00
robot-clickhouse-ci-2
ac3cc1c2ff
Merge pull request #45671 from ClibMouse/feature/interval-kql-style-formatting
...
Implement KQL-style formatting for Interval
2023-07-16 04:06:54 +02:00
kevinyhzou
b2665031dc
review fix
2023-07-13 20:27:14 +08:00
kevinyhzou
ba57c84db3
bug fix csv input field type mismatch
2023-07-13 20:24:10 +08:00
ltrk2
2d2debe3ce
Introduce a separate setting for interval output formatting
2023-07-10 13:51:49 -04:00
ltrk2
b673aa8e6b
Use the dialect configuration
2023-07-10 13:51:49 -04:00
ltrk2
522b9ebf8c
Implement KQL-style formatting for Interval
2023-07-10 13:51:49 -04:00
Dmitry Kardymon
32f5a78302
Fix setting name
2023-07-06 07:32:46 +00:00
Dmitry Kardymon
24b5c9c204
Use one setting input_format_csv_allow_variable_number_of_colums and code in RowInput
2023-07-06 06:05:43 +00:00
avogar
98aa6b317f
Support reading subcolumns from file/s3/hdfs/url/azureBlobStorage table functions
2023-07-04 21:17:26 +00:00
avogar
03f820bc4a
Merge branch 'master' of github.com:ClickHouse/ClickHouse into structure-to-schema
2023-06-22 18:46:01 +00:00
avogar
4060beae49
Structure to CapnProto/Protobuf schema take 1
2023-06-22 18:00:00 +00:00
Dmitry Kardymon
30bea857fd
Merge remote-tracking branch 'origin/master' into ADQM-870
2023-06-19 07:19:07 +00:00
Dmitry Kardymon
806176d88e
Add input_format_csv_missing_as_default setting and tests
2023-06-15 11:23:08 +00:00
KevinyhZou
953f40aa3b
Merge branch 'master' into bug_fix_csv_parse_by_tab_delimiter
2023-06-15 10:25:19 +08:00
Dmitry Kardymon
a91fc3ddb3
Add docs/ add more cases in test
2023-06-14 16:44:31 +00:00
Dmitry Kardymon
ed318d1035
Add input_format_csv_ignore_extra_columns setting (prototype)
2023-06-14 10:35:36 +00:00
kevinyhzou
f3b99156ac
review fix
2023-06-14 10:48:21 +08:00
Kruglov Pavel
607f337d67
Merge pull request #50592 from Avogar/max-bytes-to-read-in-schema-inference
...
Add setting to limit the number of bytes to read in schema inference
2023-06-13 16:47:57 +02:00
kevinyhzou
911f8ad8dc
use whitespace or tab as field delimiter
2023-06-12 11:57:52 +08:00
kevinyhzou
48e1b21aab
Add feature to support reading csv with space & tab delimiters
2023-06-08 20:34:30 +08:00
Kruglov Pavel
1baa6404e6
Merge branch 'master' into skip-trailing-empty-lines
2023-06-06 19:39:34 +02:00
avogar
df50833b70
Allow to skip trailing empty lines in CSV/TSV/CustomSeparated formats
2023-06-06 17:33:05 +00:00
Kruglov Pavel
af880a6f3b
Merge branch 'master' into max-bytes-to-read-in-schema-inference
2023-06-06 14:47:58 +02:00
Nikita Mikhaylov
e87348010d
Rework loading and removing of data parts for MergeTree tables. ( #49474 )
...
Co-authored-by: Sergei Trifonov <sergei@clickhouse.com>
2023-06-06 14:42:56 +02:00
avogar
33e51d4f3b
Add setting to limit the number of bytes to read in schema inference
2023-06-05 15:22:04 +00:00
Alexey Gerasimchuk
9958731c27
Merge branch 'master' into ADQM-830
2023-06-05 07:46:47 +10:00
Michael Kolupaev
b51064a508
Get rid of SeekableReadBufferFactory, add SeekableReadBuffer::readBigAt() instead
2023-06-01 18:48:30 -07:00
Alexey Gerasimchuck
75791d7a63
Added input_format_csv_trim_whitespaces parameter
2023-05-25 07:51:32 +00:00
Michael Kolupaev
6fd5d8e8ba
Add setting output_format_parquet_compliant_nested_types to produce more compatible Parquet files
2023-05-19 18:39:50 +00:00
Alexey Milovidov
f6144ee32b
Revert "Make Pretty formats even prettier."
2023-05-13 02:45:07 +03:00
Alexey Milovidov
ef16077c72
Merge branch 'master' into pretty-time-squashing
2023-05-06 18:20:49 +03:00
Alexey Milovidov
90b0de5677
Make Pretty prettier
2023-05-05 06:36:53 +02:00
Michael Kolupaev
3bd1489f18
Propagate input_format_parquet_preserve_order to parallelizeOutputAfterReading()
2023-05-05 04:20:27 +00:00
Michael Kolupaev
eb3b774ad0
Better control over Parquet row group size
2023-05-04 14:59:55 -07:00
Nikita Mikhaylov
954e3b724c
Speedup outdated parts loading ( #49317 )
2023-05-03 18:56:45 +02:00
Michael Kolupaev
87be78e6de
Better
2023-04-17 04:58:32 +00:00
Michael Kolupaev
e133633359
Parallel decoding with one row group per thread
2023-04-17 04:58:32 +00:00
Michael Kolupaev
683077890f
Highly questionable refactoring (getInputMultistream() nonsense)
2023-04-17 04:58:32 +00:00
Michael Kolupaev
2d4fe85513
Something
2023-04-17 04:58:32 +00:00
Alexey Milovidov
bb6b775884
Merge branch 'master' into fuzzer-of-data-formats
2023-03-15 12:42:00 +01:00
Alexey Milovidov
f331b9b398
Fix errors and add tests
2023-03-13 23:49:28 +01:00
Alexey Milovidov
14647525f8
Merge branch 'fix-bson-bug' of github.com:Avogar/ClickHouse into fuzzer-of-data-formats
2023-03-13 22:45:00 +01:00
avogar
4213ec609f
Proper fix for bug in parquet, revert reverted #45878
2023-03-13 18:22:09 +00:00
Alexey Milovidov
f33b651686
Add fuzzer for data formats
2023-03-13 04:51:50 +01:00
avogar
5a18acde90
Revert #45878 and add a test
2023-03-11 21:15:14 +00:00
Kruglov Pavel
fe973f3d6f
Merge branch 'master' into native-types-conversions
2023-03-09 13:03:25 +01:00
Kruglov Pavel
69a1309ade
Merge branch 'master' into native-types-conversions
2023-03-07 20:06:17 +01:00
avogar
5ab5902f38
Allow control compression in Parquet/ORC/Arrow output formats, support more compression for input formats
2023-03-01 21:27:46 +00:00
avogar
ab899bf2f3
Allow types conversion in Native input format
2023-02-27 19:28:19 +00:00
Kruglov Pavel
443dedddca
Merge branch 'master' into use-parquet-2
2023-02-27 14:31:43 +01:00
avogar
54622566df
Add setting to change parquet version
2023-02-23 16:14:10 +00:00
Kruglov Pavel
9866ecfe8b
Merge branch 'master' into null-as-default-all-formats
2023-02-20 20:49:30 +01:00
Geoff Genz
be8bf3a6a3
Merge branch 'master' into http_client_version
2023-02-13 08:43:59 -07:00
avogar
d1efd02480
Extend setting input_format_null_as_default for more formats
2023-02-10 16:41:09 +00:00
Geoff Genz
99c3ff53c5
Merge remote-tracking branch 'origin/master' into http_client_version
...
# Conflicts:
# src/Interpreters/Context.cpp
# src/Interpreters/Context.h
2023-02-10 04:35:53 -07:00
Geoff Genz
7ed8ed0284
Add support for client_protocol_version sent with HTTP
2023-02-10 03:47:06 -07:00
Kruglov Pavel
4e2918cee3
Merge branch 'master' into parquet-fixed-binary
2023-02-08 12:31:13 +01:00
liuneng
17fc22a21e
add parquet max_block_size setting
2023-02-01 18:29:20 +08:00
Alexey Milovidov
04078dbed3
Remove trash
2023-01-29 22:43:36 +01:00
Azat Khuzhin
1a8437f2c9
Add ability to ignore unknown keys in JSON object for named tuples
...
This can be useful in case your input JSON is complex, while you need
only a few fields in it.
This behaviour is controlled by the
input_format_json_ignore_unknown_keys_in_named_tuple setting, which
is turned OFF by default.
This will almost allow parsing the gharchive dataset without jq. "almost"
because of two things:
- Tuple cannot be Nullable, so such keys with Tuple type in ClickHouse
cannot be `null` in JSON
- You cannot use dot.dot notation to extract columns for file() engine,
only tupleElement()
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-27 10:01:08 +01:00
Kruglov Pavel
23c12ac8ee
Merge branch 'master' into parquet-fixed-binary
2023-01-24 16:51:05 +01:00
Kruglov Pavel
cd1cd904a7
Merge branch 'master' into tsv-csv-detect-header
2023-01-23 23:49:56 +01:00
Alexander Tokmakov
70d1adfe4b
Better formatting for exception messages ( #45449 )
...
* save format string for NetException
* format exceptions
* format exceptions 2
* format exceptions 3
* format exceptions 4
* format exceptions 5
* format exceptions 6
* fix
* format exceptions 7
* format exceptions 8
* Update MergeTreeIndexGin.cpp
* Update AggregateFunctionMap.cpp
* Update AggregateFunctionMap.cpp
* fix
2023-01-24 00:13:58 +03:00
avogar
5bf4704e7a
Support FixedSizeBinary type in Parquet/Arrow
2023-01-16 21:01:31 +00:00
Kruglov Pavel
e9d6590926
Merge branch 'master' into tsv-csv-detect-header
2023-01-16 17:50:24 +01:00