Commit Graph

844 Commits

Author SHA1 Message Date
Alexey Milovidov
2bffed06de Fix style 2023-03-17 18:35:19 +01:00
Alexey Milovidov
1abe5ea58e Add data type fuzzer 2023-03-17 04:44:14 +01:00
Alexey Milovidov
6275c472a7 Better exceptions 2023-03-17 03:14:49 +01:00
avogar
2cc47b5bb6 Allow reading/writing nested arrays in Protobuf with only root field name as column name 2023-03-16 14:43:37 +00:00
Alexey Milovidov
bb6b775884 Merge branch 'master' into fuzzer-of-data-formats 2023-03-15 12:42:00 +01:00
Alexey Milovidov
e443c4e682
Merge pull request #47538 from Avogar/proper-parquet-fix
Proper fix for bug in parquet, revert reverted #45878
2023-03-14 22:29:39 +03:00
Michael Kolupaev
d3a514d221 Compress marks in memory 2023-03-13 16:29:00 -07:00
Alexey Milovidov
f331b9b398 Fix errors and add tests 2023-03-13 23:49:28 +01:00
Alexey Milovidov
14647525f8 Merge branch 'fix-bson-bug' of github.com:Avogar/ClickHouse into fuzzer-of-data-formats 2023-03-13 22:45:00 +01:00
avogar
4213ec609f Proper fix for bug in parquet, revert reverted #45878 2023-03-13 18:22:09 +00:00
Alexey Milovidov
1fd24c212b Update comment 2023-03-13 07:42:58 +01:00
Alexey Milovidov
02f7ef4723 Update comment 2023-03-13 05:28:06 +01:00
Alexey Milovidov
43b938d303 Update the fuzzer 2023-03-13 05:21:48 +01:00
Alexey Milovidov
f33b651686 Add fuzzer for data formats 2023-03-13 04:51:50 +01:00
avogar
5a18acde90 Revert #45878 and add a test 2023-03-11 21:15:14 +00:00
Kruglov Pavel
f387e6013a
Merge pull request #46990 from Avogar/native-types-conversions
Allow types conversion in Native input format
2023-03-10 16:55:16 +01:00
Alexey Milovidov
6f35d46ac8
Update SchemaInferenceUtils.cpp 2023-03-10 05:01:06 +03:00
avogar
46979e383f Fix big numbers inference in CSV 2023-03-09 18:21:47 +00:00
Kruglov Pavel
fe973f3d6f
Merge branch 'master' into native-types-conversions 2023-03-09 13:03:25 +01:00
Kruglov Pavel
71b6d6c6ae
Merge pull request #47114 from Avogar/parquet-compression
Improve working with compression methods in Parquet/ORC/Arrow formats
2023-03-09 13:02:18 +01:00
Mike Kot
9920a52c51 use std::lerp, constexpr hex.h 2023-03-07 22:50:17 +00:00
Kruglov Pavel
69a1309ade
Merge branch 'master' into native-types-conversions 2023-03-07 20:06:17 +01:00
Kruglov Pavel
479cd9b90b
Merge pull request #46972 from Avogar/json-date-int-inference
Fix date and int inference from string in JSON
2023-03-06 20:40:38 +01:00
Kruglov Pavel
3de905bb7c
Merge pull request #46616 from Avogar/fix-ipv4-ipv6-formats
Fix IPv4/IPv6 serialization/deserialization in binary formats
2023-03-06 19:40:29 +01:00
avogar
5ab5902f38 Allow control compression in Parquet/ORC/Arrow output formats, support more compression for input formats 2023-03-01 21:27:46 +00:00
Kruglov Pavel
65f06fc9b1
Merge branch 'master' into json-date-int-inference 2023-02-28 14:31:57 +01:00
avogar
ab899bf2f3 Allow types conversion in Native input format 2023-02-27 19:28:19 +00:00
avogar
2e921e3d6b Fix date and int inference from string in JSON 2023-02-27 16:00:19 +00:00
Kruglov Pavel
443dedddca
Merge branch 'master' into use-parquet-2 2023-02-27 14:31:43 +01:00
Kruglov Pavel
47f9ca2166
Merge branch 'master' into fix-ipv4-ipv6-formats 2023-02-23 20:32:43 +01:00
avogar
eec6051a50 style 2023-02-23 16:16:08 +00:00
avogar
54622566df Add setting to change parquet version 2023-02-23 16:14:10 +00:00
Kruglov Pavel
ef0d6becba
Merge branch 'master' into null-as-default-all-formats 2023-02-21 16:52:39 +01:00
Kruglov Pavel
b0424c1021
Merge pull request #46171 from Avogar/insert-null-as-default
Use default of column type in `insert_null_as_default` if column DEFAULT values is not specified
2023-02-20 21:45:02 +01:00
Kruglov Pavel
9866ecfe8b
Merge branch 'master' into null-as-default-all-formats 2023-02-20 20:49:30 +01:00
avogar
8da3594cd8 Fix IPv4/IPv6 serialization/deserialization in binary formats 2023-02-20 17:42:56 +00:00
Alexey Milovidov
d8cda3dbb8 Remove PVS-Studio 2023-02-19 23:30:05 +01:00
Kruglov Pavel
9fd2226c4c
Update NativeReader.h 2023-02-15 15:13:04 +01:00
Geoff Genz
be8bf3a6a3
Merge branch 'master' into http_client_version 2023-02-13 08:43:59 -07:00
avogar
d1efd02480 Extend setting input_format_null_as_default for more formats 2023-02-10 16:41:09 +00:00
Geoff Genz
99c3ff53c5 Merge remote-tracking branch 'origin/master' into http_client_version
# Conflicts:
#	src/Interpreters/Context.cpp
#	src/Interpreters/Context.h
2023-02-10 04:35:53 -07:00
Geoff Genz
7ed8ed0284 Add support for client_protocol_version sent with HTTP 2023-02-10 03:47:06 -07:00
avogar
c3e8dd8984 Fix low cardinality case 2023-02-08 19:14:28 +00:00
Kruglov Pavel
4e2918cee3
Merge branch 'master' into parquet-fixed-binary 2023-02-08 12:31:13 +01:00
Antonio Andelic
a39e4e24c6
Merge branch 'master' into optimize_parquet_reader 2023-02-02 14:18:00 +01:00
Vladimir C
7c6281c446
Merge pull request #45581 from Avogar/fix-date-inference 2023-02-01 13:04:12 +01:00
liuneng
17fc22a21e add parquet max_block_size setting 2023-02-01 18:29:20 +08:00
Alexey Milovidov
04078dbed3 Remove trash 2023-01-29 22:43:36 +01:00
Kruglov Pavel
96700abbe1
Merge pull request #45678 from azat/formats/json-parse-tupels
Add ability to ignore unknown keys in JSON object for named tuples
2023-01-27 21:11:05 +01:00
Azat Khuzhin
1a8437f2c9 Add ability to ignore unknown keys in JSON object for named tuples
This can be useful in case your input JSON is complex, while you need
only few fields in it.

This behaviour is controlled by the
input_format_json_ignore_unknown_keys_in_named_tuple setting name, that
is turned OFF by default.

This will, almost, allow to parse gharchive dataset without jq. "almost"
because of two things:
- Tuple cannot be Nullable, so such keys with Tuple type in ClickHouse
  cannot be `null` in JSON
- You cannot use dot.dot notation to extract columns for file() engine,
  only tupleElement()

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-27 10:01:08 +01:00
Alexander Tokmakov
d1baa7300c reformat ParsingException 2023-01-24 23:21:29 +01:00
Alexander Tokmakov
dd57215934 Merge branch 'master' into exception_message_patterns4 2023-01-24 17:03:12 +01:00
Kruglov Pavel
23c12ac8ee
Merge branch 'master' into parquet-fixed-binary 2023-01-24 16:51:05 +01:00
avogar
7eeb2a0bc7 Change comment 2023-01-24 15:46:32 +00:00
avogar
159f49266e Don't infer Dates from 8 digit numbers 2023-01-24 15:45:27 +00:00
Kruglov Pavel
cd1cd904a7
Merge branch 'master' into tsv-csv-detect-header 2023-01-23 23:49:56 +01:00
Alexander Tokmakov
3f6594f4c6 forbid old ctor of Exception 2023-01-23 22:18:05 +01:00
Alexander Tokmakov
70d1adfe4b
Better formatting for exception messages (#45449)
* save format string for NetException

* format exceptions

* format exceptions 2

* format exceptions 3

* format exceptions 4

* format exceptions 5

* format exceptions 6

* fix

* format exceptions 7

* format exceptions 8

* Update MergeTreeIndexGin.cpp

* Update AggregateFunctionMap.cpp

* Update AggregateFunctionMap.cpp

* fix
2023-01-24 00:13:58 +03:00
Kruglov Pavel
9820beae68
Apply suggestions from code review
Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com>
2023-01-19 16:11:13 +01:00
avogar
5bf4704e7a Support FixedSizeBinary type in Parquet/Arrow 2023-01-16 21:01:31 +00:00
Kruglov Pavel
e9d6590926
Merge branch 'master' into tsv-csv-detect-header 2023-01-16 17:50:24 +01:00
avogar
87b934c472 Insert default values in case of missing tuple elements in JSONEachRow 2023-01-12 16:36:44 +00:00
avogar
b461935374 Better 2023-01-12 13:11:04 +00:00
Kruglov Pavel
05a11ff4a4
Merge branch 'master' into tsv-csv-detect-header 2023-01-12 12:35:18 +01:00
avogar
26cd56d113 Fix tests, make better 2023-01-11 22:52:15 +00:00
Kruglov Pavel
50eb9fca67
Merge pull request #44696 from Avogar/schema-inference-uint
Infer UInt64 in case of Int64 overflow
2023-01-11 14:24:42 +01:00
Alexey Milovidov
0d39d26a34 Don't fix parallel formatting 2023-01-09 06:15:20 +01:00
Anton Popov
1f32ffedf8
Merge pull request #43221 from ClickHouse/refactoring-ip-types
Replace domain IP types (IPv4, IPv6) with native
2023-01-07 12:01:21 +01:00
avogar
ee72799121 Fix tests, make better 2023-01-06 20:46:43 +00:00
Anton Popov
b25f875674
Merge pull request #44875 from ClickHouse/fix-another-one-cannot-read-all-data-for-lc-dict-error
Fix right offset for reading LowCardinality dictionary from remote fs
2023-01-06 15:24:36 +01:00
avogar
7fcdb08ec6 Detect header in CSV/TSV/CustomSeparated files automatically 2023-01-05 22:57:25 +00:00
Yakov Olkhovskiy
7a5a36cbed
Merge branch 'master' into refactoring-ip-types 2023-01-04 11:11:06 -05:00
avogar
1f3d75cbf2 Better 2023-01-04 14:58:17 +00:00
Kruglov Pavel
7062054d60
Merge branch 'master' into schema-inference-uint 2023-01-04 14:50:01 +01:00
Nikolai Kochetov
da26f62a9b Fix right offset for reading LowCardinality dictionary from remote fs in case if right mark was in the middle of compressed block. 2023-01-03 18:19:51 +00:00
Alexey Milovidov
e855d3519a
Merge branch 'master' into refactoring-ip-types 2023-01-02 21:58:53 +03:00
avogar
73fecae5ff Fix comments 2023-01-02 15:31:07 +00:00
Kruglov Pavel
0a43976977
Merge branch 'master' into validate-types 2023-01-02 16:10:14 +01:00
Kruglov Pavel
69b9842bc6
Merge branch 'master' into schema-inference-uint 2022-12-30 18:16:00 +01:00
Kruglov Pavel
4982d132fb
Merge branch 'master' into validate-types 2022-12-30 17:52:13 +01:00
Kruglov Pavel
894726bd8f
Merge branch 'master' into improve-streaming-engines 2022-12-29 22:59:45 +01:00
Kruglov Pavel
150a699dda
Merge pull request #44546 from Avogar/better-object-as-string-inference
Improve json object as string inference
2022-12-29 21:58:46 +01:00
avogar
1ce69371fb Infer UInt64 in case of Int64 overflow 2022-12-28 21:46:08 +00:00
Raúl Marín
5de11979ce
Unify query elapsed time measurements (#43455)
* Unify query elapsed time reporting

* add-test: Make shell tests executable

* Add some tests around query elapsed time

* Style and ubsan
2022-12-28 21:01:41 +01:00
avogar
411f98306a Merge branch 'master' of github.com:ClickHouse/ClickHouse into validate-types 2022-12-27 19:24:15 +00:00
Kruglov Pavel
819e7a3008
Merge pull request #44550 from Avogar/better-json-tuples-to-arrays-inference
Improve inferring arrays with nulls in JSON formats
2022-12-27 18:22:13 +01:00
Kruglov Pavel
ac162a2c49
Merge pull request #44522 from Avogar/zero-numbers
Infer numbers starting from zero as strings in TSV
2022-12-27 17:00:10 +01:00
avogar
798c3111ed Improve inferring arrays with nulls in JSON formats 2022-12-24 00:21:48 +00:00
avogar
331f4bfee1 Fix 2022-12-23 19:58:50 +00:00
avogar
f15bf1839a Add missed settings into additional cache info 2022-12-23 19:52:54 +00:00
avogar
8dfe90a6c1 Improve json object as string inference 2022-12-23 19:44:13 +00:00
avogar
123392c996 Fix tests 2022-12-23 14:42:38 +00:00
Vladimir C
7482ea54ab
Merge pull request #43972 from ClickHouse/vdimir/tmp-data-in-fs-cache-2 2022-12-23 11:59:27 +01:00
avogar
f555048ae5 Infer numbers starting from zero as strings in TSV 2022-12-22 21:55:39 +00:00
Dmitry Novik
cff882d506 Merge remote-tracking branch 'origin/master' into refector-function-node 2022-12-22 21:34:29 +00:00
Kruglov Pavel
6a017a6586
Merge pull request #43379 from Avogar/better-capn-proto
Add small improvements in CapnProto format
2022-12-22 14:50:10 +01:00
vdimir
d30d25dbbe
Temporary files evict fs cache 2022-12-22 10:22:49 +00:00
Yakov Olkhovskiy
a8cb29da4b
Merge branch 'master' into refactoring-ip-types 2022-12-21 23:56:24 -05:00
avogar
4ab3e90382 Validate types in table function arguments/CAST function arguments/JSONAsObject schema inference 2022-12-21 21:21:30 +00:00
Kruglov Pavel
5e01a3d74e
Merge branch 'master' into improve-streaming-engines 2022-12-21 10:51:50 +01:00