888 Commits

Author SHA1 Message Date
Kruglov Pavel
5ada385502 Merge branch 'master' into allow_empty 2023-05-16 12:21:31 +02:00
Kruglov Pavel
558eda4146 Merge pull request #49412 from azat/block-use-dense-hash-map
Switch Block::NameMap to google::dense_hash_map over HashMap
2023-05-15 12:22:55 +02:00
Alexey Milovidov
0ca36d4f89 Merge branch 'master' into clang-17 2023-05-14 01:57:40 +02:00
Alexey Milovidov
5a44dc26e7 Fixes for clang-17 2023-05-13 02:57:31 +02:00
Alexey Milovidov
f6144ee32b Revert "Make Pretty formats even prettier." 2023-05-13 02:45:07 +03:00
Azat Khuzhin
2c40dd6a4c Switch Block::NameMap to google::dense_hash_map over HashMap
Since HashMap creates 2^8 elements by default, while dense_hash_map
should be good here.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-12 05:52:57 +02:00
Alexey Milovidov
ef16077c72 Merge branch 'master' into pretty-time-squashing 2023-05-06 18:20:49 +03:00
Alexey Milovidov
90b0de5677 Make Pretty prettier 2023-05-05 06:36:53 +02:00
Michael Kolupaev
3bd1489f18 Propagate input_format_parquet_preserve_order to parallelizeOutputAfterReading() 2023-05-05 04:20:27 +00:00
Michael Kolupaev
eb3b774ad0 Better control over Parquet row group size 2023-05-04 14:59:55 -07:00
Nikita Mikhaylov
954e3b724c Speedup outdated parts loading (#49317) 2023-05-03 18:56:45 +02:00
Kruglov Pavel
bacba6e347 Fix typo 2023-04-26 12:18:12 +02:00
Alexey Milovidov
54d10f87f2 Consistency of the LineAsString format 2023-04-23 05:50:46 +02:00
robot-ch-test-poll1
f466c89621 Merge pull request #48911 from Avogar/parquet-metadata-format
Add ParquetMetadata input format to read Parquet file metadata
2023-04-21 03:46:26 +02:00
avogar
34cc7b635a Fix type name 2023-04-19 10:33:39 +00:00
avogar
8af9cf67fd Fix comments 2023-04-19 10:33:39 +00:00
avogar
c2f18281c8 Make better 2023-04-19 10:33:39 +00:00
avogar
bb6cf5252f Fix logical error with IPv4 in Protobuf, add support for Date32 2023-04-19 10:33:39 +00:00
Kruglov Pavel
9bc95bed85 Merge pull request #48898 from Avogar/pretty-json
Add PrettyJSONEachRow format to output pretty JSON
2023-04-19 12:27:24 +02:00
Kruglov Pavel
a5c52d3bc3 Merge branch 'master' into parquet-metadata-format 2023-04-18 21:51:14 +02:00
avogar
b277a5c943 Add ParquetMetadata input format to read Parquet file metadata 2023-04-18 16:46:26 +00:00
avogar
e356f92b77 Add PrettyJSONEachRow format to output pretty JSON 2023-04-18 13:28:59 +00:00
Michael Kolupaev
87be78e6de Better 2023-04-17 04:58:32 +00:00
Michael Kolupaev
e133633359 Parallel decoding with one row group per thread 2023-04-17 04:58:32 +00:00
Michael Kolupaev
683077890f Highly questionable refactoring (getInputMultistream() nonsense) 2023-04-17 04:58:32 +00:00
Michael Kolupaev
2d4fe85513 Something 2023-04-17 04:58:32 +00:00
Kruglov Pavel
f087f0e877 Update src/Formats/ReadSchemaUtils.cpp 2023-04-11 14:18:16 +02:00
robot-ch-test-poll2
bf003c7595 Merge pull request #48390 from Avogar/protobuf-tuple
Allow write/read unnamed tuple as nested Message in Protobuf format
2023-04-05 22:14:28 +02:00
Kruglov Pavel
bd318950b3 Fix special build 2023-04-05 13:35:12 +02:00
Kruglov Pavel
96a3307bda Merge branch 'master' into fix-protobuf-abort 2023-04-05 11:57:18 +02:00
avogar
f46f098c78 Better 2023-04-05 09:55:49 +00:00
avogar
04be32216a Allow write/read unnamed tuple as nested Message in Protobuf format 2023-04-04 14:47:37 +00:00
avogar
4894f47d95 Fix tests 2023-04-04 13:34:02 +00:00
avogar
972c680b3c Fix typo 2023-04-03 16:27:09 +00:00
avogar
2cde63a25c Avoid abort in protobuf library in debug build 2023-04-03 16:25:22 +00:00
laimuxi
b869572a54 reformat code 2023-04-01 15:20:26 +08:00
laimuxi
3b756ef026 rollback 2023-03-31 21:58:20 +08:00
laimuxi
17efdbf625 change 2023-03-31 21:56:35 +08:00
avogar
35937adcaa Support more types in CapnProto format 2023-03-30 19:15:28 +00:00
Alexey Milovidov
637f6fdd51 Limit memory in fuzzers 2023-03-19 06:17:55 +01:00
Alexey Milovidov
465a89ba15 Limit memory in fuzzers 2023-03-19 05:55:53 +01:00
Alexey Milovidov
57a5a946c9 Fix error 2023-03-19 05:34:10 +01:00
Alexey Milovidov
ee98b555fb Limit memory in fuzzers 2023-03-19 05:11:32 +01:00
Alexey Milovidov
2a077f11f6 Merge branch 'master' into fuzzer-of-data-formats 2023-03-19 01:07:31 +01:00
Alexey Milovidov
2bffed06de Fix style 2023-03-17 18:35:19 +01:00
Alexey Milovidov
1abe5ea58e Add data type fuzzer 2023-03-17 04:44:14 +01:00
Alexey Milovidov
6275c472a7 Better exceptions 2023-03-17 03:14:49 +01:00
avogar
2cc47b5bb6 Allow reading/writing nested arrays in Protobuf with only root field name as column name 2023-03-16 14:43:37 +00:00
Alexey Milovidov
bb6b775884 Merge branch 'master' into fuzzer-of-data-formats 2023-03-15 12:42:00 +01:00
Alexey Milovidov
e443c4e682 Merge pull request #47538 from Avogar/proper-parquet-fix
Proper fix for bug in parquet, revert reverted #45878
2023-03-14 22:29:39 +03:00
Michael Kolupaev
d3a514d221 Compress marks in memory 2023-03-13 16:29:00 -07:00
Alexey Milovidov
f331b9b398 Fix errors and add tests 2023-03-13 23:49:28 +01:00
Alexey Milovidov
14647525f8 Merge branch 'fix-bson-bug' of github.com:Avogar/ClickHouse into fuzzer-of-data-formats 2023-03-13 22:45:00 +01:00
avogar
4213ec609f Proper fix for bug in parquet, revert reverted #45878 2023-03-13 18:22:09 +00:00
Alexey Milovidov
1fd24c212b Update comment 2023-03-13 07:42:58 +01:00
Alexey Milovidov
02f7ef4723 Update comment 2023-03-13 05:28:06 +01:00
Alexey Milovidov
43b938d303 Update the fuzzer 2023-03-13 05:21:48 +01:00
Alexey Milovidov
f33b651686 Add fuzzer for data formats 2023-03-13 04:51:50 +01:00
avogar
5a18acde90 Revert #45878 and add a test 2023-03-11 21:15:14 +00:00
Kruglov Pavel
f387e6013a Merge pull request #46990 from Avogar/native-types-conversions
Allow types conversion in Native input format
2023-03-10 16:55:16 +01:00
Alexey Milovidov
6f35d46ac8 Update SchemaInferenceUtils.cpp 2023-03-10 05:01:06 +03:00
avogar
46979e383f Fix big numbers inference in CSV 2023-03-09 18:21:47 +00:00
Kruglov Pavel
fe973f3d6f Merge branch 'master' into native-types-conversions 2023-03-09 13:03:25 +01:00
Kruglov Pavel
71b6d6c6ae Merge pull request #47114 from Avogar/parquet-compression
Improve working with compression methods in Parquet/ORC/Arrow formats
2023-03-09 13:02:18 +01:00
Mike Kot
9920a52c51 use std::lerp, constexpr hex.h 2023-03-07 22:50:17 +00:00
Kruglov Pavel
69a1309ade Merge branch 'master' into native-types-conversions 2023-03-07 20:06:17 +01:00
Kruglov Pavel
479cd9b90b Merge pull request #46972 from Avogar/json-date-int-inference
Fix date and int inference from string in JSON
2023-03-06 20:40:38 +01:00
Kruglov Pavel
3de905bb7c Merge pull request #46616 from Avogar/fix-ipv4-ipv6-formats
Fix IPv4/IPv6 serialization/deserialization in binary formats
2023-03-06 19:40:29 +01:00
avogar
5ab5902f38 Allow control compression in Parquet/ORC/Arrow output formats, support more compression for input formats 2023-03-01 21:27:46 +00:00
Kruglov Pavel
65f06fc9b1 Merge branch 'master' into json-date-int-inference 2023-02-28 14:31:57 +01:00
avogar
ab899bf2f3 Allow types conversion in Native input format 2023-02-27 19:28:19 +00:00
avogar
2e921e3d6b Fix date and int inference from string in JSON 2023-02-27 16:00:19 +00:00
Kruglov Pavel
443dedddca Merge branch 'master' into use-parquet-2 2023-02-27 14:31:43 +01:00
Kruglov Pavel
47f9ca2166 Merge branch 'master' into fix-ipv4-ipv6-formats 2023-02-23 20:32:43 +01:00
avogar
eec6051a50 style 2023-02-23 16:16:08 +00:00
avogar
54622566df Add setting to change parquet version 2023-02-23 16:14:10 +00:00
Kruglov Pavel
ef0d6becba Merge branch 'master' into null-as-default-all-formats 2023-02-21 16:52:39 +01:00
Kruglov Pavel
b0424c1021 Merge pull request #46171 from Avogar/insert-null-as-default
Use default of column type in `insert_null_as_default` if column DEFAULT values is not specified
2023-02-20 21:45:02 +01:00
Kruglov Pavel
9866ecfe8b Merge branch 'master' into null-as-default-all-formats 2023-02-20 20:49:30 +01:00
avogar
8da3594cd8 Fix IPv4/IPv6 serialization/deserialization in binary formats 2023-02-20 17:42:56 +00:00
Alexey Milovidov
d8cda3dbb8 Remove PVS-Studio 2023-02-19 23:30:05 +01:00
Kruglov Pavel
9fd2226c4c Update NativeReader.h 2023-02-15 15:13:04 +01:00
Geoff Genz
be8bf3a6a3 Merge branch 'master' into http_client_version 2023-02-13 08:43:59 -07:00
avogar
d1efd02480 Extend setting input_format_null_as_default for more formats 2023-02-10 16:41:09 +00:00
Geoff Genz
99c3ff53c5 Merge remote-tracking branch 'origin/master' into http_client_version
# Conflicts:
#	src/Interpreters/Context.cpp
#	src/Interpreters/Context.h
2023-02-10 04:35:53 -07:00
Geoff Genz
7ed8ed0284 Add support for client_protocol_version sent with HTTP 2023-02-10 03:47:06 -07:00
avogar
c3e8dd8984 Fix low cardinality case 2023-02-08 19:14:28 +00:00
Kruglov Pavel
4e2918cee3 Merge branch 'master' into parquet-fixed-binary 2023-02-08 12:31:13 +01:00
Antonio Andelic
a39e4e24c6 Merge branch 'master' into optimize_parquet_reader 2023-02-02 14:18:00 +01:00
Vladimir C
7c6281c446 Merge pull request #45581 from Avogar/fix-date-inference 2023-02-01 13:04:12 +01:00
liuneng
17fc22a21e add parquet max_block_size setting 2023-02-01 18:29:20 +08:00
Alexey Milovidov
04078dbed3 Remove trash 2023-01-29 22:43:36 +01:00
Kruglov Pavel
96700abbe1 Merge pull request #45678 from azat/formats/json-parse-tupels
Add ability to ignore unknown keys in JSON object for named tuples
2023-01-27 21:11:05 +01:00
Azat Khuzhin
1a8437f2c9 Add ability to ignore unknown keys in JSON object for named tuples
This can be useful in case your input JSON is complex, while you need
only few fields in it.

This behaviour is controlled by the
input_format_json_ignore_unknown_keys_in_named_tuple setting name, that
is turned OFF by default.

This will, almost, allow to parse gharchive dataset without jq. "almost"
because of two things:
- Tuple cannot be Nullable, so such keys with Tuple type in ClickHouse
  cannot be `null` in JSON
- You cannot use dot.dot notation to extract columns for file() engine,
  only tupleElement()

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-27 10:01:08 +01:00
Alexander Tokmakov
d1baa7300c reformat ParsingException 2023-01-24 23:21:29 +01:00
Alexander Tokmakov
dd57215934 Merge branch 'master' into exception_message_patterns4 2023-01-24 17:03:12 +01:00
Kruglov Pavel
23c12ac8ee Merge branch 'master' into parquet-fixed-binary 2023-01-24 16:51:05 +01:00
avogar
7eeb2a0bc7 Change comment 2023-01-24 15:46:32 +00:00
avogar
159f49266e Don't infer Dates from 8 digit numbers 2023-01-24 15:45:27 +00:00
Kruglov Pavel
cd1cd904a7 Merge branch 'master' into tsv-csv-detect-header 2023-01-23 23:49:56 +01:00