Commit Graph

1051 Commits

Author SHA1 Message Date
robot-clickhouse
926c5636dd Merge pull request #42599 from ClickHouse/build-fuzzer-protocol
libFuzzer: add CI fuzzers build, add tcp protocol fuzzer, fix other fuzzers.
2023-09-04 22:41:54 +02:00
slvrtrn
bb0eff9669 Revert format changes 2023-09-04 21:15:26 +02:00
Yakov Olkhovskiy
361b21b416 fix fuzzers, cmake refactor, add target fuzzers 2023-09-01 14:20:50 +00:00
Robert Schulze
aefb543734 Merge remote-tracking branch 'rschu1ze/master' into feat_markdown 2023-08-31 11:32:44 +00:00
slvrtrn
8378441248 Merge remote-tracking branch 'origin' into simplified-prepared-statements-for-mysql 2023-08-30 19:00:51 +02:00
Alexey Milovidov
bbef3ceeb0 Merge pull request #53902 from HarryLeeIBM/hlee-s390x-stripe-log
Fix StripeLog storage endian issue for s390x
2023-08-29 00:35:48 +03:00
HarryLeeIBM
dcecf52a68 Fix StripeLog storage endian issue for s390x 2023-08-28 11:35:04 -07:00
irenjj
51aa89eed8 Add a setting to automatically escape special characters in Markdown. 2023-08-28 00:10:33 +08:00
slvrtrn
055d2e3c3d Merge remote-tracking branch 'origin' into simplified-prepared-statements-for-mysql 2023-08-25 21:27:47 +02:00
slvrtrn
734ffd916c WIP prepared statements 2023-08-25 20:31:21 +02:00
Yakov Olkhovskiy
415a993c91 fix fuzzers build 2023-08-24 23:22:39 +00:00
Kruglov Pavel
f7e1abd774 Merge branch 'master' into cache-count 2023-08-23 22:31:49 +02:00
Kruglov Pavel
67c5c0203b Merge branch 'master' into fast-count-from-files 2023-08-22 15:03:48 +02:00
Kruglov Pavel
c0bdd0e00b Merge branch 'master' into cache-count 2023-08-22 14:42:22 +02:00
avogar
b4145aeddc Cache number of rows in files for count in file/s3/url/hdfs/azure functions 2023-08-22 11:59:59 +00:00
Michael Kolupaev
2f4d433e69 Parquet filter pushdown 2023-08-21 14:15:52 -07:00
Michael Kolupaev
6009e1b293 Merge pull request #53324 from bigo-sg/ch_gluten_2583
Implement native orc input format without arrow to improve performance
2023-08-21 13:44:57 -07:00
Kruglov Pavel
88aee95122 Merge branch 'master' into fast-count-from-files 2023-08-21 14:46:33 +02:00
avogar
47304bf7aa Optimize count from files in most input formats 2023-08-21 12:30:52 +00:00
Kruglov Pavel
c68456a20a Merge pull request #52692 from Avogar/variable-number-of-volumns-more-formats
Allow variable number of columns in more formats, make it work with schema inference
2023-08-21 13:28:35 +02:00
taiyang-li
f723e8d43a change as request 2023-08-21 12:09:02 +08:00
Michael Kolupaev
a1522e22ea Merge pull request #53281 from Avogar/batch-small-parquet-row-groups
Optimize reading small row groups by batching them together in Parquet
2023-08-18 17:15:42 -07:00
avogar
bca91548ad Add setting input_format_parquet_local_file_min_bytes_for_seek 2023-08-17 12:28:01 +00:00
Alexander Tokmakov
ba44d7260e fix 2023-08-16 00:20:28 +02:00
avogar
7e863a2726 Address comments 2023-08-11 13:17:49 +00:00
avogar
3ad7e57059 Optimize reading small row groups by batching them together in Parquet 2023-08-11 13:17:45 +00:00
Kruglov Pavel
b6b0e9c6bc Merge branch 'master' into variable-number-of-volumns-more-formats 2023-08-11 14:02:21 +02:00
Kruglov Pavel
00865a7dad Merge branch 'master' into format-one 2023-08-11 13:54:58 +02:00
Kruglov Pavel
6600f87f86 Merge branch 'master' into http-valid-json-on-exception 2023-08-10 13:53:32 +02:00
Kruglov Pavel
33a39900ad Merge branch 'master' into variable-number-of-volumns-more-formats 2023-08-09 19:51:17 +02:00
avogar
01a7c7560f Add input format One 2023-08-09 11:25:32 +00:00
Alexey Milovidov
29221188ba Fix error 2023-08-09 04:07:31 +02:00
Alexey Milovidov
5dd99db369 Add diagnostic info about file name during schema inference 2023-08-08 03:55:06 +02:00
Anton Popov
ff137773e7 Merge branch 'master' into formats-with-subcolumns 2023-08-02 15:24:56 +02:00
Alexey Milovidov
3af4fd4003 Merge branch 'master' into change-protocol-version 2023-08-02 15:44:08 +03:00
Anton Popov
aec0667f16 better check of version for sparse serialization 2023-08-01 23:39:52 +00:00
avogar
fa905ebd27 Clean up 2023-08-01 10:14:09 +00:00
avogar
a71cd56a90 Output valid JSON/XML on exception during HTTP query execution 2023-08-01 10:06:56 +00:00
Anton Popov
525da38316 increase min protocol version for sparse serialization 2023-07-31 18:58:58 +00:00
Kruglov Pavel
3e1c409e60 Merge branch 'master' into structure-to-schema 2023-07-28 11:32:16 +02:00
avogar
6d77d52dfe Allow variable number of columns in TSV/CustomSeparated/JSONCompactEachRow, make schema inference work with variable number of columns 2023-07-27 18:02:29 +00:00
Kruglov Pavel
0cd2d7449b Use for-range loop, add comment 2023-07-27 00:01:25 +02:00
Alexey Milovidov
bc86c26e4e Update src/Formats/StructureToFormatSchemaUtils.cpp
Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com>
2023-07-26 23:37:20 +03:00
Kruglov Pavel
0d34e97dbe Merge branch 'master' into formats-with-subcolumns 2023-07-26 13:30:35 +02:00
Michael Kolupaev
8184a289e5 Partially reimplement Parquet encoder to make it faster and parallelizable 2023-07-25 10:16:28 +00:00
Igor Nikonov
2d33661594 Merge branch 'master' into fix-potentially-bad-code 2023-07-22 22:48:07 +02:00
avogar
da6a31bb62 Fix tests and style 2023-07-19 13:26:09 +00:00
Kruglov Pavel
f0026af189 Revert "Revert "Improve CSVInputFormat to check and set default value to column if deserialize failed"" 2023-07-19 14:51:11 +02:00
Kruglov Pavel
7b3564f96a Revert "Improve CSVInputFormat to check and set default value to column if deserialize failed" 2023-07-19 14:44:59 +02:00
robot-ch-test-poll4
63d0616a22 Merge pull request #51716 from KevinyhZou/bug_fix_csv_field_type_not_match
Improve CSVInputFormat to check and set default value to column if deserialize failed
2023-07-19 14:41:05 +02:00
kevinyhzou
95424177d5 review fix 2023-07-19 18:26:54 +08:00
Alexey Milovidov
8cd2e7c7d6 Merge branch 'master' into fix-potentially-bad-code 2023-07-18 22:18:22 +02:00
avogar
b300781fd8 Make better, add tests 2023-07-18 17:48:39 +00:00
avogar
67f340b501 Merge branch 'master' of github.com:ClickHouse/ClickHouse into structure-to-schema 2023-07-18 13:52:15 +00:00
Kruglov Pavel
1dd05319b5 Merge branch 'master' into formats-with-subcolumns 2023-07-17 19:13:42 +02:00
kevinyhzou
355faa4251 ci fix 2023-07-17 20:08:32 +08:00
robot-clickhouse-ci-2
ac3cc1c2ff Merge pull request #45671 from ClibMouse/feature/interval-kql-style-formatting
Implement KQL-style formatting for Interval
2023-07-16 04:06:54 +02:00
kevinyhzou
b2665031dc review fix 2023-07-13 20:27:14 +08:00
kevinyhzou
ba57c84db3 bug fix csv input field type mismatch 2023-07-13 20:24:10 +08:00
Alexander Gololobov
9757e272b9 Check number of rows in the reader instead 2023-07-11 12:24:16 +02:00
ltrk2
2d2debe3ce Introduce a separate setting for interval output formatting 2023-07-10 13:51:49 -04:00
ltrk2
b673aa8e6b Use the dialect configuration 2023-07-10 13:51:49 -04:00
ltrk2
522b9ebf8c Implement KQL-style formatting for Interval 2023-07-10 13:51:49 -04:00
Dmitry Kardymon
32f5a78302 Fix setting name 2023-07-06 07:32:46 +00:00
Dmitry Kardymon
24b5c9c204 Use one setting input_format_csv_allow_variable_number_of_colums and code in RowInput 2023-07-06 06:05:43 +00:00
avogar
98aa6b317f Support reading subcolumns from file/s3/hdfs/url/azureBlobStorage table functions 2023-07-04 21:17:26 +00:00
Dmitry Kardymon
ab4142eb8f Merge remote-tracking branch 'clickhouse/master' into ADQM-870 2023-07-04 08:23:31 +03:00
Alexey Milovidov
27f41869a9 Remove code that I don't like 2023-06-25 09:11:42 +02:00
avogar
03f820bc4a Merge branch 'master' of github.com:ClickHouse/ClickHouse into structure-to-schema 2023-06-22 18:46:01 +00:00
avogar
4060beae49 Structure to CapnProto/Protobuf schema take 1 2023-06-22 18:00:00 +00:00
Michael Kolupaev
2498170253 Fix use-after-free in StorageURL when switching URLs 2023-06-22 16:24:12 +00:00
Dmitry Kardymon
30bea857fd Merge remote-tracking branch 'origin/master' into ADQM-870 2023-06-19 07:19:07 +00:00
Kruglov Pavel
11f176dd19 Merge pull request #50712 from KevinyhZou/bug_fix_csv_parse_by_tab_delimiter
Support CSVInputFormat to read csv file by whitespace & tab delimiter
2023-06-16 13:16:22 +02:00
Dmitry Kardymon
806176d88e Add input_format_csv_missing_as_default setting and tests 2023-06-15 11:23:08 +00:00
KevinyhZou
953f40aa3b Merge branch 'master' into bug_fix_csv_parse_by_tab_delimiter 2023-06-15 10:25:19 +08:00
Dmitry Kardymon
a91fc3ddb3 Add docs/ add more cases in test 2023-06-14 16:44:31 +00:00
Dmitry Kardymon
ed318d1035 Add input_format_csv_ignore_extra_columns setting (prototype) 2023-06-14 10:35:36 +00:00
Chang Chen
86694847c6 using Reader instead of typename CapnpType::Reader 2023-06-14 15:22:32 +08:00
Chang Chen
e281026e00 fix build issue on clang 15 2023-06-14 12:29:55 +08:00
kevinyhzou
f3b99156ac review fix 2023-06-14 10:48:21 +08:00
Kruglov Pavel
607f337d67 Merge pull request #50592 from Avogar/max-bytes-to-read-in-schema-inference
Add setting to limit the number of bytes to read in schema inference
2023-06-13 16:47:57 +02:00
Kruglov Pavel
8fdcd91c38 Merge pull request #49752 from Avogar/better-capnproto-3
Refactor CapnProto format to improve input/output performance
2023-06-13 16:20:38 +02:00
Kruglov Pavel
24d70a2afd Fix 2023-06-12 13:37:59 +02:00
kevinyhzou
911f8ad8dc use whitespace or tab as field delimiter 2023-06-12 11:57:52 +08:00
avogar
47b0c2a862 Make better 2023-06-09 13:01:36 +00:00
kevinyhzou
48e1b21aab Add feature to support read csv by space & tab delimiter 2023-06-08 20:34:30 +08:00
avogar
cc036528fe Merge branch 'master' of github.com:ClickHouse/ClickHouse into better-capnproto-3 2023-06-08 11:16:13 +00:00
Kruglov Pavel
1baa6404e6 Merge branch 'master' into skip-trailing-empty-lines 2023-06-06 19:39:34 +02:00
avogar
df50833b70 Allow to skip trailing empty lines in CSV/TSV/CustomSeparated formats 2023-06-06 17:33:05 +00:00
Kruglov Pavel
af880a6f3b Merge branch 'master' into max-bytes-to-read-in-schema-inference 2023-06-06 14:47:58 +02:00
Nikita Mikhaylov
e87348010d Rework loading and removing of data parts for MergeTree tables. (#49474)
Co-authored-by: Sergei Trifonov <sergei@clickhouse.com>
2023-06-06 14:42:56 +02:00
avogar
33e51d4f3b Add setting to limit the number of bytes to read in schema inference 2023-06-05 15:22:04 +00:00
Kruglov Pavel
59f27014f7 Merge pull request #50474 from valentinalexeev/patch-1
Additional error information when JSON is too large
2023-06-05 13:31:21 +02:00
Alexey Gerasimchuk
9958731c27 Merge branch 'master' into ADQM-830 2023-06-05 07:46:47 +10:00
Valentin Alexeev
516cda94ee Use in.count() instead of pos 2023-06-02 18:42:35 +02:00
Valentin Alexeev
da4d55cdaf Additional error information when JSON is too large
If a parser fails on a large JSON, then output the last position processed to allow review.
2023-06-02 18:42:35 +02:00
Michael Kolupaev
b51064a508 Get rid of SeekableReadBufferFactory, add SeekableReadBuffer::readBigAt() instead 2023-06-01 18:48:30 -07:00
avogar
883350d5c2 Fix tests 2023-06-01 14:51:14 +00:00
Kruglov Pavel
8b34a30455 Fix style 2023-05-31 22:14:57 +02:00
Kruglov Pavel
898d1f34db Merge branch 'master' into better-capnproto-3 2023-05-31 21:44:00 +02:00
avogar
c9626314f7 Better 2023-05-31 19:22:44 +00:00
Kruglov Pavel
2dd4701115 Merge branch 'master' into allow_empty 2023-05-30 16:04:12 +02:00
avogar
ea395e9554 Make better 2023-05-25 15:24:02 +00:00
Alexey Gerasimchuck
75791d7a63 Added input_format_csv_trim_whitespaces parameter 2023-05-25 07:51:32 +00:00
Kruglov Pavel
f76fc5e066 Fix special build 2023-05-24 17:19:04 +00:00
Kruglov Pavel
94ef08977a Fix special build 2023-05-24 17:19:04 +00:00
Kruglov Pavel
5f1ca61d09 Fix special builds 2023-05-24 17:19:04 +00:00
avogar
a89a8b8d50 Fix build 2023-05-24 17:19:04 +00:00
Kruglov Pavel
c2eada7ba7 Fix style 2023-05-24 17:19:04 +00:00
avogar
e66f6272d1 Refactor CapnProto format to improve input/output performance 2023-05-24 17:19:04 +00:00
Michael Kolupaev
6fd5d8e8ba Add setting output_format_parquet_compliant_nested_types to produce more compatible Parquet files 2023-05-19 18:39:50 +00:00
Yakov Olkhovskiy
0a44a69dc8 remove unnecessary header 2023-05-17 00:22:13 +00:00
Yakov Olkhovskiy
282297b677 binary encoding of IPv6 in protobuf 2023-05-16 23:46:01 +00:00
Kruglov Pavel
5ada385502 Merge branch 'master' into allow_empty 2023-05-16 12:21:31 +02:00
Kruglov Pavel
558eda4146 Merge pull request #49412 from azat/block-use-dense-hash-map
Switch Block::NameMap to google::dense_hash_map over HashMap
2023-05-15 12:22:55 +02:00
Alexey Milovidov
0ca36d4f89 Merge branch 'master' into clang-17 2023-05-14 01:57:40 +02:00
Alexey Milovidov
5a44dc26e7 Fixes for clang-17 2023-05-13 02:57:31 +02:00
Alexey Milovidov
f6144ee32b Revert "Make Pretty formats even prettier." 2023-05-13 02:45:07 +03:00
Azat Khuzhin
2c40dd6a4c Switch Block::NameMap to google::dense_hash_map over HashMap
Since HashMap creates 2^8 elements by default, while dense_hash_map
should be good here.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-12 05:52:57 +02:00
Alexey Milovidov
ef16077c72 Merge branch 'master' into pretty-time-squashing 2023-05-06 18:20:49 +03:00
Alexey Milovidov
90b0de5677 Make Pretty prettier 2023-05-05 06:36:53 +02:00
Michael Kolupaev
3bd1489f18 Propagate input_format_parquet_preserve_order to parallelizeOutputAfterReading() 2023-05-05 04:20:27 +00:00
Michael Kolupaev
eb3b774ad0 Better control over Parquet row group size 2023-05-04 14:59:55 -07:00
Nikita Mikhaylov
954e3b724c Speedup outdated parts loading (#49317) 2023-05-03 18:56:45 +02:00
Kruglov Pavel
bacba6e347 Fix typo 2023-04-26 12:18:12 +02:00
Alexey Milovidov
54d10f87f2 Consistency of the LineAsString format 2023-04-23 05:50:46 +02:00
robot-ch-test-poll1
f466c89621 Merge pull request #48911 from Avogar/parquet-metadata-format
Add ParquetMetadata input format to read Parquet file metadata
2023-04-21 03:46:26 +02:00
avogar
34cc7b635a Fix type name 2023-04-19 10:33:39 +00:00
avogar
8af9cf67fd Fix comments 2023-04-19 10:33:39 +00:00
avogar
c2f18281c8 Make better 2023-04-19 10:33:39 +00:00
avogar
bb6cf5252f Fix logical error with IPv4 in Protobuf, add support for Date32 2023-04-19 10:33:39 +00:00
Kruglov Pavel
9bc95bed85 Merge pull request #48898 from Avogar/pretty-json
Add PrettyJSONEachRow format to output pretty JSON
2023-04-19 12:27:24 +02:00
Kruglov Pavel
a5c52d3bc3 Merge branch 'master' into parquet-metadata-format 2023-04-18 21:51:14 +02:00
avogar
b277a5c943 Add ParquetMetadata input format to read Parquet file metadata 2023-04-18 16:46:26 +00:00
avogar
e356f92b77 Add PrettyJSONEachRow format to output pretty JSON 2023-04-18 13:28:59 +00:00
Michael Kolupaev
87be78e6de Better 2023-04-17 04:58:32 +00:00
Michael Kolupaev
e133633359 Parallel decoding with one row group per thread 2023-04-17 04:58:32 +00:00
Michael Kolupaev
683077890f Highly questionable refactoring (getInputMultistream() nonsense) 2023-04-17 04:58:32 +00:00
Michael Kolupaev
2d4fe85513 Something 2023-04-17 04:58:32 +00:00
Kruglov Pavel
f087f0e877 Update src/Formats/ReadSchemaUtils.cpp 2023-04-11 14:18:16 +02:00
robot-ch-test-poll2
bf003c7595 Merge pull request #48390 from Avogar/protobuf-tuple
Allow write/read unnamed tuple as nested Message in Protobuf format
2023-04-05 22:14:28 +02:00
Kruglov Pavel
bd318950b3 Fix special build 2023-04-05 13:35:12 +02:00
Kruglov Pavel
96a3307bda Merge branch 'master' into fix-protobuf-abort 2023-04-05 11:57:18 +02:00
avogar
f46f098c78 Better 2023-04-05 09:55:49 +00:00
avogar
04be32216a Allow write/read unnamed tuple as nested Message in Protobuf format 2023-04-04 14:47:37 +00:00
avogar
4894f47d95 Fix tests 2023-04-04 13:34:02 +00:00
avogar
972c680b3c Fix typo 2023-04-03 16:27:09 +00:00
avogar
2cde63a25c Avoid abort in protobuf library in debug build 2023-04-03 16:25:22 +00:00
laimuxi
b869572a54 reformat code 2023-04-01 15:20:26 +08:00
laimuxi
3b756ef026 rollback 2023-03-31 21:58:20 +08:00