1846 Commits

Author SHA1 Message Date
Michael Kolupaev
a1522e22ea
Merge pull request #53281 from Avogar/batch-small-parquet-row-groups
Optimize reading small row groups by batching them together in Parquet
2023-08-18 17:15:42 -07:00
kothiga
f33c585bc5
Addressing feedback. 2023-08-18 13:50:31 -07:00
Austin Kothig
6b42975d33
Change BE-UUID to work the same as LE-UUID. Included high and low getters to provide cleaner code when accessing undertype. 2023-08-18 08:19:46 -07:00
Kruglov Pavel
3bae63e48e
Fix special build 2023-08-18 13:56:32 +02:00
taiyang-li
4f9429d2e4 fix ut tests/queries/0_stateless/00900_null_array_orc_load.sh 2023-08-18 17:58:24 +08:00
taiyang-li
15720d9cef fix ut tests/queries/0_stateless/02518_parquet_arrow_orc_boolean_value.sh 2023-08-18 15:10:25 +08:00
taiyang-li
8e0d5b7ee0 fix bugs 2023-08-18 13:31:26 +08:00
avogar
bca91548ad Add setting input_format_parquet_local_file_min_bytes_for_seek 2023-08-17 12:28:01 +00:00
taiyang-li
c4777397f7 fix integration test test_kafka_formats 2023-08-17 14:42:51 +08:00
taiyang-li
c2017e0ea3 update orc version 2023-08-15 14:49:45 +08:00
taiyang-li
bfa9d361cc fix tests 2023-08-15 12:16:22 +08:00
ltrk2
2846ea49b4
Merge branch 'master' into feature/mergetree-checksum-big-endian-support 2023-08-14 13:02:30 -04:00
taiyang-li
bbe5caa9dd fix building 2023-08-14 19:55:45 +08:00
李扬
613bef89f4
Merge branch 'master' into ch_gluten_2583 2023-08-14 17:44:15 +08:00
taiyang-li
34c3162c7c revert old orc files 2023-08-14 17:42:42 +08:00
taiyang-li
8ee1912de6 remove old orc files 2023-08-14 17:26:24 +08:00
taiyang-li
57bef64fbc add new native orc files 2023-08-14 17:25:58 +08:00
avogar
7e863a2726 Address comments 2023-08-11 13:17:49 +00:00
avogar
3ad7e57059 Optimize reading small row groups by batching them together in Parquet 2023-08-11 13:17:45 +00:00
Kruglov Pavel
b6b0e9c6bc
Merge branch 'master' into variable-number-of-volumns-more-formats 2023-08-11 14:02:21 +02:00
Kruglov Pavel
00865a7dad
Merge branch 'master' into format-one 2023-08-11 13:54:58 +02:00
ltrk2
1dc1b54c68
Merge branch 'master' into feature/mergetree-checksum-big-endian-support 2023-08-11 07:48:49 -04:00
Alexey Milovidov
fd7b92e90a
Merge pull request #53135 from ClickHouse/file_diagnostics_while_reading_header
Add diagnostic info about file name during schema inference
2023-08-10 21:56:12 +03:00
ltrk2
a2054c04dd
Merge branch 'master' into feature/mergetree-checksum-big-endian-support 2023-08-10 10:21:34 -04:00
avogar
078683e226 Fix tests 2023-08-10 13:07:06 +00:00
Kruglov Pavel
6600f87f86
Merge branch 'master' into http-valid-json-on-exception 2023-08-10 13:53:32 +02:00
avogar
82aff97dd0 Add comment, more test 2023-08-10 11:51:36 +00:00
Kruglov Pavel
bb38918a26
Apply suggestions from code review
Co-authored-by: János Benjamin Antal <antaljanosbenjamin@users.noreply.github.com>
2023-08-10 13:21:11 +02:00
Kruglov Pavel
33a39900ad
Merge branch 'master' into variable-number-of-volumns-more-formats 2023-08-09 19:51:17 +02:00
ltrk2
139e9433a8
Merge branch 'master' into feature/mergetree-checksum-big-endian-support 2023-08-09 09:48:00 -04:00
Kruglov Pavel
70659e9721
Fix style 2023-08-09 15:07:49 +02:00
avogar
01a7c7560f Add input format One 2023-08-09 11:25:32 +00:00
Alexey Milovidov
5561e3e198 Remove garbage and speed up Debug and Tidy builds 2023-08-09 01:44:39 +02:00
Alexey Milovidov
5dd99db369 Add diagnostic info about file name during schema inference 2023-08-08 03:55:06 +02:00
alexX512
520a3c6eeb Merge branch 'master' of github.com:ClickHouse/ClickHouse 2023-08-07 16:52:51 +00:00
Michael Kolupaev
4ed86fea2f Fix Parquet stats for Float32 and Float64 2023-08-04 21:01:07 +00:00
alexX512
0d84226914 Merge branch 'master' of github.com:ClickHouse/ClickHouse 2023-08-02 19:18:59 +00:00
alexX512
06e187e58a Split mergeSortingTransform and mergeSortingPartialResultTransform into 2 files 2023-08-02 18:15:25 +00:00
ltrk2
27a2d4d1c7
Merge branch 'master' into feature/mergetree-checksum-big-endian-support 2023-08-02 11:36:43 -04:00
Anton Popov
ff137773e7
Merge branch 'master' into formats-with-subcolumns 2023-08-02 15:24:56 +02:00
Kruglov Pavel
81866bcc9c
Fix special build 2023-08-02 12:35:58 +02:00
avogar
d12e96177a Fix tests 2023-08-01 16:17:03 +00:00
Kruglov Pavel
23aab71d7c
Merge branch 'master' into http-valid-json-on-exception 2023-08-01 16:47:31 +02:00
Kruglov Pavel
8f6526a930
Merge branch 'master' into structure-to-schema 2023-08-01 16:22:14 +02:00
alexX512
8e3296a44a Add support of local progress in scheduler 2023-08-01 13:31:18 +00:00
avogar
fa905ebd27 Clean up 2023-08-01 10:14:09 +00:00
avogar
a71cd56a90 Output valid JSON/XML on excetpion during HTTP query execution 2023-08-01 10:06:56 +00:00
ltrk2
e869adf645 Improve function naming 2023-07-31 06:48:50 -07:00
Alexey Milovidov
caa4590361 Merge branch 'master' into check-for-hiding-cyrillic-characters 2023-07-30 02:16:56 +02:00
ltrk2
6c9a1b14ef
Merge branch 'master' into feature/mergetree-checksum-big-endian-support 2023-07-28 16:18:46 -04:00
avogar
c3c64a7dd5 Fix 2023-07-28 11:40:05 +00:00
Kruglov Pavel
3e1c409e60
Merge branch 'master' into structure-to-schema 2023-07-28 11:32:16 +02:00
avogar
6d77d52dfe Allow variable number of columns in TSV/CuatomSeprarated/JSONCompactEachRow, make schema inference work with variable number of columns 2023-07-27 18:02:29 +00:00
Antonio Andelic
f61f36800c Fix style 2023-07-27 08:48:23 +00:00
Alexey Milovidov
65ffe91bf2 Fix double whitespace 2023-07-27 07:13:26 +02:00
Alexey Milovidov
6aab4cc835 Check for unexpected cyrillic 2023-07-27 05:25:40 +02:00
Kruglov Pavel
fab77783f1
Merge pull request #49367 from ClickHouse/enc
Partially reimplement Parquet encoder to make it faster and parallelizable
2023-07-27 00:48:54 +02:00
Kruglov Pavel
0d34e97dbe
Merge branch 'master' into formats-with-subcolumns 2023-07-26 13:30:35 +02:00
Kruglov Pavel
15cc046883
Merge branch 'master' into better-progress-bar-2 2023-07-26 13:12:24 +02:00
Michael Kolupaev
5ee71bd643 Work around the clang bug 2023-07-25 10:26:26 +00:00
Michael Kolupaev
dfdf5de972 Fixes 2023-07-25 10:16:28 +00:00
Michael Kolupaev
db5cb96050 Start over when falling back to non-dictionary encoding 2023-07-25 10:16:28 +00:00
Michael Kolupaev
8184a289e5 Partially reimplement Parquet encoder to make it faster and parallelizable 2023-07-25 10:16:28 +00:00
alexX512
1fe2f052e3 Merge branch 'master' of github.com:ClickHouse/ClickHouse 2023-07-25 09:36:44 +00:00
Alexey Milovidov
21382afa2b Check for punctuation 2023-07-25 06:10:04 +02:00
Alexey Milovidov
ecdafeaf83
Merge pull request #52175 from bigo-sg/comment_improve_ch_to_arrow
Add comments for https://github.com/ClickHouse/ClickHouse/pull/52112
2023-07-25 06:47:48 +03:00
Kruglov Pavel
fec5675cd4
Merge branch 'master' into better-progress-bar-2 2023-07-24 19:59:38 +02:00
李扬
e8fe485330
Merge branch 'master' into comment_improve_ch_to_arrow 2023-07-22 11:43:32 +08:00
Alexey Milovidov
9ae685975e
Merge branch 'master' into avro-fix 2023-07-22 04:56:48 +03:00
alexX512
c403f56e09 Merge branch 'master' of github.com:ClickHouse/ClickHouse 2023-07-21 17:56:53 +00:00
Alexander Tokmakov
b45c2c939b
disable expression templates for time intervals (#52335) 2023-07-21 15:17:07 +03:00
ltrk2
90a2c460c6
Merge branch 'master' into feature/mergetree-checksum-big-endian-support 2023-07-21 08:07:18 -04:00
Kruglov Pavel
342400d0b3
Merge branch 'master' into revert-52322-revert-51716-bug_fix_csv_field_type_not_match 2023-07-20 12:39:38 +02:00
Nikolay Degterinsky
209429d0e3
Merge pull request #49664 from ilejn/test_for_basic_auth_registry
Basic auth to fetch Avro schema in Kafka
2023-07-20 10:58:11 +02:00
李扬
68bf4c3590
Merge branch 'master' into comment_improve_ch_to_arrow 2023-07-20 10:10:47 +08:00
ltrk2
a753c3c6ad
Merge branch 'master' into feature/mergetree-checksum-big-endian-support 2023-07-19 16:22:58 -04:00
Kruglov Pavel
0fca64ced4
Merge pull request #51695 from Avogar/row-binary-with-defaults
Add RowBinaryWithDefaults format
2023-07-19 22:10:30 +02:00
ltrk2
ba4072f049 Adapt changes around SipHash 2023-07-19 10:01:58 -07:00
ltrk2
51e2c58a53 Implement endianness-independent SipHash and MergeTree checksum serialization 2023-07-19 10:01:55 -07:00
Kruglov Pavel
f0026af189
Revert "Revert "Improve CSVInputFormat to check and set default value to column if deserialize failed"" 2023-07-19 14:51:11 +02:00
Kruglov Pavel
7b3564f96a
Revert "Improve CSVInputFormat to check and set default value to column if deserialize failed" 2023-07-19 14:44:59 +02:00
robot-ch-test-poll4
63d0616a22
Merge pull request #51716 from KevinyhZou/bug_fix_csv_field_type_not_match
Improve CSVInputFormat to check and set default value to column if deserialize failed
2023-07-19 14:41:05 +02:00
kevinyhzou
dcf7ba2534 remove unuseful code 2023-07-19 19:36:19 +08:00
kevinyhzou
95424177d5 review fix 2023-07-19 18:26:54 +08:00
Ilya Golshtein
c1c5ffa309 test_for_basic_auth_registry - cpp code small improvement 2023-07-19 08:32:45 +00:00
dheerajathrey
8e1de7897a indentation fix 2023-07-19 08:32:44 +00:00
dheerajathrey
1564eace38 enable url-encoded basic auth to fetch avro schema in kafka 2023-07-19 08:32:44 +00:00
Alexey Milovidov
0789f388c3
Update ArrowFieldIndexUtil.h 2023-07-19 02:45:56 +03:00
Alexey Milovidov
6d915042a2 Fix ugly code 2023-07-19 01:44:20 +02:00
avogar
67f340b501 Merge branch 'master' of github.com:ClickHouse/ClickHouse into structure-to-schema 2023-07-18 13:52:15 +00:00
Kruglov Pavel
64e88cde21
Merge branch 'master' into better-progress-bar-2 2023-07-18 13:37:53 +02:00
Kruglov Pavel
1e616e17ab
Merge branch 'master' into row-binary-with-defaults 2023-07-17 19:13:57 +02:00
Kruglov Pavel
1dd05319b5
Merge branch 'master' into formats-with-subcolumns 2023-07-17 19:13:42 +02:00
kevinyhzou
355faa4251 ci fix 2023-07-17 20:08:32 +08:00
flynn
d6709ded53
Merge branch 'master' into avro-fix 2023-07-17 14:51:34 +08:00
taiyang-li
8ea335aca7 update style 2023-07-17 10:43:13 +08:00
taiyang-li
7716479a37 add comments for https://github.com/ClickHouse/ClickHouse/pull/52112 2023-07-17 10:33:38 +08:00
flynn
386adfad33 Avro input format support Union with single type 2023-07-15 16:21:58 +00:00
taiyang-li
8ea3bf4ade improve ch to arrow 2023-07-14 16:09:22 +08:00
kevinyhzou
c6b8097090 rebase main 2023-07-14 11:24:38 +08:00
kevinyhzou
b2665031dc review fix 2023-07-13 20:27:14 +08:00
kevinyhzou
ba57c84db3 bug fix csv input field type mismatch 2023-07-13 20:24:10 +08:00
Dmitry Kardymon
385a210fee Merge remote-tracking branch 'origin/master' into ADQM-870 2023-07-10 13:19:21 +00:00
Alexey Milovidov
3d4800995f
Merge pull request #49732 from nickitat/impr_prefetch
Improve reading with prefetch
2023-07-09 06:10:58 +03:00
Kruglov Pavel
06de25451a
Merge branch 'master' into formats-with-subcolumns 2023-07-06 16:21:52 +02:00
avogar
810d1ee069 Fix tests 2023-07-06 13:48:57 +00:00
Nikita Taranov
aec7205636 rework pool usage 2023-07-06 14:41:09 +02:00
Dmitry Kardymon
86fc702236
Add skipWhitespacesAndTabs()
Co-authored-by: Kruglov Pavel <48961922+Avogar@users.noreply.github.com>
2023-07-06 15:14:18 +03:00
Dmitry Kardymon
32f5a78302 Fix setting name 2023-07-06 07:32:46 +00:00
Dmitry Kardymon
24b5c9c204 Use one setting input_format_csv_allow_variable_number_of_colums and code in RowInput 2023-07-06 06:05:43 +00:00
avogar
d11cd0dc30 Fix tests 2023-07-05 17:56:03 +00:00
Dmitry Kardymon
86014a60a3 Fixed case with spaces before delimiter 2023-07-05 11:42:02 +00:00
avogar
98aa6b317f Support reading subcolumns from file/s3/hdfs/url/azureBlobStorage table functions 2023-07-04 21:17:26 +00:00
Robert Schulze
fe49e98455
Follow-up to re2 update 2023-06-02 (#50949) 2023-07-03 08:28:25 +00:00
avogar
34bf0284ad Add RowBinaryWithDefaults format 2023-06-30 16:18:30 +00:00
avogar
03f820bc4a Merge branch 'master' of github.com:ClickHouse/ClickHouse into structure-to-schema 2023-06-22 18:46:01 +00:00
avogar
4060beae49 Structure to CapnProto/Protobuf schema take 1 2023-06-22 18:00:00 +00:00
Dmitry Kardymon
19d0214ac1 Merge remote-tracking branch 'origin/master' into ADQM-870 2023-06-22 13:02:31 +00:00
Dmitry Kardymon
a0fde6a55b Style fix 2023-06-22 10:50:14 +00:00
Dmitry Kardymon
2c3a4cb90d Style fix 2023-06-22 10:47:07 +00:00
Sema Checherinda
01de36f1fa
Merge pull request #50395 from CheSema/better-log
require `finalize()` call before d-tor for all writes buffers
2023-06-21 21:12:02 +02:00
Dmitry Kardymon
fff0c8da92 Merge remote-tracking branch 'origin/master' into ADQM-870 2023-06-21 10:56:50 +00:00
Kruglov Pavel
8f8cd97fd8
Merge pull request #51088 from Avogar/better-progress-bar
Improve progress bar for file/s3/hdfs/url table functions. Step 1
2023-06-21 12:42:25 +02:00
Sema Checherinda
9b0c3359cf
Merge branch 'master' into better-log 2023-06-20 20:37:36 +02:00
Sema Checherinda
fd292dc730 work with comment on the PR 2023-06-20 20:02:04 +02:00
Kruglov Pavel
0edfbb45ad
Merge pull request #50873 from Avogar/parquet-big-integers
Fallback to parsing big integer from String instead of exception in Parquet format
2023-06-20 16:10:46 +02:00
avogar
d492acbcd2 Fix tests 2023-06-19 13:36:29 +00:00
Dmitry Kardymon
f81401db99 Add empty line test 2023-06-19 10:48:38 +00:00
Dmitry Kardymon
dd43a186ad Minor edit docs / add int256 test 2023-06-19 09:51:29 +00:00
Dmitry Kardymon
30bea857fd Merge remote-tracking branch 'origin/master' into ADQM-870 2023-06-19 07:19:07 +00:00
avogar
3209ebe34b Improve progress bar for file/s3/hdfs/url table functions. Step 1 2023-06-16 15:51:18 +00:00
Sema Checherinda
1cb02e2710 do call finalize for all buffers 2023-06-16 16:38:18 +02:00
Dmitry Kardymon
0eeee11dc4 Style fix, add comment 2023-06-15 12:36:18 +00:00
Dmitry Kardymon
806176d88e Add input_format_csv_missing_as_default setting and tests 2023-06-15 11:23:08 +00:00
KevinyhZou
953f40aa3b
Merge branch 'master' into bug_fix_csv_parse_by_tab_delimiter 2023-06-15 10:25:19 +08:00
Dmitry Kardymon
a91fc3ddb3 Add docs/ add more cases in test 2023-06-14 16:44:31 +00:00
Dmitry Kardymon
ed318d1035 Add input_format_csv_ignore_extra_columns setting (prototype) 2023-06-14 10:35:36 +00:00
kevinyhzou
f3b99156ac review fix 2023-06-14 10:48:21 +08:00
Kruglov Pavel
607f337d67
Merge pull request #50592 from Avogar/max-bytes-to-read-in-schema-inference
Add setting to limit the number of bytes to read in schema inference
2023-06-13 16:47:57 +02:00
Kruglov Pavel
8fdcd91c38
Merge pull request #49752 from Avogar/better-capnproto-3
Refactor CapnProto format to improve input/output performance
2023-06-13 16:20:38 +02:00
Kruglov Pavel
edd47a2281
Merge branch 'master' into skip-trailing-empty-lines 2023-06-12 13:57:15 +02:00
Kruglov Pavel
e03cd725b0
Merge pull request #50602 from Avogar/null-as-default-schema-inference
Respect setting input_format_as_default in schema inference
2023-06-12 13:45:52 +02:00
Kruglov Pavel
da68980b8d
Merge branch 'master' into max-bytes-to-read-in-schema-inference 2023-06-12 13:45:31 +02:00
avogar
5cec4c3161 Fallback to parsing big integer from String instead of exception in Parquet format 2023-06-12 11:34:40 +00:00
kevinyhzou
911f8ad8dc use whitespace or tab as field delimiter 2023-06-12 11:57:52 +08:00
Hongbin Ma
41c34aaf5e optimize parquet write performance for parallel threads
fix CI
fix review comments and CI
2023-06-09 19:09:58 -07:00
kevinyhzou
48e1b21aab Add feature to support read csv by space & tab delimiter 2023-06-08 20:34:30 +08:00
avogar
cc036528fe Merge branch 'master' of github.com:ClickHouse/ClickHouse into better-capnproto-3 2023-06-08 11:16:13 +00:00
Kruglov Pavel
a714c1662e
Merge branch 'master' into max-bytes-to-read-in-schema-inference 2023-06-08 12:55:31 +02:00
Kruglov Pavel
4727c85e1f
Merge branch 'master' into null-as-default-schema-inference 2023-06-08 12:54:18 +02:00