Michael Kolupaev
|
a1522e22ea
|
Merge pull request #53281 from Avogar/batch-small-parquet-row-groups
Optimize reading small row groups by batching them together in Parquet
|
2023-08-18 17:15:42 -07:00 |
|
kothiga
|
f33c585bc5
|
Addressing feedback.
|
2023-08-18 13:50:31 -07:00 |
|
Austin Kothig
|
6b42975d33
|
Change BE-UUID to work the same as LE-UUID. Included high and low getters to provide cleaner code when accessing undertype.
|
2023-08-18 08:19:46 -07:00 |
|
Kruglov Pavel
|
3bae63e48e
|
Fix special build
|
2023-08-18 13:56:32 +02:00 |
|
taiyang-li
|
4f9429d2e4
|
fix ut tests/queries/0_stateless/00900_null_array_orc_load.sh
|
2023-08-18 17:58:24 +08:00 |
|
taiyang-li
|
15720d9cef
|
fix ut tests/queries/0_stateless/02518_parquet_arrow_orc_boolean_value.sh
|
2023-08-18 15:10:25 +08:00 |
|
taiyang-li
|
8e0d5b7ee0
|
fix bugs
|
2023-08-18 13:31:26 +08:00 |
|
avogar
|
bca91548ad
|
Add setting input_format_parquet_local_file_min_bytes_for_seek
|
2023-08-17 12:28:01 +00:00 |
|
taiyang-li
|
c4777397f7
|
fix integration test test_kafka_formats
|
2023-08-17 14:42:51 +08:00 |
|
taiyang-li
|
c2017e0ea3
|
update orc version
|
2023-08-15 14:49:45 +08:00 |
|
taiyang-li
|
bfa9d361cc
|
fix tests
|
2023-08-15 12:16:22 +08:00 |
|
ltrk2
|
2846ea49b4
|
Merge branch 'master' into feature/mergetree-checksum-big-endian-support
|
2023-08-14 13:02:30 -04:00 |
|
taiyang-li
|
bbe5caa9dd
|
fix building
|
2023-08-14 19:55:45 +08:00 |
|
李扬
|
613bef89f4
|
Merge branch 'master' into ch_gluten_2583
|
2023-08-14 17:44:15 +08:00 |
|
taiyang-li
|
34c3162c7c
|
revert old orc files
|
2023-08-14 17:42:42 +08:00 |
|
taiyang-li
|
8ee1912de6
|
remove old orc files
|
2023-08-14 17:26:24 +08:00 |
|
taiyang-li
|
57bef64fbc
|
add new native orc files
|
2023-08-14 17:25:58 +08:00 |
|
avogar
|
7e863a2726
|
Address comments
|
2023-08-11 13:17:49 +00:00 |
|
avogar
|
3ad7e57059
|
Optimize reading small row groups by batching them together in Parquet
|
2023-08-11 13:17:45 +00:00 |
|
Kruglov Pavel
|
b6b0e9c6bc
|
Merge branch 'master' into variable-number-of-volumns-more-formats
|
2023-08-11 14:02:21 +02:00 |
|
Kruglov Pavel
|
00865a7dad
|
Merge branch 'master' into format-one
|
2023-08-11 13:54:58 +02:00 |
|
ltrk2
|
1dc1b54c68
|
Merge branch 'master' into feature/mergetree-checksum-big-endian-support
|
2023-08-11 07:48:49 -04:00 |
|
Alexey Milovidov
|
fd7b92e90a
|
Merge pull request #53135 from ClickHouse/file_diagnostics_while_reading_header
Add diagnostic info about file name during schema inference
|
2023-08-10 21:56:12 +03:00 |
|
ltrk2
|
a2054c04dd
|
Merge branch 'master' into feature/mergetree-checksum-big-endian-support
|
2023-08-10 10:21:34 -04:00 |
|
avogar
|
078683e226
|
Fix tests
|
2023-08-10 13:07:06 +00:00 |
|
Kruglov Pavel
|
6600f87f86
|
Merge branch 'master' into http-valid-json-on-exception
|
2023-08-10 13:53:32 +02:00 |
|
avogar
|
82aff97dd0
|
Add comment, more test
|
2023-08-10 11:51:36 +00:00 |
|
Kruglov Pavel
|
bb38918a26
|
Apply suggestions from code review
Co-authored-by: János Benjamin Antal <antaljanosbenjamin@users.noreply.github.com>
|
2023-08-10 13:21:11 +02:00 |
|
Kruglov Pavel
|
33a39900ad
|
Merge branch 'master' into variable-number-of-volumns-more-formats
|
2023-08-09 19:51:17 +02:00 |
|
ltrk2
|
139e9433a8
|
Merge branch 'master' into feature/mergetree-checksum-big-endian-support
|
2023-08-09 09:48:00 -04:00 |
|
Kruglov Pavel
|
70659e9721
|
Fix style
|
2023-08-09 15:07:49 +02:00 |
|
avogar
|
01a7c7560f
|
Add input format One
|
2023-08-09 11:25:32 +00:00 |
|
Alexey Milovidov
|
5561e3e198
|
Remove garbage and speed up Debug and Tidy builds
|
2023-08-09 01:44:39 +02:00 |
|
Alexey Milovidov
|
5dd99db369
|
Add diagnostic info about file name during schema inference
|
2023-08-08 03:55:06 +02:00 |
|
alexX512
|
520a3c6eeb
|
Merge branch 'master' of github.com:ClickHouse/ClickHouse
|
2023-08-07 16:52:51 +00:00 |
|
Michael Kolupaev
|
4ed86fea2f
|
Fix Parquet stats for Float32 and Float64
|
2023-08-04 21:01:07 +00:00 |
|
alexX512
|
0d84226914
|
Merge branch 'master' of github.com:ClickHouse/ClickHouse
|
2023-08-02 19:18:59 +00:00 |
|
alexX512
|
06e187e58a
|
Split mergeSortingTransform and mergeSortingPartialResultTransform into 2 files
|
2023-08-02 18:15:25 +00:00 |
|
ltrk2
|
27a2d4d1c7
|
Merge branch 'master' into feature/mergetree-checksum-big-endian-support
|
2023-08-02 11:36:43 -04:00 |
|
Anton Popov
|
ff137773e7
|
Merge branch 'master' into formats-with-subcolumns
|
2023-08-02 15:24:56 +02:00 |
|
Kruglov Pavel
|
81866bcc9c
|
Fix special build
|
2023-08-02 12:35:58 +02:00 |
|
avogar
|
d12e96177a
|
Fix tests
|
2023-08-01 16:17:03 +00:00 |
|
Kruglov Pavel
|
23aab71d7c
|
Merge branch 'master' into http-valid-json-on-exception
|
2023-08-01 16:47:31 +02:00 |
|
Kruglov Pavel
|
8f6526a930
|
Merge branch 'master' into structure-to-schema
|
2023-08-01 16:22:14 +02:00 |
|
alexX512
|
8e3296a44a
|
Add support of local progress in scheduler
|
2023-08-01 13:31:18 +00:00 |
|
avogar
|
fa905ebd27
|
Clean up
|
2023-08-01 10:14:09 +00:00 |
|
avogar
|
a71cd56a90
|
Output valid JSON/XML on exception during HTTP query execution
|
2023-08-01 10:06:56 +00:00 |
|
ltrk2
|
e869adf645
|
Improve function naming
|
2023-07-31 06:48:50 -07:00 |
|
Alexey Milovidov
|
caa4590361
|
Merge branch 'master' into check-for-hiding-cyrillic-characters
|
2023-07-30 02:16:56 +02:00 |
|
ltrk2
|
6c9a1b14ef
|
Merge branch 'master' into feature/mergetree-checksum-big-endian-support
|
2023-07-28 16:18:46 -04:00 |
|
avogar
|
c3c64a7dd5
|
Fix
|
2023-07-28 11:40:05 +00:00 |
|
Kruglov Pavel
|
3e1c409e60
|
Merge branch 'master' into structure-to-schema
|
2023-07-28 11:32:16 +02:00 |
|
avogar
|
6d77d52dfe
|
Allow variable number of columns in TSV/CustomSeparated/JSONCompactEachRow, make schema inference work with variable number of columns
|
2023-07-27 18:02:29 +00:00 |
|
Antonio Andelic
|
f61f36800c
|
Fix style
|
2023-07-27 08:48:23 +00:00 |
|
Alexey Milovidov
|
65ffe91bf2
|
Fix double whitespace
|
2023-07-27 07:13:26 +02:00 |
|
Alexey Milovidov
|
6aab4cc835
|
Check for unexpected cyrillic
|
2023-07-27 05:25:40 +02:00 |
|
Kruglov Pavel
|
fab77783f1
|
Merge pull request #49367 from ClickHouse/enc
Partially reimplement Parquet encoder to make it faster and parallelizable
|
2023-07-27 00:48:54 +02:00 |
|
Kruglov Pavel
|
0d34e97dbe
|
Merge branch 'master' into formats-with-subcolumns
|
2023-07-26 13:30:35 +02:00 |
|
Kruglov Pavel
|
15cc046883
|
Merge branch 'master' into better-progress-bar-2
|
2023-07-26 13:12:24 +02:00 |
|
Michael Kolupaev
|
5ee71bd643
|
Work around the clang bug
|
2023-07-25 10:26:26 +00:00 |
|
Michael Kolupaev
|
dfdf5de972
|
Fixes
|
2023-07-25 10:16:28 +00:00 |
|
Michael Kolupaev
|
db5cb96050
|
Start over when falling back to non-dictionary encoding
|
2023-07-25 10:16:28 +00:00 |
|
Michael Kolupaev
|
8184a289e5
|
Partially reimplement Parquet encoder to make it faster and parallelizable
|
2023-07-25 10:16:28 +00:00 |
|
alexX512
|
1fe2f052e3
|
Merge branch 'master' of github.com:ClickHouse/ClickHouse
|
2023-07-25 09:36:44 +00:00 |
|
Alexey Milovidov
|
21382afa2b
|
Check for punctuation
|
2023-07-25 06:10:04 +02:00 |
|
Alexey Milovidov
|
ecdafeaf83
|
Merge pull request #52175 from bigo-sg/comment_improve_ch_to_arrow
Add comments for https://github.com/ClickHouse/ClickHouse/pull/52112
|
2023-07-25 06:47:48 +03:00 |
|
Kruglov Pavel
|
fec5675cd4
|
Merge branch 'master' into better-progress-bar-2
|
2023-07-24 19:59:38 +02:00 |
|
李扬
|
e8fe485330
|
Merge branch 'master' into comment_improve_ch_to_arrow
|
2023-07-22 11:43:32 +08:00 |
|
Alexey Milovidov
|
9ae685975e
|
Merge branch 'master' into avro-fix
|
2023-07-22 04:56:48 +03:00 |
|
alexX512
|
c403f56e09
|
Merge branch 'master' of github.com:ClickHouse/ClickHouse
|
2023-07-21 17:56:53 +00:00 |
|
Alexander Tokmakov
|
b45c2c939b
|
disable expression templates for time intervals (#52335)
|
2023-07-21 15:17:07 +03:00 |
|
ltrk2
|
90a2c460c6
|
Merge branch 'master' into feature/mergetree-checksum-big-endian-support
|
2023-07-21 08:07:18 -04:00 |
|
Kruglov Pavel
|
342400d0b3
|
Merge branch 'master' into revert-52322-revert-51716-bug_fix_csv_field_type_not_match
|
2023-07-20 12:39:38 +02:00 |
|
Nikolay Degterinsky
|
209429d0e3
|
Merge pull request #49664 from ilejn/test_for_basic_auth_registry
Basic auth to fetch Avro schema in Kafka
|
2023-07-20 10:58:11 +02:00 |
|
李扬
|
68bf4c3590
|
Merge branch 'master' into comment_improve_ch_to_arrow
|
2023-07-20 10:10:47 +08:00 |
|
ltrk2
|
a753c3c6ad
|
Merge branch 'master' into feature/mergetree-checksum-big-endian-support
|
2023-07-19 16:22:58 -04:00 |
|
Kruglov Pavel
|
0fca64ced4
|
Merge pull request #51695 from Avogar/row-binary-with-defaults
Add RowBinaryWithDefaults format
|
2023-07-19 22:10:30 +02:00 |
|
ltrk2
|
ba4072f049
|
Adapt changes around SipHash
|
2023-07-19 10:01:58 -07:00 |
|
ltrk2
|
51e2c58a53
|
Implement endianness-independent SipHash and MergeTree checksum serialization
|
2023-07-19 10:01:55 -07:00 |
|
Kruglov Pavel
|
f0026af189
|
Revert "Revert "Improve CSVInputFormat to check and set default value to column if deserialize failed""
|
2023-07-19 14:51:11 +02:00 |
|
Kruglov Pavel
|
7b3564f96a
|
Revert "Improve CSVInputFormat to check and set default value to column if deserialize failed"
|
2023-07-19 14:44:59 +02:00 |
|
robot-ch-test-poll4
|
63d0616a22
|
Merge pull request #51716 from KevinyhZou/bug_fix_csv_field_type_not_match
Improve CSVInputFormat to check and set default value to column if deserialize failed
|
2023-07-19 14:41:05 +02:00 |
|
kevinyhzou
|
dcf7ba2534
|
remove useless code
|
2023-07-19 19:36:19 +08:00 |
|
kevinyhzou
|
95424177d5
|
review fix
|
2023-07-19 18:26:54 +08:00 |
|
Ilya Golshtein
|
c1c5ffa309
|
test_for_basic_auth_registry - cpp code small improvement
|
2023-07-19 08:32:45 +00:00 |
|
dheerajathrey
|
8e1de7897a
|
indentation fix
|
2023-07-19 08:32:44 +00:00 |
|
dheerajathrey
|
1564eace38
|
enable url-encoded basic auth to fetch avro schema in kafka
|
2023-07-19 08:32:44 +00:00 |
|
Alexey Milovidov
|
0789f388c3
|
Update ArrowFieldIndexUtil.h
|
2023-07-19 02:45:56 +03:00 |
|
Alexey Milovidov
|
6d915042a2
|
Fix ugly code
|
2023-07-19 01:44:20 +02:00 |
|
avogar
|
67f340b501
|
Merge branch 'master' of github.com:ClickHouse/ClickHouse into structure-to-schema
|
2023-07-18 13:52:15 +00:00 |
|
Kruglov Pavel
|
64e88cde21
|
Merge branch 'master' into better-progress-bar-2
|
2023-07-18 13:37:53 +02:00 |
|
Kruglov Pavel
|
1e616e17ab
|
Merge branch 'master' into row-binary-with-defaults
|
2023-07-17 19:13:57 +02:00 |
|
Kruglov Pavel
|
1dd05319b5
|
Merge branch 'master' into formats-with-subcolumns
|
2023-07-17 19:13:42 +02:00 |
|
kevinyhzou
|
355faa4251
|
ci fix
|
2023-07-17 20:08:32 +08:00 |
|
flynn
|
d6709ded53
|
Merge branch 'master' into avro-fix
|
2023-07-17 14:51:34 +08:00 |
|
taiyang-li
|
8ea335aca7
|
update style
|
2023-07-17 10:43:13 +08:00 |
|
taiyang-li
|
7716479a37
|
add comments for https://github.com/ClickHouse/ClickHouse/pull/52112
|
2023-07-17 10:33:38 +08:00 |
|
flynn
|
386adfad33
|
Avro input format support Union with single type
|
2023-07-15 16:21:58 +00:00 |
|
taiyang-li
|
8ea3bf4ade
|
improve ch to arrow
|
2023-07-14 16:09:22 +08:00 |
|
kevinyhzou
|
c6b8097090
|
rebase main
|
2023-07-14 11:24:38 +08:00 |
|
kevinyhzou
|
b2665031dc
|
review fix
|
2023-07-13 20:27:14 +08:00 |
|
kevinyhzou
|
ba57c84db3
|
bug fix csv input field type mismatch
|
2023-07-13 20:24:10 +08:00 |
|
Dmitry Kardymon
|
385a210fee
|
Merge remote-tracking branch 'origin/master' into ADQM-870
|
2023-07-10 13:19:21 +00:00 |
|
Alexey Milovidov
|
3d4800995f
|
Merge pull request #49732 from nickitat/impr_prefetch
Improve reading with prefetch
|
2023-07-09 06:10:58 +03:00 |
|
Kruglov Pavel
|
06de25451a
|
Merge branch 'master' into formats-with-subcolumns
|
2023-07-06 16:21:52 +02:00 |
|
avogar
|
810d1ee069
|
Fix tests
|
2023-07-06 13:48:57 +00:00 |
|
Nikita Taranov
|
aec7205636
|
rework pool usage
|
2023-07-06 14:41:09 +02:00 |
|
Dmitry Kardymon
|
86fc702236
|
Add skipWhitespacesAndTabs()
Co-authored-by: Kruglov Pavel <48961922+Avogar@users.noreply.github.com>
|
2023-07-06 15:14:18 +03:00 |
|
Dmitry Kardymon
|
32f5a78302
|
Fix setting name
|
2023-07-06 07:32:46 +00:00 |
|
Dmitry Kardymon
|
24b5c9c204
|
Use one setting input_format_csv_allow_variable_number_of_colums and code in RowInput
|
2023-07-06 06:05:43 +00:00 |
|
avogar
|
d11cd0dc30
|
Fix tests
|
2023-07-05 17:56:03 +00:00 |
|
Dmitry Kardymon
|
86014a60a3
|
Fixed case with spaces before delimiter
|
2023-07-05 11:42:02 +00:00 |
|
avogar
|
98aa6b317f
|
Support reading subcolumns from file/s3/hdfs/url/azureBlobStorage table functions
|
2023-07-04 21:17:26 +00:00 |
|
Robert Schulze
|
fe49e98455
|
Follow-up to re2 update 2023-06-02 (#50949)
|
2023-07-03 08:28:25 +00:00 |
|
avogar
|
34bf0284ad
|
Add RowBinaryWithDefaults format
|
2023-06-30 16:18:30 +00:00 |
|
avogar
|
03f820bc4a
|
Merge branch 'master' of github.com:ClickHouse/ClickHouse into structure-to-schema
|
2023-06-22 18:46:01 +00:00 |
|
avogar
|
4060beae49
|
Structure to CapnProto/Protobuf schema take 1
|
2023-06-22 18:00:00 +00:00 |
|
Dmitry Kardymon
|
19d0214ac1
|
Merge remote-tracking branch 'origin/master' into ADQM-870
|
2023-06-22 13:02:31 +00:00 |
|
Dmitry Kardymon
|
a0fde6a55b
|
Style fix
|
2023-06-22 10:50:14 +00:00 |
|
Dmitry Kardymon
|
2c3a4cb90d
|
Style fix
|
2023-06-22 10:47:07 +00:00 |
|
Sema Checherinda
|
01de36f1fa
|
Merge pull request #50395 from CheSema/better-log
require `finalize()` call before d-tor for all write buffers
|
2023-06-21 21:12:02 +02:00 |
|
Dmitry Kardymon
|
fff0c8da92
|
Merge remote-tracking branch 'origin/master' into ADQM-870
|
2023-06-21 10:56:50 +00:00 |
|
Kruglov Pavel
|
8f8cd97fd8
|
Merge pull request #51088 from Avogar/better-progress-bar
Improve progress bar for file/s3/hdfs/url table functions. Step 1
|
2023-06-21 12:42:25 +02:00 |
|
Sema Checherinda
|
9b0c3359cf
|
Merge branch 'master' into better-log
|
2023-06-20 20:37:36 +02:00 |
|
Sema Checherinda
|
fd292dc730
|
work with comment on the PR
|
2023-06-20 20:02:04 +02:00 |
|
Kruglov Pavel
|
0edfbb45ad
|
Merge pull request #50873 from Avogar/parquet-big-integers
Fallback to parsing big integer from String instead of exception in Parquet format
|
2023-06-20 16:10:46 +02:00 |
|
avogar
|
d492acbcd2
|
Fix tests
|
2023-06-19 13:36:29 +00:00 |
|
Dmitry Kardymon
|
f81401db99
|
Add empty line test
|
2023-06-19 10:48:38 +00:00 |
|
Dmitry Kardymon
|
dd43a186ad
|
Minor edit docs / add int256 test
|
2023-06-19 09:51:29 +00:00 |
|
Dmitry Kardymon
|
30bea857fd
|
Merge remote-tracking branch 'origin/master' into ADQM-870
|
2023-06-19 07:19:07 +00:00 |
|
avogar
|
3209ebe34b
|
Improve progress bar for file/s3/hdfs/url table functions. Step 1
|
2023-06-16 15:51:18 +00:00 |
|
Sema Checherinda
|
1cb02e2710
|
do call finalize for all buffers
|
2023-06-16 16:38:18 +02:00 |
|
Dmitry Kardymon
|
0eeee11dc4
|
Style fix, add comment
|
2023-06-15 12:36:18 +00:00 |
|
Dmitry Kardymon
|
806176d88e
|
Add input_format_csv_missing_as_default setting and tests
|
2023-06-15 11:23:08 +00:00 |
|
KevinyhZou
|
953f40aa3b
|
Merge branch 'master' into bug_fix_csv_parse_by_tab_delimiter
|
2023-06-15 10:25:19 +08:00 |
|
Dmitry Kardymon
|
a91fc3ddb3
|
Add docs/ add more cases in test
|
2023-06-14 16:44:31 +00:00 |
|
Dmitry Kardymon
|
ed318d1035
|
Add input_format_csv_ignore_extra_columns setting (prototype)
|
2023-06-14 10:35:36 +00:00 |
|
kevinyhzou
|
f3b99156ac
|
review fix
|
2023-06-14 10:48:21 +08:00 |
|
Kruglov Pavel
|
607f337d67
|
Merge pull request #50592 from Avogar/max-bytes-to-read-in-schema-inference
Add setting to limit the number of bytes to read in schema inference
|
2023-06-13 16:47:57 +02:00 |
|
Kruglov Pavel
|
8fdcd91c38
|
Merge pull request #49752 from Avogar/better-capnproto-3
Refactor CapnProto format to improve input/output performance
|
2023-06-13 16:20:38 +02:00 |
|
Kruglov Pavel
|
edd47a2281
|
Merge branch 'master' into skip-trailing-empty-lines
|
2023-06-12 13:57:15 +02:00 |
|
Kruglov Pavel
|
e03cd725b0
|
Merge pull request #50602 from Avogar/null-as-default-schema-inference
Respect setting input_format_as_default in schema inference
|
2023-06-12 13:45:52 +02:00 |
|
Kruglov Pavel
|
da68980b8d
|
Merge branch 'master' into max-bytes-to-read-in-schema-inference
|
2023-06-12 13:45:31 +02:00 |
|
avogar
|
5cec4c3161
|
Fallback to parsing big integer from String instead of exception in Parquet format
|
2023-06-12 11:34:40 +00:00 |
|
kevinyhzou
|
911f8ad8dc
|
use whitespace or tab as field delimiter
|
2023-06-12 11:57:52 +08:00 |
|
Hongbin Ma
|
41c34aaf5e
|
optimize parquet write performance for parallel threads
fix CI
fix review comments and CI
|
2023-06-09 19:09:58 -07:00 |
|
kevinyhzou
|
48e1b21aab
|
Add feature to support reading CSV with space and tab delimiters
|
2023-06-08 20:34:30 +08:00 |
|
avogar
|
cc036528fe
|
Merge branch 'master' of github.com:ClickHouse/ClickHouse into better-capnproto-3
|
2023-06-08 11:16:13 +00:00 |
|
Kruglov Pavel
|
a714c1662e
|
Merge branch 'master' into max-bytes-to-read-in-schema-inference
|
2023-06-08 12:55:31 +02:00 |
|
Kruglov Pavel
|
4727c85e1f
|
Merge branch 'master' into null-as-default-schema-inference
|
2023-06-08 12:54:18 +02:00 |
|