Commit Graph

1447 Commits

Author SHA1 Message Date
avogar
1c0941d72a Add docs and examples 2023-01-16 16:46:41 +00:00
flynn
29eb30b49f Fix some reading avro format bugs
fix
2023-01-14 18:05:26 +00:00
avogar
e2470dd670 Fix tests 2023-01-13 17:03:53 +00:00
avogar
b461935374 Better 2023-01-12 13:11:04 +00:00
Kruglov Pavel
05a11ff4a4 Merge branch 'master' into tsv-csv-detect-header 2023-01-12 12:35:18 +01:00
avogar
e4d774d906 Better naming 2023-01-11 22:57:14 +00:00
avogar
26cd56d113 Fix tests, make better 2023-01-11 22:52:15 +00:00
avogar
3b45863d15 Make better implementation, fix tests 2023-01-11 17:12:56 +00:00
avogar
6312b75f44 Fix style 2023-01-10 16:28:52 +00:00
avogar
615fe4cecb Fix tests 2023-01-10 16:27:23 +00:00
Alexey Milovidov
1229a20fb3 Merge pull request #45047 from ClickHouse/fix-buffer-overflow
Fix buffer overflow in parser
2023-01-10 05:06:03 +03:00
Yakov Olkhovskiy
4f32f3b8cb Merge pull request #44484 from bigo-sg/arrow_struct_field
Optimization for reading struct fields in parquet/orc files
2023-01-09 15:36:26 -05:00
Kseniia Sumarokova
119501f1d9 Merge pull request #44698 from Avogar/parquet-bool
Support Bool type in Arrow/Parquet/ORC
2023-01-09 12:56:28 +01:00
lgbo-ustc
b639bcabc0 some fixes 2023-01-09 18:13:00 +08:00
lgbo-ustc
f127b3a60a update ArrowFieldIndexUtil 2023-01-09 18:13:00 +08:00
lgbo-ustc
a3bdfddc9d support nested table 2023-01-09 18:13:00 +08:00
lgbo-ustc
4f3f781b85 fixed test case 2023-01-09 18:13:00 +08:00
lgbo-ustc
755f03db4e fixed 2023-01-09 18:13:00 +08:00
lgbo-ustc
f6850d96cb fixed missing columns 2023-01-09 18:13:00 +08:00
lgbo-ustc
4cf6beee27 fixed 2023-01-09 18:13:00 +08:00
lgbo-ustc
81e2832133 fixed 2023-01-09 18:13:00 +08:00
lgbo-ustc
77cea49cec fixed including header failure 2023-01-09 18:13:00 +08:00
lgbo-ustc
8f8f6f966b Optimization for reading struct fields in parquet/orc files 2023-01-09 18:13:00 +08:00
Alexey Milovidov
0d39d26a34 Don't fix parallel formatting 2023-01-09 06:15:20 +01:00
Alexey Milovidov
d331f0ce82 Fix buffer overflow in parser 2023-01-09 03:31:12 +01:00
avogar
ee72799121 Fix tests, make better 2023-01-06 20:46:43 +00:00
avogar
7fcdb08ec6 Detect header in CSV/TSV/CustomSeparated files automatically 2023-01-05 22:57:25 +00:00
Yakov Olkhovskiy
7a5a36cbed Merge branch 'master' into refactoring-ip-types 2023-01-04 11:11:06 -05:00
Kruglov Pavel
b9bdf62bf3 Merge branch 'master' into parquet-bool 2023-01-04 14:49:41 +01:00
Kruglov Pavel
59263f3ae1 Merge pull request #44501 from Avogar/validate-types
Validate data types according to settings.
2023-01-04 14:48:09 +01:00
Kruglov Pavel
90ae405033 Merge pull request #44876 from Avogar/fix-perf-tests
Revert some changes from #42777 to fix performance tests
2023-01-04 14:27:17 +01:00
Kruglov Pavel
0c7d39ac7f Merge pull request #44832 from ucasfl/row-number
Fix output_format_pretty_row_numbers does not preserve the counter across the blocks
2023-01-04 14:15:47 +01:00
Kruglov Pavel
4e261ab230 Fix JSONCompactEachRow 2023-01-03 21:16:38 +01:00
Kruglov Pavel
314d95fd71 Fix special build 2023-01-03 20:34:30 +01:00
avogar
28eb2dbd4c Revert some changes from #42777 2023-01-03 18:53:03 +00:00
Alexey Milovidov
e855d3519a Merge branch 'master' into refactoring-ip-types 2023-01-02 21:58:53 +03:00
Kruglov Pavel
4a7c399076 Merge branch 'master' into parquet-bool 2023-01-02 16:33:42 +01:00
Kruglov Pavel
0a43976977 Merge branch 'master' into validate-types 2023-01-02 16:10:14 +01:00
Kruglov Pavel
1c2dc05d6e Merge pull request #44446 from Avogar/arrow-nullables
Respect setting settings.schema_inference_make_columns_nullable in Parquet/ORC/Arrow formats
2023-01-02 16:05:57 +01:00
Kruglov Pavel
966f57ef68 Merge pull request #42777 from Avogar/improve-streaming-engines
Refactor and Improve streaming engines Kafka/RabbitMQ/NATS and data formats
2023-01-02 15:59:06 +01:00
flynn
7780fae9db fix 2023-01-02 12:38:51 +00:00
flynn
7b487dd923 fix 2023-01-02 12:37:43 +00:00
flynn
3a1dd045dd Fix output_format_pretty_row_numbers does not preserve the counter across the blocks 2023-01-02 09:27:37 +00:00
Kruglov Pavel
8479615c48 Merge pull request #44684 from Avogar/avro-bool
Input/ouptut avro bool type as ClickHouse bool type
2022-12-30 17:56:36 +01:00
Kruglov Pavel
4982d132fb Merge branch 'master' into validate-types 2022-12-30 17:52:13 +01:00
Nikolay Degterinsky
dfe93b5d82 Merge pull request #42284 from Algunenano/perf_experiment
Performance experiment
2022-12-30 03:14:22 +01:00
Kruglov Pavel
894726bd8f Merge branch 'master' into improve-streaming-engines 2022-12-29 22:59:45 +01:00
avogar
a0db1dd1ea Support Bool type in Arrow/Parquet/ORC 2022-12-28 22:58:28 +00:00
Raúl Marín
5de11979ce Unify query elapsed time measurements (#43455)
* Unify query elapsed time reporting

* add-test: Make shell tests executable

* Add some tests around query elapsed time

* Style and ubsan
2022-12-28 21:01:41 +01:00
Raúl Marín
e915ce1e95 Merge remote-tracking branch 'blessed/master' into perf_experiment 2022-12-28 20:15:43 +01:00
Raúl Marín
f6428964cc Better and common error handling 2022-12-28 20:15:27 +01:00
avogar
f1191bbbc6 Input/ouptut avro bool type as ClickHouse bool type 2022-12-28 17:38:58 +00:00
avogar
411f98306a Merge branch 'master' of github.com:ClickHouse/ClickHouse into validate-types 2022-12-27 19:24:15 +00:00
Kruglov Pavel
6dea7336f7 Merge pull request #44405 from Avogar/fix-parquet-orc
Fix reading columns that are not presented in input data in Parquet/ORC formats
2022-12-27 16:58:35 +01:00
Raúl Marín
fc1fa82a39 Merge branch 'master' into perf_experiment 2022-12-27 10:51:58 +01:00
Kruglov Pavel
6a017a6586 Merge pull request #43379 from Avogar/better-capn-proto
Add small improvements in CapnProto format
2022-12-22 14:50:10 +01:00
Yakov Olkhovskiy
a8cb29da4b Merge branch 'master' into refactoring-ip-types 2022-12-21 23:56:24 -05:00
avogar
4ab3e90382 Validate types in table function arguments/CAST function arguments/JSONAsObject schema inference 2022-12-21 21:21:30 +00:00
Kruglov Pavel
5e01a3d74e Merge branch 'master' into improve-streaming-engines 2022-12-21 10:51:50 +01:00
Kruglov Pavel
09ab5832b1 Merge pull request #44382 from Avogar/fix-bson-object-id
Fix reading ObjectId in BSON schema inference
2022-12-21 10:48:50 +01:00
avogar
c49638e3a9 Respect setting settings.schema_inference_make_columns_nullable in Parquet/ORC/Arrow formats 2022-12-20 17:46:42 +00:00
Kruglov Pavel
643a35bed1 Merge pull request #44019 from Avogar/refactor-schema-inference
Refactor and improve schema inference for text formats
2022-12-20 17:29:03 +01:00
Kruglov Pavel
c0b17ca0af Merge branch 'master' into fix-bson-object-id 2022-12-20 17:18:10 +01:00
Kruglov Pavel
fe28faa32d Fix style 2022-12-20 14:49:39 +01:00
Kruglov Pavel
3f1e40aacd Merge branch 'master' into fix-orc 2022-12-20 13:32:46 +01:00
Raúl Marín
45d27f461b Merge branch 'master' into perf_experiment 2022-12-20 09:07:48 +00:00
avogar
e262e375dc Fix reading columns that are not presented in input data in Parquet/ORC formats 2022-12-19 20:30:54 +00:00
avogar
0c406adce2 Fix reading Map type in ORC format 2022-12-19 18:23:07 +00:00
avogar
21cdf6e6ae Fix reading ObjectId in BSON schema inference 2022-12-19 14:13:42 +00:00
avogar
291e51c533 Merge branch 'better-capn-proto' of github.com:Avogar/ClickHouse into better-capn-proto 2022-12-16 14:43:06 +00:00
avogar
4a51bdce86 Fix comments 2022-12-16 13:58:54 +00:00
Kruglov Pavel
3fad5c7f1f Merge branch 'master' into refactor-schema-inference 2022-12-16 14:24:51 +01:00
avogar
cfcb444699 Merge branch 'master' of github.com:ClickHouse/ClickHouse into better-capn-proto 2022-12-15 20:04:43 +00:00
avogar
755b08a49e Fix comments 2022-12-15 19:47:10 +00:00
Kruglov Pavel
c5b2e4cc23 Merge branch 'master' into improve-streaming-engines 2022-12-15 18:44:35 +01:00
avogar
a94a0d9c85 Fix tests, fix bugs 2022-12-14 21:17:00 +00:00
Nikolay Degterinsky
9b6d31b95d Merge branch 'master' into perf_experiment 2022-12-13 17:15:07 +01:00
avogar
739ad23b1f Make better, fix bugs, improve error messages 2022-12-12 22:00:45 +00:00
avogar
c224e397ac Check if delimiters are empty, add comments 2022-12-08 20:00:10 +00:00
avogar
1ec5f8451b Merge branch 'master' of github.com:ClickHouse/ClickHouse into csv-custom-delimiter 2022-12-08 19:17:42 +00:00
Kruglov Pavel
de5ffc96e9 Fix style 2022-12-08 19:02:36 +01:00
avogar
556746692b Fix build 2022-12-08 17:20:43 +00:00
Yakov Olkhovskiy
0641066183 Merge branch 'master' into refactoring-ip-types 2022-12-08 11:12:05 -05:00
avogar
7375a7d429 Refactor and improve schema inference for text formats 2022-12-07 21:19:27 +00:00
Kruglov Pavel
c35b2a6495 Add a limit for string size in RowBinary format (#43842) 2022-12-02 13:57:11 +01:00
Alexander Tokmakov
431f6551cb Merge branch 'master' into fix_assertion_in_thread_status 2022-11-30 23:05:15 +03:00
Anton Popov
fe5fff0347 Merge pull request #43329 from xiedeyantu/support_nested_column
s3 table function can support select nested column using {column_name}.{subcolumn_name}
2022-11-29 22:27:19 +01:00
Alexander Tokmakov
e45105bf44 detach threads from thread group 2022-11-28 21:31:55 +01:00
Yakov Olkhovskiy
770b520ded Merge branch 'master' into refactoring-ip-types 2022-11-28 08:50:19 -05:00
Kruglov Pavel
dd7ac8bb96 Update src/Processors/Formats/Impl/CapnProtoRowOutputFormat.cpp
Co-authored-by: Nikolay Degterinsky <43110995+evillique@users.noreply.github.com>
2022-11-28 14:17:52 +01:00
Kruglov Pavel
2818ecf7f0 Merge pull request #43297 from arthurpassos/fix_arrow_list_column_parsing
Flatten list type arrow chunks on parsing
2022-11-25 18:13:27 +01:00
xiedeyantu
304b6ebf3a s3 table function can support select nested column using {column_name}.{subcolumn_name} 2022-11-23 23:36:12 +08:00
Raúl Marín
4aa29b6a63 Merge remote-tracking branch 'blessed/master' into perf_experiment 2022-11-22 19:09:00 +01:00
Raúl Marín
e63ba06048 Better cache management 2022-11-22 19:03:17 +01:00
avogar
ecdeff622b Add small improvements in CapnProto format 2022-11-18 20:13:00 +00:00
Yakov Olkhovskiy
dbaeabcf38 fixed some bugs, some functions corrected, some tests corrected 2022-11-18 20:10:27 +00:00
Arthur Passos
414fd07bba add docs 2022-11-17 17:28:51 -03:00
Arthur Passos
dd37ca7767 add docs 2022-11-17 17:25:27 -03:00
Arthur Passos
12d3f799a5 small change 2022-11-17 17:18:54 -03:00
Arthur Passos
fcc032a31e handle both zero based and non zero based arrow offsets 2022-11-17 17:15:24 -03:00
avogar
fcfdd73d17 Improve reading CSV field in CustomSeparated/Template format 2022-11-17 15:36:56 +00:00
Raúl Marín
80403015e7 Fix assert reached with lines without data 2022-11-17 16:19:53 +01:00
Arthur Passos
ed080b8ba5 fix style 2022-11-16 13:22:23 -03:00
Arthur Passos
e1236340b5 Flatten list type arrow chunks on parsing 2022-11-16 12:27:01 -03:00
avogar
2af60f34eb Restrict document size in parallel parsing, allow to read ObjectId/JS code into String column 2022-11-15 13:35:17 +00:00
avogar
842d25c358 Minor improvements, better docs 2022-11-14 20:05:01 +00:00
avogar
098dfcff56 Fix tests 2022-11-14 15:48:23 +00:00
avogar
564d83bbc7 Better handle uint64 2022-11-11 13:24:12 +00:00
avogar
94c6dc42eb Use better types 2022-11-11 13:17:48 +00:00
avogar
cd36caf013 Fix style 2022-11-10 20:37:24 +00:00
avogar
e0b3b9efae Remove old test, clean up a bit 2022-11-10 20:21:29 +00:00
avogar
4d787f3953 Remove unneded method 2022-11-10 20:18:52 +00:00
avogar
9e89af28c6 Refactor BSONEachRow format, fix bugs, support more data types, support parallel parsing and schema inference 2022-11-10 20:15:14 +00:00
Kruglov Pavel
b124875257 Merge branch 'master' into improve-streaming-engines 2022-11-03 13:22:06 +01:00
Nikolay Degterinsky
30ad1a6826 Merge branch 'master' into perf_experiment 2022-11-03 02:18:21 +03:00
avogar
7cc87679e4 Merge branch 'master' of github.com:ClickHouse/ClickHouse into BSONEachRow 2022-11-02 19:47:42 +00:00
Vladimir C
512abfe511 Fix style, remove commented code 2022-11-02 19:42:57 +00:00
vdimir
ef3dbf8192 clang-format BSONUtils 2022-11-02 19:42:49 +00:00
vdimir
223614ee1d Fix typos 2022-11-02 19:42:32 +00:00
vdimir
ab61932223 Apply clang-format for BSONEachRow 2022-11-02 19:42:13 +00:00
Mark Polokhov
2fff4887ac Add BSON input/output format 2022-11-02 19:39:14 +00:00
avogar
9f39a6a049 Fix possible heap-use-after-free 2022-11-02 14:17:48 +00:00
Kruglov Pavel
38124b6533 Merge pull request #42780 from Avogar/parallel-parsing
Support parallel parsing for LineAsString input format
2022-11-02 13:21:53 +01:00
avogar
e39e61fc71 Fix heap-use-after-free in PeekableReadBuffer 2022-11-01 12:58:20 +00:00
Anton Popov
2ae3cfa9e0 Merge branch 'master' into dynamic-columns-14 2022-10-31 16:15:19 +01:00
avogar
fe0aea2e3a Support parallel parsing for LineAsString input format 2022-10-28 21:56:09 +00:00
avogar
d5f68e013d Fix style 2022-10-28 17:09:08 +00:00
avogar
8e13d1f1ec Improve and refactor Kafka/StorageMQ/NATS and data formats 2022-10-28 16:41:10 +00:00
Raúl Marín
e77fcb0a99 More style 2022-10-27 13:22:44 +02:00
Raúl Marín
56a802188b Fix bugs introduced when changing the logic 2022-10-26 18:05:05 +02:00
Raúl Marín
6e0a9452e7 Merge remote-tracking branch 'blessed/master' into perf_experiment 2022-10-25 15:25:06 +02:00
Raúl Marín
2fa3c54caa ValuesBlockInputFormat: Adapt to the full tokenizer 2022-10-25 15:22:22 +02:00
Azat Khuzhin
56bc85746f Merge remote-tracking branch 'upstream/master' into build/shorten-64-to-32
Conflicts:
- src/Interpreters/ProcessList.cpp
2022-10-22 16:49:08 +02:00
Azat Khuzhin
5094c0dd6d Fix clang-tidy performance-inefficient-vector-operation
By some reason it appears only after static_cast<> was added [1]:

    /build/src/Processors/Formats/Impl/AvroRowInputFormat.cpp
    Oct 18 01:03:56 /build/src/Processors/Formats/Impl/AvroRowInputFormat.cpp:351:21: error: 'push_back' is called inside a loop; consider pre-allocating the container capacity before the loop [performance-inefficient-vector-operation,-warnings-as-errors]
    Oct 18 01:03:56                     symbols.push_back(root_node->nameAt(i));
    Oct 18 01:03:56                     ^
    Oct 18 01:03:56 /build/src/Processors/Formats/Impl/AvroRowInputFormat.cpp:511:17: error: 'push_back' is called inside a loop; consider pre-allocating the container capacity before the loop [performance-inefficient-vector-operation,-warnings-as-errors]
    Oct 18 01:03:56                 union_skip_fns.push_back(createSkipFn(root_node->leafAt(i)));
    Oct 18 01:03:56                 ^
    Oct 18 01:03:56 /build/src/Processors/Formats/Impl/AvroRowInputFormat.cpp:552:17: error: 'push_back' is called inside a loop; consider pre-allocating the container capacity before the loop [performance-inefficient-vector-operation,-warnings-as-errors]
    Oct 18 01:03:56                 field_skip_fns.push_back(createSkipFn(root_node->leafAt(i)));
    Oct 18 01:03:56                 ^
    Oct 18 01:03:56 197965 warnings generated.

  [1]: https://s3.amazonaws.com/clickhouse-builds/42190/453d91fa3539882dcef1d5ecd5097747499572d8/clickhouse_special_build_check/report.html

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-10-21 13:25:43 +02:00
Azat Khuzhin
4e76629aaf Fixes for -Wshorten-64-to-32
- lots of static_cast
- add safe_cast
- types adjustments
  - config
  - IStorage::read/watch
  - ...
- some TODO's (to convert types in future)

P.S. That was quite a journey...

v2: fixes after rebase
v3: fix conflicts after #42308 merged
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-10-21 13:25:19 +02:00
Alexey Milovidov
ff26251477 Merge branch 'master' into fix-race-condition-finish-cancel 2022-10-21 04:14:21 +02:00
Alexander Tokmakov
68c18abfbb Merge pull request #42406 from ClickHouse/template_format_better_error
Better error message for unsupported delimiters in custom formats
2022-10-20 15:52:08 +03:00
Alexey Milovidov
dfa202a15d Merge branch 'master' into fix-race-condition-finish-cancel 2022-10-19 02:35:42 +02:00
Kruglov Pavel
29513f6a1f Merge pull request #41885 from Avogar/with-names-error-message
Better exception message for duplicate column names in schema inference
2022-10-18 15:26:46 +02:00
Alexander Tokmakov
fffecbb9ad better error message for unsupported delimiters in custom formats 2022-10-17 18:08:52 +02:00
Alexey Milovidov
f88ed8195b Fix trash 2022-10-17 04:21:08 +02:00
Kruglov Pavel
7980920bd7 Merge branch 'master' into fix-format-row 2022-10-14 20:49:21 +02:00
Kruglov Pavel
6fc12dd922 Merge pull request #41703 from Avogar/json-object-each-row
Add setting to obtain object name as column value in JSONObjectEachRow format
2022-10-14 20:11:04 +02:00
Alexander Tokmakov
4175f8cde6 abort instead of __builtin_unreachable in debug builds 2022-10-07 21:49:08 +02:00
Anton Popov
6e61cf92f5 Merge remote-tracking branch 'upstream/master' into HEAD 2022-10-03 13:16:57 +00:00
Robert Schulze
db5ef7b3cb Merge branch 'master' into generated-file-cleanup 2022-10-02 23:13:18 +02:00
Vitaly Baranov
f65d3ff95a Fix parallel parsing: segmentator now checks max_block_size. 2022-09-30 22:34:03 +02:00
Robert Schulze
f24fab7747 Fix some #include atrocities 2022-09-28 13:49:28 +00:00
Robert Schulze
fd86829824 Consolidate config_core.h into config.h
Less duplication, less confusion ...
2022-09-28 13:31:57 +00:00
avogar
c353928eb5 Merge branch 'master' of github.com:ClickHouse/ClickHouse into fix-format-row 2022-09-28 13:15:51 +00:00