avogar
1c0941d72a
Add docs and examples
2023-01-16 16:46:41 +00:00
flynn
29eb30b49f
Fix some reading avro format bugs
...
fix
2023-01-14 18:05:26 +00:00
avogar
e2470dd670
Fix tests
2023-01-13 17:03:53 +00:00
avogar
b461935374
Better
2023-01-12 13:11:04 +00:00
Kruglov Pavel
05a11ff4a4
Merge branch 'master' into tsv-csv-detect-header
2023-01-12 12:35:18 +01:00
avogar
e4d774d906
Better naming
2023-01-11 22:57:14 +00:00
avogar
26cd56d113
Fix tests, make better
2023-01-11 22:52:15 +00:00
avogar
3b45863d15
Make better implementation, fix tests
2023-01-11 17:12:56 +00:00
avogar
6312b75f44
Fix style
2023-01-10 16:28:52 +00:00
avogar
615fe4cecb
Fix tests
2023-01-10 16:27:23 +00:00
Alexey Milovidov
1229a20fb3
Merge pull request #45047 from ClickHouse/fix-buffer-overflow
...
Fix buffer overflow in parser
2023-01-10 05:06:03 +03:00
Yakov Olkhovskiy
4f32f3b8cb
Merge pull request #44484 from bigo-sg/arrow_struct_field
...
Optimization for reading struct fields in parquet/orc files
2023-01-09 15:36:26 -05:00
Kseniia Sumarokova
119501f1d9
Merge pull request #44698 from Avogar/parquet-bool
...
Support Bool type in Arrow/Parquet/ORC
2023-01-09 12:56:28 +01:00
lgbo-ustc
b639bcabc0
some fixes
2023-01-09 18:13:00 +08:00
lgbo-ustc
f127b3a60a
update ArrowFieldIndexUtil
2023-01-09 18:13:00 +08:00
lgbo-ustc
a3bdfddc9d
support nested table
2023-01-09 18:13:00 +08:00
lgbo-ustc
4f3f781b85
fixed test case
2023-01-09 18:13:00 +08:00
lgbo-ustc
755f03db4e
fixed
2023-01-09 18:13:00 +08:00
lgbo-ustc
f6850d96cb
fixed missing columns
2023-01-09 18:13:00 +08:00
lgbo-ustc
4cf6beee27
fixed
2023-01-09 18:13:00 +08:00
lgbo-ustc
81e2832133
fixed
2023-01-09 18:13:00 +08:00
lgbo-ustc
77cea49cec
fixed including header failure
2023-01-09 18:13:00 +08:00
lgbo-ustc
8f8f6f966b
Optimization for reading struct fields in parquet/orc files
2023-01-09 18:13:00 +08:00
Alexey Milovidov
0d39d26a34
Don't fix parallel formatting
2023-01-09 06:15:20 +01:00
Alexey Milovidov
d331f0ce82
Fix buffer overflow in parser
2023-01-09 03:31:12 +01:00
avogar
ee72799121
Fix tests, make better
2023-01-06 20:46:43 +00:00
avogar
7fcdb08ec6
Detect header in CSV/TSV/CustomSeparated files automatically
2023-01-05 22:57:25 +00:00
Yakov Olkhovskiy
7a5a36cbed
Merge branch 'master' into refactoring-ip-types
2023-01-04 11:11:06 -05:00
Kruglov Pavel
b9bdf62bf3
Merge branch 'master' into parquet-bool
2023-01-04 14:49:41 +01:00
Kruglov Pavel
59263f3ae1
Merge pull request #44501 from Avogar/validate-types
...
Validate data types according to settings.
2023-01-04 14:48:09 +01:00
Kruglov Pavel
90ae405033
Merge pull request #44876 from Avogar/fix-perf-tests
...
Revert some changes from #42777 to fix performance tests
2023-01-04 14:27:17 +01:00
Kruglov Pavel
0c7d39ac7f
Merge pull request #44832 from ucasfl/row-number
...
Fix output_format_pretty_row_numbers does not preserve the counter across the blocks
2023-01-04 14:15:47 +01:00
Kruglov Pavel
4e261ab230
Fix JSONCompactEachRow
2023-01-03 21:16:38 +01:00
Kruglov Pavel
314d95fd71
Fix special build
2023-01-03 20:34:30 +01:00
avogar
28eb2dbd4c
Revert some changes from #42777
2023-01-03 18:53:03 +00:00
Alexey Milovidov
e855d3519a
Merge branch 'master' into refactoring-ip-types
2023-01-02 21:58:53 +03:00
Kruglov Pavel
4a7c399076
Merge branch 'master' into parquet-bool
2023-01-02 16:33:42 +01:00
Kruglov Pavel
0a43976977
Merge branch 'master' into validate-types
2023-01-02 16:10:14 +01:00
Kruglov Pavel
1c2dc05d6e
Merge pull request #44446 from Avogar/arrow-nullables
...
Respect setting settings.schema_inference_make_columns_nullable in Parquet/ORC/Arrow formats
2023-01-02 16:05:57 +01:00
Kruglov Pavel
966f57ef68
Merge pull request #42777 from Avogar/improve-streaming-engines
...
Refactor and Improve streaming engines Kafka/RabbitMQ/NATS and data formats
2023-01-02 15:59:06 +01:00
flynn
7780fae9db
fix
2023-01-02 12:38:51 +00:00
flynn
7b487dd923
fix
2023-01-02 12:37:43 +00:00
flynn
3a1dd045dd
Fix output_format_pretty_row_numbers does not preserve the counter across the blocks
2023-01-02 09:27:37 +00:00
Kruglov Pavel
8479615c48
Merge pull request #44684 from Avogar/avro-bool
...
Input/output avro bool type as ClickHouse bool type
2022-12-30 17:56:36 +01:00
Kruglov Pavel
4982d132fb
Merge branch 'master' into validate-types
2022-12-30 17:52:13 +01:00
Nikolay Degterinsky
dfe93b5d82
Merge pull request #42284 from Algunenano/perf_experiment
...
Performance experiment
2022-12-30 03:14:22 +01:00
Kruglov Pavel
894726bd8f
Merge branch 'master' into improve-streaming-engines
2022-12-29 22:59:45 +01:00
avogar
a0db1dd1ea
Support Bool type in Arrow/Parquet/ORC
2022-12-28 22:58:28 +00:00
Raúl Marín
5de11979ce
Unify query elapsed time measurements (#43455)
...
* Unify query elapsed time reporting
* add-test: Make shell tests executable
* Add some tests around query elapsed time
* Style and ubsan
2022-12-28 21:01:41 +01:00
Raúl Marín
e915ce1e95
Merge remote-tracking branch 'blessed/master' into perf_experiment
2022-12-28 20:15:43 +01:00
Raúl Marín
f6428964cc
Better and common error handling
2022-12-28 20:15:27 +01:00
avogar
f1191bbbc6
Input/output avro bool type as ClickHouse bool type
2022-12-28 17:38:58 +00:00
avogar
411f98306a
Merge branch 'master' of github.com:ClickHouse/ClickHouse into validate-types
2022-12-27 19:24:15 +00:00
Kruglov Pavel
6dea7336f7
Merge pull request #44405 from Avogar/fix-parquet-orc
...
Fix reading columns that are not present in input data in Parquet/ORC formats
2022-12-27 16:58:35 +01:00
Raúl Marín
fc1fa82a39
Merge branch 'master' into perf_experiment
2022-12-27 10:51:58 +01:00
Kruglov Pavel
6a017a6586
Merge pull request #43379 from Avogar/better-capn-proto
...
Add small improvements in CapnProto format
2022-12-22 14:50:10 +01:00
Yakov Olkhovskiy
a8cb29da4b
Merge branch 'master' into refactoring-ip-types
2022-12-21 23:56:24 -05:00
avogar
4ab3e90382
Validate types in table function arguments/CAST function arguments/JSONAsObject schema inference
2022-12-21 21:21:30 +00:00
Kruglov Pavel
5e01a3d74e
Merge branch 'master' into improve-streaming-engines
2022-12-21 10:51:50 +01:00
Kruglov Pavel
09ab5832b1
Merge pull request #44382 from Avogar/fix-bson-object-id
...
Fix reading ObjectId in BSON schema inference
2022-12-21 10:48:50 +01:00
avogar
c49638e3a9
Respect setting settings.schema_inference_make_columns_nullable in Parquet/ORC/Arrow formats
2022-12-20 17:46:42 +00:00
Kruglov Pavel
643a35bed1
Merge pull request #44019 from Avogar/refactor-schema-inference
...
Refactor and improve schema inference for text formats
2022-12-20 17:29:03 +01:00
Kruglov Pavel
c0b17ca0af
Merge branch 'master' into fix-bson-object-id
2022-12-20 17:18:10 +01:00
Kruglov Pavel
fe28faa32d
Fix style
2022-12-20 14:49:39 +01:00
Kruglov Pavel
3f1e40aacd
Merge branch 'master' into fix-orc
2022-12-20 13:32:46 +01:00
Raúl Marín
45d27f461b
Merge branch 'master' into perf_experiment
2022-12-20 09:07:48 +00:00
avogar
e262e375dc
Fix reading columns that are not present in input data in Parquet/ORC formats
2022-12-19 20:30:54 +00:00
avogar
0c406adce2
Fix reading Map type in ORC format
2022-12-19 18:23:07 +00:00
avogar
21cdf6e6ae
Fix reading ObjectId in BSON schema inference
2022-12-19 14:13:42 +00:00
avogar
291e51c533
Merge branch 'better-capn-proto' of github.com:Avogar/ClickHouse into better-capn-proto
2022-12-16 14:43:06 +00:00
avogar
4a51bdce86
Fix comments
2022-12-16 13:58:54 +00:00
Kruglov Pavel
3fad5c7f1f
Merge branch 'master' into refactor-schema-inference
2022-12-16 14:24:51 +01:00
avogar
cfcb444699
Merge branch 'master' of github.com:ClickHouse/ClickHouse into better-capn-proto
2022-12-15 20:04:43 +00:00
avogar
755b08a49e
Fix comments
2022-12-15 19:47:10 +00:00
Kruglov Pavel
c5b2e4cc23
Merge branch 'master' into improve-streaming-engines
2022-12-15 18:44:35 +01:00
avogar
a94a0d9c85
Fix tests, fix bugs
2022-12-14 21:17:00 +00:00
Nikolay Degterinsky
9b6d31b95d
Merge branch 'master' into perf_experiment
2022-12-13 17:15:07 +01:00
avogar
739ad23b1f
Make better, fix bugs, improve error messages
2022-12-12 22:00:45 +00:00
avogar
c224e397ac
Check if delimiters are empty, add comments
2022-12-08 20:00:10 +00:00
avogar
1ec5f8451b
Merge branch 'master' of github.com:ClickHouse/ClickHouse into csv-custom-delimiter
2022-12-08 19:17:42 +00:00
Kruglov Pavel
de5ffc96e9
Fix style
2022-12-08 19:02:36 +01:00
avogar
556746692b
Fix build
2022-12-08 17:20:43 +00:00
Yakov Olkhovskiy
0641066183
Merge branch 'master' into refactoring-ip-types
2022-12-08 11:12:05 -05:00
avogar
7375a7d429
Refactor and improve schema inference for text formats
2022-12-07 21:19:27 +00:00
Kruglov Pavel
c35b2a6495
Add a limit for string size in RowBinary format (#43842)
2022-12-02 13:57:11 +01:00
Alexander Tokmakov
431f6551cb
Merge branch 'master' into fix_assertion_in_thread_status
2022-11-30 23:05:15 +03:00
Anton Popov
fe5fff0347
Merge pull request #43329 from xiedeyantu/support_nested_column
...
s3 table function can support select nested column using {column_name}.{subcolumn_name}
2022-11-29 22:27:19 +01:00
Alexander Tokmakov
e45105bf44
detach threads from thread group
2022-11-28 21:31:55 +01:00
Yakov Olkhovskiy
770b520ded
Merge branch 'master' into refactoring-ip-types
2022-11-28 08:50:19 -05:00
Kruglov Pavel
dd7ac8bb96
Update src/Processors/Formats/Impl/CapnProtoRowOutputFormat.cpp
...
Co-authored-by: Nikolay Degterinsky <43110995+evillique@users.noreply.github.com>
2022-11-28 14:17:52 +01:00
Kruglov Pavel
2818ecf7f0
Merge pull request #43297 from arthurpassos/fix_arrow_list_column_parsing
...
Flatten list type arrow chunks on parsing
2022-11-25 18:13:27 +01:00
xiedeyantu
304b6ebf3a
s3 table function can support select nested column using {column_name}.{subcolumn_name}
2022-11-23 23:36:12 +08:00
Raúl Marín
4aa29b6a63
Merge remote-tracking branch 'blessed/master' into perf_experiment
2022-11-22 19:09:00 +01:00
Raúl Marín
e63ba06048
Better cache management
2022-11-22 19:03:17 +01:00
avogar
ecdeff622b
Add small improvements in CapnProto format
2022-11-18 20:13:00 +00:00
Yakov Olkhovskiy
dbaeabcf38
fixed some bugs, some functions corrected, some tests corrected
2022-11-18 20:10:27 +00:00
Arthur Passos
414fd07bba
add docs
2022-11-17 17:28:51 -03:00
Arthur Passos
dd37ca7767
add docs
2022-11-17 17:25:27 -03:00
Arthur Passos
12d3f799a5
small change
2022-11-17 17:18:54 -03:00
Arthur Passos
fcc032a31e
handle both zero based and non zero based arrow offsets
2022-11-17 17:15:24 -03:00
avogar
fcfdd73d17
Improve reading CSV field in CustomSeparated/Template format
2022-11-17 15:36:56 +00:00
Raúl Marín
80403015e7
Fix assert reached with lines without data
2022-11-17 16:19:53 +01:00
Arthur Passos
ed080b8ba5
fix style
2022-11-16 13:22:23 -03:00
Arthur Passos
e1236340b5
Flatten list type arrow chunks on parsing
2022-11-16 12:27:01 -03:00
avogar
2af60f34eb
Restrict document size in parallel parsing, allow to read ObjectId/JS code into String column
2022-11-15 13:35:17 +00:00
avogar
842d25c358
Minor improvements, better docs
2022-11-14 20:05:01 +00:00
avogar
098dfcff56
Fix tests
2022-11-14 15:48:23 +00:00
avogar
564d83bbc7
Better handle uint64
2022-11-11 13:24:12 +00:00
avogar
94c6dc42eb
Use better types
2022-11-11 13:17:48 +00:00
avogar
cd36caf013
Fix style
2022-11-10 20:37:24 +00:00
avogar
e0b3b9efae
Remove old test, clean up a bit
2022-11-10 20:21:29 +00:00
avogar
4d787f3953
Remove unneeded method
2022-11-10 20:18:52 +00:00
avogar
9e89af28c6
Refactor BSONEachRow format, fix bugs, support more data types, support parallel parsing and schema inference
2022-11-10 20:15:14 +00:00
Kruglov Pavel
b124875257
Merge branch 'master' into improve-streaming-engines
2022-11-03 13:22:06 +01:00
Nikolay Degterinsky
30ad1a6826
Merge branch 'master' into perf_experiment
2022-11-03 02:18:21 +03:00
avogar
7cc87679e4
Merge branch 'master' of github.com:ClickHouse/ClickHouse into BSONEachRow
2022-11-02 19:47:42 +00:00
Vladimir C
512abfe511
Fix style, remove commented code
2022-11-02 19:42:57 +00:00
vdimir
ef3dbf8192
clang-format BSONUtils
2022-11-02 19:42:49 +00:00
vdimir
223614ee1d
Fix typos
2022-11-02 19:42:32 +00:00
vdimir
ab61932223
Apply clang-format for BSONEachRow
2022-11-02 19:42:13 +00:00
Mark Polokhov
2fff4887ac
Add BSON input/output format
2022-11-02 19:39:14 +00:00
avogar
9f39a6a049
Fix possible heap-use-after-free
2022-11-02 14:17:48 +00:00
Kruglov Pavel
38124b6533
Merge pull request #42780 from Avogar/parallel-parsing
...
Support parallel parsing for LineAsString input format
2022-11-02 13:21:53 +01:00
avogar
e39e61fc71
Fix heap-use-after-free in PeekableReadBuffer
2022-11-01 12:58:20 +00:00
Anton Popov
2ae3cfa9e0
Merge branch 'master' into dynamic-columns-14
2022-10-31 16:15:19 +01:00
avogar
fe0aea2e3a
Support parallel parsing for LineAsString input format
2022-10-28 21:56:09 +00:00
avogar
d5f68e013d
Fix style
2022-10-28 17:09:08 +00:00
avogar
8e13d1f1ec
Improve and refactor Kafka/StorageMQ/NATS and data formats
2022-10-28 16:41:10 +00:00
Raúl Marín
e77fcb0a99
More style
2022-10-27 13:22:44 +02:00
Raúl Marín
56a802188b
Fix bugs introduced when changing the logic
2022-10-26 18:05:05 +02:00
Raúl Marín
6e0a9452e7
Merge remote-tracking branch 'blessed/master' into perf_experiment
2022-10-25 15:25:06 +02:00
Raúl Marín
2fa3c54caa
ValuesBlockInputFormat: Adapt to the full tokenizer
2022-10-25 15:22:22 +02:00
Azat Khuzhin
56bc85746f
Merge remote-tracking branch 'upstream/master' into build/shorten-64-to-32
...
Conflicts:
- src/Interpreters/ProcessList.cpp
2022-10-22 16:49:08 +02:00
Azat Khuzhin
5094c0dd6d
Fix clang-tidy performance-inefficient-vector-operation
...
For some reason it appears only after static_cast<> was added [1]:
/build/src/Processors/Formats/Impl/AvroRowInputFormat.cpp
Oct 18 01:03:56 /build/src/Processors/Formats/Impl/AvroRowInputFormat.cpp:351:21: error: 'push_back' is called inside a loop; consider pre-allocating the container capacity before the loop [performance-inefficient-vector-operation,-warnings-as-errors]
Oct 18 01:03:56 symbols.push_back(root_node->nameAt(i));
Oct 18 01:03:56 ^
Oct 18 01:03:56 /build/src/Processors/Formats/Impl/AvroRowInputFormat.cpp:511:17: error: 'push_back' is called inside a loop; consider pre-allocating the container capacity before the loop [performance-inefficient-vector-operation,-warnings-as-errors]
Oct 18 01:03:56 union_skip_fns.push_back(createSkipFn(root_node->leafAt(i)));
Oct 18 01:03:56 ^
Oct 18 01:03:56 /build/src/Processors/Formats/Impl/AvroRowInputFormat.cpp:552:17: error: 'push_back' is called inside a loop; consider pre-allocating the container capacity before the loop [performance-inefficient-vector-operation,-warnings-as-errors]
Oct 18 01:03:56 field_skip_fns.push_back(createSkipFn(root_node->leafAt(i)));
Oct 18 01:03:56 ^
Oct 18 01:03:56 197965 warnings generated.
[1]: https://s3.amazonaws.com/clickhouse-builds/42190/453d91fa3539882dcef1d5ecd5097747499572d8/clickhouse_special_build_check/report.html
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-10-21 13:25:43 +02:00
Azat Khuzhin
4e76629aaf
Fixes for -Wshorten-64-to-32
...
- lots of static_cast
- add safe_cast
- types adjustments
- config
- IStorage::read/watch
- ...
- some TODO's (to convert types in future)
P.S. That was quite a journey...
v2: fixes after rebase
v3: fix conflicts after #42308 merged
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-10-21 13:25:19 +02:00
Alexey Milovidov
ff26251477
Merge branch 'master' into fix-race-condition-finish-cancel
2022-10-21 04:14:21 +02:00
Alexander Tokmakov
68c18abfbb
Merge pull request #42406 from ClickHouse/template_format_better_error
...
Better error message for unsupported delimiters in custom formats
2022-10-20 15:52:08 +03:00
Alexey Milovidov
dfa202a15d
Merge branch 'master' into fix-race-condition-finish-cancel
2022-10-19 02:35:42 +02:00
Kruglov Pavel
29513f6a1f
Merge pull request #41885 from Avogar/with-names-error-message
...
Better exception message for duplicate column names in schema inference
2022-10-18 15:26:46 +02:00
Alexander Tokmakov
fffecbb9ad
better error message for unsupported delimiters in custom formats
2022-10-17 18:08:52 +02:00
Alexey Milovidov
f88ed8195b
Fix trash
2022-10-17 04:21:08 +02:00
Kruglov Pavel
7980920bd7
Merge branch 'master' into fix-format-row
2022-10-14 20:49:21 +02:00
Kruglov Pavel
6fc12dd922
Merge pull request #41703 from Avogar/json-object-each-row
...
Add setting to obtain object name as column value in JSONObjectEachRow format
2022-10-14 20:11:04 +02:00
Alexander Tokmakov
4175f8cde6
abort instead of __builtin_unreachable in debug builds
2022-10-07 21:49:08 +02:00
Anton Popov
6e61cf92f5
Merge remote-tracking branch 'upstream/master' into HEAD
2022-10-03 13:16:57 +00:00
Robert Schulze
db5ef7b3cb
Merge branch 'master' into generated-file-cleanup
2022-10-02 23:13:18 +02:00
Vitaly Baranov
f65d3ff95a
Fix parallel parsing: segmentator now checks max_block_size.
2022-09-30 22:34:03 +02:00
Robert Schulze
f24fab7747
Fix some #include atrocities
2022-09-28 13:49:28 +00:00
Robert Schulze
fd86829824
Consolidate config_core.h into config.h
...
Less duplication, less confusion ...
2022-09-28 13:31:57 +00:00
avogar
c353928eb5
Merge branch 'master' of github.com:ClickHouse/ClickHouse into fix-format-row
2022-09-28 13:15:51 +00:00