1253 Commits

Author SHA1 Message Date
Alexey Milovidov
989a880230
Merge pull request #62404 from Avogar/trivial-insert-select-from-files
Improve trivial insert select from files, add separate max_parsing_threads setting
2024-04-30 01:57:56 +02:00
Shaun Struwig
a69658b1dd
Merge branch 'ClickHouse:master' into 59557_form_input_format 2024-04-27 21:40:58 +02:00
avogar
ff12caf2e9 Merge branch 'master' of github.com:ClickHouse/ClickHouse into dynamic-data-type 2024-04-26 11:08:04 +00:00
avogar
69a3aa7bcf Implement Dynamic data type 2024-04-26 11:02:33 +00:00
HowePa
5e8bc4402a unified NumpyDataTypes 2024-04-25 15:52:30 +08:00
kevinyhzou
7c9dbdbd9c Improve json read by ignore key case 2024-04-25 12:21:32 +08:00
Kruglov Pavel
52e3c3aa4e
Merge branch 'master' into 56257_parse_crlf_with_TSV_files 2024-04-24 16:20:19 +01:00
Raúl Marín
0d06d69377 Fix parsing of nested proto messages 2024-04-24 13:32:13 +02:00
lgbo-ustc
d4773ef1bb Merge remote-tracking branch 'origin/master' into json_format_early_skip 2024-04-17 08:56:54 +08:00
Robert Schulze
3c35f14804
Merge remote-tracking branch 'ClickHouse/master' into mkmkme/protobuf-25.1 2024-04-15 12:38:59 +00:00
Kruglov Pavel
ce7432424e
Merge branch 'master' into trivial-insert-select-from-files 2024-04-12 14:26:48 +02:00
Alexander Tokmakov
b5ff1c0a6e
Merge branch 'master' into cannot_allocate_thread 2024-04-12 13:35:14 +02:00
lgbo-ustc
e9635189d2 Merge remote-tracking branch 'origin/master' into json_format_early_skip 2024-04-12 08:53:38 +08:00
lgbo-ustc
31a3217355 update settings 2024-04-12 08:52:28 +08:00
lgbo-ustc
a87fb7dc84 Merge remote-tracking branch 'origin/master' into json_format_early_skip 2024-04-11 14:37:12 +08:00
lgbo-ustc
64e47cca9a add settings 2024-04-11 14:36:25 +08:00
Alexander Tokmakov
d8e97b51bf Merge branch 'master' into cannot_allocate_thread 2024-04-10 21:21:42 +02:00
Raúl Marín
d6260e984c Avoid crash when reading protobuf with recursive types 2024-04-10 19:46:52 +02:00
Kruglov Pavel
7a3bfb31e8
Merge pull request #62086 from KevinyhZou/improve_hive_text_read_by_replace_settings
Improve hive text read by allow variable number of fields
2024-04-10 12:49:59 +00:00
HowePa
c0174fa17e [feature] add npy output format 2024-04-09 14:30:14 +08:00
Shaun Struwig
971263c4e6
Merge branch 'ClickHouse:master' into 56257_parse_crlf_with_TSV_files 2024-04-08 21:52:47 +02:00
Shaun Struwig
dde99cbe8c
Merge branch 'ClickHouse:master' into 59557_form_input_format 2024-04-08 21:51:57 +02:00
Kruglov Pavel
c2d432be20
Merge branch 'master' into trivial-insert-select-from-files 2024-04-08 15:57:28 +02:00
avogar
ed6e4fbe16 Improve trivial insert select from files, add separate max_parsing_threads setting 2024-04-08 13:56:15 +00:00
Alexander Tokmakov
5db9fbed52 cancel tasks on exception 2024-04-04 22:32:57 +02:00
Shaun Struwig
05b2cfb563
Merge branch 'ClickHouse:master' into 59557_form_input_format 2024-04-04 14:26:42 +02:00
Raúl Marín
276246ee97 Introduce IAggregateFunction_fwd to reduce header dependencies 2024-04-04 12:29:54 +02:00
Kruglov Pavel
05db73f518
Merge branch 'master' into 56257_parse_crlf_with_TSV_files 2024-04-03 17:17:44 +02:00
Mikhail Koviazin
e7a664e9df
Merge remote-tracking branch 'upstream/master' into mkmkme/protobuf-25.1 2024-04-03 13:51:37 +03:00
Raúl Marín
c35a436435 Remove nested dependency on DateLutImpl 2024-04-02 14:45:48 +02:00
kevinyhzou
6018434f82 add config input_format_hive_text_allow_variable_number_of_columns 2024-04-02 19:37:23 +08:00
Kruglov Pavel
9b5b44dd5f
Merge pull request #61889 from Avogar/allow-to-save-bad-json-escape-sequences
Add a setting to allow saving bad escape sequences in JSON input formats
2024-03-28 14:34:02 +01:00
Yakov Olkhovskiy
257cdd83d4
Merge pull request #60994 from bigo-sg/csv-tuple
fix csv format not support tuple
2024-03-27 09:07:46 -04:00
Shaun Struwig
0e76731c6a
Merge branch 'ClickHouse:master' into 56257_parse_crlf_with_TSV_files 2024-03-27 03:06:51 +01:00
Kruglov Pavel
7220797637
Fix style 2024-03-26 15:26:42 +01:00
avogar
dc87c483dd Add a setting to allow saving bad escape sequences in JSON input formats 2024-03-25 21:58:53 +00:00
Alexey Milovidov
3e5ddddb35 Merge branch 'master' into dont-cut-single-value 2024-03-24 00:51:10 +01:00
Alexey Milovidov
4cbecd0bbd Add a setting 2024-03-23 04:20:52 +01:00
Alexey Milovidov
a2e89c8be7 Fix wrong cases of numbers pretty printing
Add a test
Revert changes from another branch
Add a test
Better test
Revert wrong changes
2024-03-23 03:33:03 +01:00
shuai-xu
9d5cabb26d fix csv format not support tuple 2024-03-22 16:51:58 +08:00
Raúl Marín
de855ca917 Reduce header dependencies 2024-03-19 17:04:29 +01:00
Shaun Struwig
01919f0bd3
Merge branch 'ClickHouse:master' into 56257_parse_crlf_with_TSV_files 2024-03-17 20:32:39 +01:00
Alexey Milovidov
01136bbc3b Limit backtracking in parser 2024-03-17 19:54:45 +01:00
Alexey Milovidov
0a3e42401c Fix fuzzers 2024-03-17 15:44:36 +01:00
avogar
feda83a7c8 Merge branch 'master' of github.com:ClickHouse/ClickHouse into 56257_parse_crlf_with_TSV_files 2024-03-14 17:44:38 +00:00
Shaun Struwig
f251a6d262
Merge branch 'ClickHouse:master' into 59557_form_input_format 2024-03-11 18:52:28 +01:00
Raúl Marín
9bada70f45 Remove a bunch of transitive dependencies 2024-03-11 14:52:32 +01:00
Mikhail Koviazin
490efd2efa
Fixes addressing review comments 2024-03-06 14:35:48 +02:00
Blargian
2ad8ab2a57 Fix linker errors 2024-03-05 19:13:20 +01:00
Shaun Struwig
beb0d08bdb
Merge branch 'ClickHouse:master' into 56257_parse_crlf_with_TSV_files 2024-03-05 14:09:01 +01:00
Alexey Milovidov
570692fe83
Merge branch 'master' into json-ambg-tuple-inference 2024-03-05 04:50:39 +03:00
Kruglov Pavel
4bdafed801
Merge pull request #60420 from HowePa/format_case_insensitive
Make all format names case insensitive.
2024-03-04 19:09:10 +01:00
avogar
9a05461680 Better exception message 2024-03-04 17:49:33 +00:00
avogar
70abdf7a41 Small improvements in JSON schema inference 2024-03-04 17:32:22 +00:00
Alexey Milovidov
cbf5443585 Remove old code 2024-03-04 00:11:55 +01:00
Shaun Struwig
b54286bed4
Merge branch 'master' into 56257_parse_crlf_with_TSV_files 2024-03-01 18:21:05 +01:00
Mikhail Koviazin
11371e886c
Update protobuf to v25.1
The new version deprecates `syntax()` and makes it inaccessible. Instead, the
attributes corresponding to a feature should be used. This commit addresses
this.
2024-03-01 17:03:10 +02:00
Robert Schulze
4ee1aa8c7c
Fixing more headers 2024-02-29 15:40:30 +00:00
Alexey Milovidov
e6dffb1f2d
Merge pull request #60379 from rogeryk/improve-pretty-format
Improve pretty format if a block consists of a single numeric value and exceeds one million.
2024-02-29 02:20:42 +03:00
Alexey Milovidov
fe42d8ecfc
Merge pull request #60050 from ClickHouse/less-memory-usage-primary-key-2
Less memory usage in primary key, variant 2
2024-02-28 19:22:46 +03:00
Alexey Milovidov
c192a448d0 Update to clang-19 2024-02-27 14:37:21 +01:00
豪肥肥
6f9cb058a6
Update FormatFactory.cpp 2024-02-27 07:59:09 +08:00
豪肥肥
24155c80c9
Update src/Formats/FormatFactory.cpp
Co-authored-by: Kruglov Pavel <48961922+Avogar@users.noreply.github.com>
2024-02-27 07:50:04 +08:00
HowePa
dbd8d35f01 use lower case in dict 2024-02-27 00:48:34 +08:00
HowePa
0b72f7b182 Make all format names case insensitive. 2024-02-26 22:46:51 +08:00
rogeryk
7a92f542b4 Add setting output_format_pretty_single_large_number_tip_threshold 2024-02-26 20:19:53 +08:00
Alexey Milovidov
209a151e00 Merge branch 'master' into less-memory-usage-primary-key-2 2024-02-23 16:02:29 +01:00
Shaun Struwig
4a5761ce1f
Merge branch 'ClickHouse:master' into 56257_parse_crlf_with_TSV_files 2024-02-20 22:18:01 +01:00
Kruglov Pavel
5fd2582e83
Merge pull request #59500 from Avogar/exponent-floats-inference
Don't infer floats in exponential notation by default
2024-02-19 13:51:07 +01:00
Alexey Milovidov
df48106cd5
Merge pull request #60015 from azat/values-quote-escape
Fix INSERT into SQLite with single quote (by escaping single quotes with a quote instead of backslash)
2024-02-19 10:37:45 +01:00
Alexey Milovidov
04b2f085bf Revert changes where they were not correct 2024-02-17 05:14:43 +01:00
Shaun Struwig
1d440f0399
Merge branch 'ClickHouse:master' into 56257_parse_crlf_with_TSV_files 2024-02-16 19:48:33 +01:00
Kruglov Pavel
4d6f167e0c
Merge pull request #59092 from Avogar/auto-format-detection
Try to detect file format automatically during schema inference if it's unknown
2024-02-16 14:32:18 +01:00
Azat Khuzhin
bbe38a3fe4 Add ability to escape quotes in Values format with single quote
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2024-02-15 12:47:29 +01:00
Alexey Milovidov
bbd7acd7f9
Merge branch 'master' into exponent-floats-inference 2024-02-15 01:46:51 +01:00
Shaun Struwig
525a5188e4
Merge branch 'master' into 56257_parse_crlf_with_TSV_files 2024-02-14 17:40:08 +01:00
Kruglov Pavel
db2c15c0a6
Apply suggestions from code review
Co-authored-by: Sergei Trifonov <svtrifonov@gmail.com>
2024-02-14 12:24:10 +01:00
Blargian
91d681693c add failing test and Form format skeleton code 2024-02-14 10:19:06 +01:00
Blargian
0ff4202452 fix up test and modify skipRowEndDelimiter so that crlf works with .tsv 2024-02-08 07:08:17 +01:00
Blargian
ab384f8652 add support_crlf for TSV format 2024-02-04 15:29:57 +01:00
avogar
ff21aa9a19 Don't infer floats in exponential notation by default 2024-02-01 19:47:05 +00:00
Shaun Struwig
e081a4f059
Merge branch 'master' into #31363_format_template_configure_in_settings 2024-01-29 21:29:22 +01:00
Blargian
4a8a7208f2 rename of settings, add setting for resultset, extend test, fix documentation and add to SettingsChanges log 2024-01-29 21:25:58 +01:00
avogar
377937d415 Merge branch 'master' of github.com:ClickHouse/ClickHouse into auto-format-detection 2024-01-29 15:45:18 +00:00
Kruglov Pavel
6858d2f4ca
Merge pull request #58047 from Avogar/variant-data-type
Implement Variant data type
2024-01-29 11:36:08 +01:00
Alexey Milovidov
aec3f28ccb Support backups for compressed in-memory tables 2024-01-28 23:06:50 +01:00
Blargian
8421f23975 #56257 - add failing test and new setting for parsing TSV files with crlf 2024-01-28 22:56:47 +01:00
avogar
5833641fa5 Merge branch 'master' of github.com:ClickHouse/ClickHouse into variant-data-type 2024-01-26 16:54:02 +00:00
Kruglov Pavel
46a6b84a5a
Merge branch 'master' into auto-format-detection 2024-01-25 22:11:07 +01:00
Shaun Struwig
e6844a5412
Merge branch 'ClickHouse:master' into #31363_format_template_configure_in_settings 2024-01-25 20:06:45 +01:00
Maksim Kita
2a327107b6 Updated implementation 2024-01-25 14:31:49 +03:00
avogar
11f1ea50d7 Fix tests 2024-01-24 17:55:31 +00:00
avogar
93fbe1d9c8 Fixes 2024-01-23 18:59:40 +00:00
avogar
5e4796ae16 Fix heap-use-after-free 2024-01-23 18:59:40 +00:00
avogar
eaca40c53e Update tests 2024-01-23 18:59:40 +00:00
avogar
f05174e441 Fix style 2024-01-23 18:59:40 +00:00
avogar
617cc514b7 Try to detect file format automatically during schema inference if it's unknown 2024-01-23 18:59:39 +00:00
Blargian
7b235fe643 #31363 - remove schema delimiter setting and add test 00937_format_schema_rows_template.sh and reference 2024-01-22 22:59:59 +02:00
Blargian
eae39ff545 #31363 - modified TemplateBlockOutputFormat to work with added format_schema_rows_template setting 2024-01-21 21:51:06 +02:00
Blargian
f1749217ee added format_schema_rows_template setting 2024-01-18 21:53:56 +02:00
Kruglov Pavel
5444cde408
Merge branch 'master' into variant-data-type 2024-01-18 18:31:27 +01:00
Kruglov Pavel
6d064512e1
Merge pull request #58614 from Blargian/58363_disable_ansi_pretty_automatically
58363 Automatically disable ANSI escape sequences in Pretty formats if the output is not a terminal
2024-01-17 13:45:41 +01:00
Blargian
5f500522a4 #58363 - added setting is_writing_to_terminal to FormatSettings.h, modified PrettyBlockOutputFormat to use this, which is set in FormatFactory.cpp getOutputFormat and getOutputFormatParallelIfPossible 2024-01-15 16:32:51 +02:00
Alexey Milovidov
afb50f03d9
Merge pull request #58519 from Avogar/control-arrow-dict-indexes-type
Add settings for better control of indexes type in Arrow dictionary
2024-01-13 20:00:40 +01:00
Blargian
ce012d217f #58363 - fix style check failing attempt no.2 2024-01-13 13:21:36 +02:00
Blargian
0fdba3b83d #58363 - fix failing style check 2024-01-13 12:14:54 +02:00
Alexey Milovidov
d112492c56 Remove some code 2024-01-13 03:48:04 +01:00
Blargian
72b5cf5993 #58363 - removed switch from PrettyBlockOutputFormat and modified BlockOutputFormats to use color variable. Updated english and russian documentation. Updated test 00405 reference file. 2024-01-12 19:46:03 +02:00
avogar
fbfdde60a7 Add settings for better control of indexes type in Arrow dictionary. Use signed integer type for indexes by default 2024-01-12 13:06:51 +00:00
Blargian
aa8876a611 #58363 - Changes based on review of draft PR - changed output_format_pretty_color to use UInt64Auto. Added isWritingToTerminal function to IO/WriteHelpers.h and updated test 2024-01-12 12:31:57 +02:00
Blargian
b65adbecc1 minor fixes. Doesnt seem to be using ANSI escapes anymore 2024-01-08 23:52:25 +02:00
Kruglov Pavel
b947609b8e
Merge branch 'master' into variant-data-type 2024-01-08 15:04:51 +01:00
Blargian
a15b573315 #58363 - fix formatting issues and change ON, OFF, AUTO to 0, 1, auto 2024-01-08 15:25:14 +02:00
avogar
7e5ba62017 Allow to read Bool values into String in JSON input formats 2024-01-05 20:33:30 +00:00
Blargian
459946035c #58363 🚧 modified Pretty in FormatSettings.h to have PrettyColor which can be 0,1 or auto. modified output_format_pretty_color in FormatFactory.cpp to make use of this, added the default to Settings.h. Implemented the logic for enabling/disabling based on output_format_pretty_color in PrettyBlockOutputFormat.h 2024-01-04 16:10:36 +02:00
Kruglov Pavel
4d8cf71ba7
Merge branch 'master' into variant-data-type 2024-01-03 15:21:23 +01:00
Kruglov Pavel
c03e36e012
Merge branch 'master' into better-parsing-exceptions 2023-12-29 18:07:32 +01:00
avogar
fee2eadaf0 Fix parallel parsing for JSONCompactEachRow 2023-12-27 16:16:41 +00:00
Alexey Milovidov
ef66714bf2
Merge pull request #58196 from ClickHouse/strange-code
Looking at strange code
2023-12-24 03:36:41 +01:00
Alexey Milovidov
6bb181ce55 Looking at strange code 2023-12-23 13:06:34 +01:00
Alexey Milovidov
ff6419361a
Merge pull request #58181 from ClickHouse/remove-parallel-parsing-json-compact-each-row
Remove parallel parsing for JSONCompactEachRow
2023-12-23 10:16:40 +01:00
Alexey Milovidov
468b5e2813 Fix use-after-move 2023-12-23 08:23:15 +01:00
Alexey Milovidov
6a23fe034f Remove parallel parsing for JSONCompactEachRow 2023-12-23 06:17:47 +01:00
avogar
319ae440b6 Implement Variant data type 2023-12-19 16:45:15 +00:00
Nikita Mikhaylov
a0af0392cd
Random changes in random files (#57642) 2023-12-14 12:47:11 +01:00
Kruglov Pavel
ee39dca8c7
Merge branch 'master' into better-parsing-exceptions 2023-12-12 17:10:38 +01:00
Kruglov Pavel
6567fb2c08
Merge pull request #56859 from Avogar/csv-infer-numbers-from-strings
Allow to infer numbers from strings in CSV format
2023-12-12 17:09:02 +01:00
Kruglov Pavel
fd08c6f06a
Merge pull request #57751 from Avogar/better-json-inference-unnamed-tuples
Slightly better inference of unnamed tupes in JSON formats
2023-12-12 17:08:34 +01:00
Kruglov Pavel
8a447bf57c
Merge pull request #55892 from Avogar/schema-inference-union
Add 'union' mode for schema inference
2023-12-12 15:02:06 +01:00
Kruglov Pavel
9dba7aa13d
Merge pull request #57364 from Avogar/better-json-fallback
Better JSON -> JSONEachRow fallback without catching exceptions
2023-12-11 19:03:50 +01:00
Kruglov Pavel
20510cde34
Merge pull request #57006 from Avogar/save-errors-better
Fix early stop while parsing file with skipping lots of errors
2023-12-11 19:03:14 +01:00
avogar
c3a76fcc08 Allow to infer numbers from strings in CSV format 2023-12-11 18:02:05 +00:00
avogar
a87a8e91cf Slightly better inference of unnamed tupes in JSON formats 2023-12-11 14:46:12 +00:00
avogar
ee7af95bc0 Merge branch 'master' of github.com:ClickHouse/ClickHouse into schema-inference-union 2023-12-08 20:29:28 +00:00
Kruglov Pavel
c6fecfb1af
Merge pull request #56901 from KevinyhZou/Fix_allow_cr_end_of_csv_line
Fix allow cr end of line for csv
2023-11-29 20:57:58 +01:00
avogar
b493ce2385 Better JSON -> JSONEachRow fallback without catching exceptions 2023-11-29 14:19:38 +00:00
János Benjamin Antal
ab935e3dd7 Use the google proto files when importing protobuf schemas 2023-11-22 12:39:41 +00:00
avogar
7e392eec50 Better exception messages in input formats 2023-11-21 13:13:42 +00:00
kevinyhzou
3adc8fdf78 Fix ci 2023-11-21 11:22:12 +08:00
avogar
ffa90628f0 Make input format errors logger a bit better 2023-11-20 17:22:49 +00:00
avogar
081fa9f3de Address comments 2023-11-20 15:53:28 +00:00
avogar
872556a5d4 Merge branch 'master' of github.com:ClickHouse/ClickHouse into schema-inference-union 2023-11-20 14:03:36 +00:00
avogar
6366819f12 Fix generating deep nested columns in CapnProto/Protobuf schemas 2023-11-17 16:52:20 +00:00
yariks5s
181231d500 init 2023-11-07 17:56:02 +00:00
kevinyhzou
2a50daf5dd Allow cr at end of csv line 2023-11-06 12:21:42 +08:00
kevinyhzou
ef30e6723d bug fix csv read while end of line is not crlf 2023-11-06 12:21:42 +08:00
Kruglov Pavel
754ab9fa6c
Merge pull request #55974 from Avogar/fix-protobuf-auto-schema
Fix autogenerated Protobuf schema with fields with underscore
2023-11-01 18:17:09 +01:00
Kruglov Pavel
bf77ce691c
Merge pull request #55982 from yariks5s/npy_input_format
New input format Npy
2023-11-01 14:26:22 +01:00
yariks5s
6c4bf59021 fix suggestions and enhance tests 2023-10-31 18:10:55 +00:00
yariks5s
9a2d89e3e4 removed getSize() and enhanced docs 2023-10-30 12:42:19 +00:00