Commit Graph

1253 Commits

Author SHA1 Message Date
yariks5s
e14a7f066a fix typos 2023-10-28 01:46:59 +00:00
yariks5s
894724bfb3 suggested changes 2023-10-28 01:17:25 +00:00
yariks5s
23635352f1 fixed due to review 2023-10-27 15:43:03 +00:00
Kruglov Pavel
bb4b95e891
Merge branch 'master' into schema-inference-union 2023-10-27 14:53:58 +02:00
Kruglov Pavel
570b66f027
Merge branch 'master' into schema-inference-union 2023-10-26 19:26:00 +02:00
zvonand
0766c73aab Rename date_time_overflow_mode -> date_time_overflow_behavior, moved it to format settings 2023-10-25 23:11:13 +02:00
李扬
465962df7f
Support orc filter push down (file + stripe + rowgroup level) (#55330)
* support orc filter push down

* update orc lib version

* replace setqueryinfo with setkeycondition

* fix issue https://github.com/ClickHouse/ClickHouse/issues/53536

* refactor source with key condition

* fix building error

* remove std::cout

* update orc

* update orc version

* fix bugs

* improve code

* upgrade orc lib

* fix code style

* change as requested

* add performance tests for orc filter push down

* add performance tests for orc filter push down

* fix all bugs

* fix default as null issue

* add uts for null as default issues

* upgrade orc lib

* fix failed orc lib uts and fix typo

* fix failed uts

* fix failed uts

* fix ast fuzzer tests

* fix bug of uint64 overflow in https://s3.amazonaws.com/clickhouse-test-reports/55330/de22fdcaea2e12c96f300e95f59beba84401712d/fuzzer_astfuzzerubsan/report.html

* fix asan fatal caused by reused column vector batch in native orc input format. refer to https://s3.amazonaws.com/clickhouse-test-reports/55330/be39d23af2d7e27f5ec7f168947cf75aeaabf674/stateless_tests__asan__[4_4].htm

* fix wrong performance tests

* disable 02892_orc_filter_pushdown on aarch64. https://s3.amazonaws.com/clickhouse-test-reports/55330/be39d23af2d7e27f5ec7f168947cf75aeaabf674/stateless_tests__aarch64_.html

* add some comments

* add some comments

* inline range::equals and range::less

* fix data race of key condition

* trigger ci
2023-10-24 12:08:17 -07:00
avogar
b2c72f95b2 Fix autogenerated Protobuf schema with fields with underscore 2023-10-24 13:08:06 +00:00
taiyang-li
9c186d18a8 retrigger ci 2023-10-24 16:13:53 +08:00
taiyang-li
a02c49e16f allow skip null values when serialize tuple to json objects 2023-10-24 11:47:46 +08:00
avogar
544b217d91 Fix style 2023-10-20 21:05:26 +00:00
Kruglov Pavel
6f61ccfe28
Merge branch 'master' into schema-inference-union 2023-10-20 22:54:11 +02:00
avogar
6934e27e8b Add union mode for schema inference to infer union schema of files with different schemas 2023-10-20 20:46:41 +00:00
Raúl Marín
e500dc22e4 Respect default format when using http_write_exception_in_output_format 2023-10-17 14:14:58 +02:00
Michael Kolupaev
ce7eca0615
DWARF input format (#55450)
* Add ReadBufferFromFileBase::isRegularLocalFile()

* DWARF input format

* Review comments

* Changed things around ENABLE_EMBEDDED_COMPILER build setting

* Added 'ranges' column

* no-msan no-ubsan
2023-10-16 17:00:07 -07:00
yariks5s
9ae025d7e6 mid commit 2023-10-12 17:37:59 +00:00
Azat Khuzhin
2cbb069b68 Add ability to ignore data after semicolon in Values format
This is required for client, to handle comments in multiquery mode.

v0: separate context for input format
v2: cannot use separate context since params and stuff are changed in global context
v3: do not send this setting to the server (breaks queries for readonly profiles)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-10-12 14:55:26 +02:00
Robert Schulze
9d04d3c3ad
Merge remote-tracking branch 'rschu1ze/master' into better-use-mysql-types-in-show-columns 2023-10-11 15:04:03 +00:00
Robert Schulze
bd43b84bf8
Make use_mysql_types_in_show_columns affect only SHOW COLUMNS 2023-10-10 23:09:49 +00:00
Kruglov Pavel
5ded0005a3
Merge pull request #55064 from AVMusorin/system-drop-format-cache
Allow drop cache for Protobuf format
2023-10-10 14:23:11 +02:00
Aleksandr Musorin
8d0c961af0 Allow drop cache for protobuf format
Before it was impossible to update Protobuf schema without server
restart. With this commit, it is enough to send query `SYSTEM DROP
SCHEMA FORMAT CACHE [FOR Protobuf]`.
2023-10-09 10:41:15 +02:00
avogar
c68e008f87 Apply suggestions 2023-09-27 11:18:39 +00:00
avogar
6b3dbc4403 Apply suggestions 2023-09-26 16:41:35 +00:00
Kruglov Pavel
bea80ab5b7
Merge branch 'master' into json-object-as-tuple-inference 2023-09-26 15:23:08 +02:00
Kruglov Pavel
69a17bbef6
Merge pull request #52853 from Avogar/http-valid-json-on-exception
Output valid JSON/XML on exception during HTTP query execution
2023-09-26 14:25:55 +02:00
Kruglov Pavel
b6863a9f52
Fix comments 2023-09-26 14:13:34 +02:00
avogar
cabb3ddaae Fix tests 2023-09-25 21:45:11 +00:00
avogar
95d50fd7de Fix tests 2023-09-25 18:47:33 +00:00
avogar
4d4e3db84a Fix style and build 2023-09-25 17:49:56 +00:00
avogar
9e75825515 Merge branch 'master' of github.com:ClickHouse/ClickHouse into json-object-as-tuple-inference 2023-09-25 17:24:36 +00:00
avogar
33a896ee6c Small fixes 2023-09-25 15:52:17 +00:00
avogar
42ca897f2d Better schema inference for JSON formats 2023-09-25 15:42:59 +00:00
robot-ch-test-poll2
37f732f622
Merge pull request #54808 from ClickHouse/eof
Prevent parquet schema inference reading the first 1 MB of the file unnecessarily
2023-09-20 09:49:42 +02:00
Michael Kolupaev
7271cfd187 Prevent parquet schema inference reading the first 1 MB of the file unnecessarily 2023-09-19 21:58:12 +00:00
avogar
8c29408f5e Parse data in JSON format as JSONEachRow if failed to parse metadata 2023-09-19 11:53:40 +00:00
avogar
2bd747dbe4 Fix tests 2023-09-15 15:26:26 +00:00
Kruglov Pavel
2c407ab3c0
Merge branch 'master' into json-object-as-tuple-inference 2023-09-15 16:29:48 +02:00
Kruglov Pavel
dbd24b240c
Merge branch 'master' into http-valid-json-on-exception 2023-09-15 14:55:31 +02:00
Kruglov Pavel
6419f91cfc
Merge pull request #54585 from ClickHouse/Avogar-patch-1
Remove output_format_markdown_escape_special_characters from settings changes history
2023-09-14 15:45:05 +02:00
slvrtrn
c0961d9378 Merge remote-tracking branch 'origin' into simplified-prepared-statements-for-mysql 2023-09-13 19:33:11 +02:00
avogar
1480c8ad30 Place setting into separate struct 2023-09-13 13:19:05 +00:00
slvrtrn
dddea9219a Address the review comments 2023-09-12 18:39:03 +02:00
robot-clickhouse
1c8ee76ba2
Merge pull request #54513 from Avogar/formats-with-names-no-header
Fix possible parsing error in WithNames formats with disabled input_format_with_names_use_header
2023-09-12 17:58:03 +02:00
slvrtrn
611a75a87f Merge remote-tracking branch 'origin' into simplified-prepared-statements-for-mysql 2023-09-12 10:38:44 +02:00
avogar
803d8dcf85 Support NULL as default for nested types Array/Tuple/Map for input formats 2023-09-11 18:18:33 +00:00
avogar
18a8e58802 Enable setting by default, change logic for object/string/map inference from objects 2023-09-11 18:15:56 +00:00
avogar
b5cccc5f8d Remove unused field 2023-09-11 14:58:02 +00:00
avogar
2d8f33bfa2 Fix parsing error in WithNames formats while reading subset of columns with disabled input_format_with_names_use_header 2023-09-11 14:55:37 +00:00
avogar
ba307c7466 Allow to infer named Tuples from JSON objects under a setting in JSON formats 2023-09-07 19:41:19 +00:00
Kruglov Pavel
de801ce563
Merge pull request #54293 from ClickHouse/pasha-returned-from-vacation
Code improvement for reading from archives
2023-09-05 21:09:09 +02:00
Robert Schulze
f4c86fe34e
Merge pull request #53860 from irenjj/feat_markdown
Add a setting to escape special characters in Markdown.
2023-09-05 15:44:11 +02:00
Antonio Andelic
88930a335c Apply comments 2023-09-05 12:32:07 +00:00
robot-clickhouse
926c5636dd
Merge pull request #42599 from ClickHouse/build-fuzzer-protocol
libFuzzer: add CI fuzzers build, add tcp protocol fuzzer, fix other fuzzers.
2023-09-04 22:41:54 +02:00
slvrtrn
bb0eff9669 Revert format changes 2023-09-04 21:15:26 +02:00
Yakov Olkhovskiy
361b21b416 fix fuzzers, cmake refactor, add target fuzzers 2023-09-01 14:20:50 +00:00
Robert Schulze
aefb543734
Merge remote-tracking branch 'rschu1ze/master' into feat_markdown 2023-08-31 11:32:44 +00:00
slvrtrn
8378441248 Merge remote-tracking branch 'origin' into simplified-prepared-statements-for-mysql 2023-08-30 19:00:51 +02:00
Alexey Milovidov
bbef3ceeb0
Merge pull request #53902 from HarryLeeIBM/hlee-s390x-stripe-log
Fix StripeLog storage endian issue for s390x
2023-08-29 00:35:48 +03:00
HarryLeeIBM
dcecf52a68 Fix StripeLog storage endian issue for s390x 2023-08-28 11:35:04 -07:00
irenjj
51aa89eed8 Add a setting to automatically escape special characters in Markdown. 2023-08-28 00:10:33 +08:00
slvrtrn
055d2e3c3d Merge remote-tracking branch 'origin' into simplified-prepared-statements-for-mysql 2023-08-25 21:27:47 +02:00
slvrtrn
734ffd916c WIP prepared statements 2023-08-25 20:31:21 +02:00
Yakov Olkhovskiy
415a993c91 fix fuzzers build 2023-08-24 23:22:39 +00:00
Kruglov Pavel
f7e1abd774
Merge branch 'master' into cache-count 2023-08-23 22:31:49 +02:00
Kruglov Pavel
67c5c0203b
Merge branch 'master' into fast-count-from-files 2023-08-22 15:03:48 +02:00
Kruglov Pavel
c0bdd0e00b
Merge branch 'master' into cache-count 2023-08-22 14:42:22 +02:00
avogar
b4145aeddc Cache number of rows in files for count in file/s3/url/hdfs/azure functions 2023-08-22 11:59:59 +00:00
Michael Kolupaev
2f4d433e69 Parquet filter pushdown 2023-08-21 14:15:52 -07:00
Michael Kolupaev
6009e1b293
Merge pull request #53324 from bigo-sg/ch_gluten_2583
Implement native orc input format without arrow to improve performance
2023-08-21 13:44:57 -07:00
Kruglov Pavel
88aee95122
Merge branch 'master' into fast-count-from-files 2023-08-21 14:46:33 +02:00
avogar
47304bf7aa Optimize count from files in most input formats 2023-08-21 12:30:52 +00:00
Kruglov Pavel
c68456a20a
Merge pull request #52692 from Avogar/variable-number-of-volumns-more-formats
Allow variable number of columns in more formats, make it work with schema inference
2023-08-21 13:28:35 +02:00
taiyang-li
f723e8d43a change as request 2023-08-21 12:09:02 +08:00
Michael Kolupaev
a1522e22ea
Merge pull request #53281 from Avogar/batch-small-parquet-row-groups
Optimize reading small row groups by batching them together in Parquet
2023-08-18 17:15:42 -07:00
avogar
bca91548ad Add setting input_format_parquet_local_file_min_bytes_for_seek 2023-08-17 12:28:01 +00:00
Alexander Tokmakov
ba44d7260e fix 2023-08-16 00:20:28 +02:00
avogar
7e863a2726 Address comments 2023-08-11 13:17:49 +00:00
avogar
3ad7e57059 Optimize reading small row groups by batching them together in Parquet 2023-08-11 13:17:45 +00:00
Kruglov Pavel
b6b0e9c6bc
Merge branch 'master' into variable-number-of-volumns-more-formats 2023-08-11 14:02:21 +02:00
Kruglov Pavel
00865a7dad
Merge branch 'master' into format-one 2023-08-11 13:54:58 +02:00
Kruglov Pavel
6600f87f86
Merge branch 'master' into http-valid-json-on-exception 2023-08-10 13:53:32 +02:00
Kruglov Pavel
33a39900ad
Merge branch 'master' into variable-number-of-volumns-more-formats 2023-08-09 19:51:17 +02:00
avogar
01a7c7560f Add input format One 2023-08-09 11:25:32 +00:00
Alexey Milovidov
29221188ba Fix error 2023-08-09 04:07:31 +02:00
Alexey Milovidov
5dd99db369 Add diagnostic info about file name during schema inference 2023-08-08 03:55:06 +02:00
Anton Popov
ff137773e7
Merge branch 'master' into formats-with-subcolumns 2023-08-02 15:24:56 +02:00
Alexey Milovidov
3af4fd4003
Merge branch 'master' into change-protocol-version 2023-08-02 15:44:08 +03:00
Anton Popov
aec0667f16 better check of version for sparse serialization 2023-08-01 23:39:52 +00:00
avogar
fa905ebd27 Clean up 2023-08-01 10:14:09 +00:00
avogar
a71cd56a90 Output valid JSON/XML on exception during HTTP query execution 2023-08-01 10:06:56 +00:00
Anton Popov
525da38316 increase min protocol version for sparse serialization 2023-07-31 18:58:58 +00:00
Kruglov Pavel
3e1c409e60
Merge branch 'master' into structure-to-schema 2023-07-28 11:32:16 +02:00
avogar
6d77d52dfe Allow variable number of columns in TSV/CustomSeparated/JSONCompactEachRow, make schema inference work with variable number of columns 2023-07-27 18:02:29 +00:00
Kruglov Pavel
0cd2d7449b
Use for-range loop, add comment 2023-07-27 00:01:25 +02:00
Alexey Milovidov
bc86c26e4e
Update src/Formats/StructureToFormatSchemaUtils.cpp
Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com>
2023-07-26 23:37:20 +03:00
Kruglov Pavel
0d34e97dbe
Merge branch 'master' into formats-with-subcolumns 2023-07-26 13:30:35 +02:00
Michael Kolupaev
8184a289e5 Partially reimplement Parquet encoder to make it faster and parallelizable 2023-07-25 10:16:28 +00:00
Igor Nikonov
2d33661594
Merge branch 'master' into fix-potentially-bad-code 2023-07-22 22:48:07 +02:00
avogar
da6a31bb62 Fix tests and style 2023-07-19 13:26:09 +00:00
Kruglov Pavel
f0026af189
Revert "Revert "Improve CSVInputFormat to check and set default value to column if deserialize failed"" 2023-07-19 14:51:11 +02:00
Kruglov Pavel
7b3564f96a
Revert "Improve CSVInputFormat to check and set default value to column if deserialize failed" 2023-07-19 14:44:59 +02:00
robot-ch-test-poll4
63d0616a22
Merge pull request #51716 from KevinyhZou/bug_fix_csv_field_type_not_match
Improve CSVInputFormat to check and set default value to column if deserialize failed
2023-07-19 14:41:05 +02:00
kevinyhzou
95424177d5 review fix 2023-07-19 18:26:54 +08:00
Alexey Milovidov
8cd2e7c7d6 Merge branch 'master' into fix-potentially-bad-code 2023-07-18 22:18:22 +02:00
avogar
b300781fd8 Make better, add tests 2023-07-18 17:48:39 +00:00
avogar
67f340b501 Merge branch 'master' of github.com:ClickHouse/ClickHouse into structure-to-schema 2023-07-18 13:52:15 +00:00
Kruglov Pavel
1dd05319b5
Merge branch 'master' into formats-with-subcolumns 2023-07-17 19:13:42 +02:00
kevinyhzou
355faa4251 ci fix 2023-07-17 20:08:32 +08:00
robot-clickhouse-ci-2
ac3cc1c2ff
Merge pull request #45671 from ClibMouse/feature/interval-kql-style-formatting
Implement KQL-style formatting for Interval
2023-07-16 04:06:54 +02:00
kevinyhzou
b2665031dc review fix 2023-07-13 20:27:14 +08:00
kevinyhzou
ba57c84db3 bug fix csv input field type mismatch 2023-07-13 20:24:10 +08:00
Alexander Gololobov
9757e272b9 Check number of rows in the reader instead 2023-07-11 12:24:16 +02:00
ltrk2
2d2debe3ce Introduce a separate setting for interval output formatting 2023-07-10 13:51:49 -04:00
ltrk2
b673aa8e6b Use the dialect configuration 2023-07-10 13:51:49 -04:00
ltrk2
522b9ebf8c Implement KQL-style formatting for Interval 2023-07-10 13:51:49 -04:00
Dmitry Kardymon
32f5a78302 Fix setting name 2023-07-06 07:32:46 +00:00
Dmitry Kardymon
24b5c9c204 Use one setting input_format_csv_allow_variable_number_of_colums and code in RowInput 2023-07-06 06:05:43 +00:00
avogar
98aa6b317f Support reading subcolumns from file/s3/hdfs/url/azureBlobStorage table functions 2023-07-04 21:17:26 +00:00
Dmitry Kardymon
ab4142eb8f Merge remote-tracking branch 'clickhouse/master' into ADQM-870 2023-07-04 08:23:31 +03:00
Alexey Milovidov
27f41869a9 Remove code that I don't like 2023-06-25 09:11:42 +02:00
avogar
03f820bc4a Merge branch 'master' of github.com:ClickHouse/ClickHouse into structure-to-schema 2023-06-22 18:46:01 +00:00
avogar
4060beae49 Structure to CapnProto/Protobuf schema take 1 2023-06-22 18:00:00 +00:00
Michael Kolupaev
2498170253 Fix use-after-free in StorageURL when switching URLs 2023-06-22 16:24:12 +00:00
Dmitry Kardymon
30bea857fd Merge remote-tracking branch 'origin/master' into ADQM-870 2023-06-19 07:19:07 +00:00
Kruglov Pavel
11f176dd19
Merge pull request #50712 from KevinyhZou/bug_fix_csv_parse_by_tab_delimiter
Support CSVInputFormat to read csv file by whitespace & tab delimiter
2023-06-16 13:16:22 +02:00
Dmitry Kardymon
806176d88e Add input_format_csv_missing_as_default setting and tests 2023-06-15 11:23:08 +00:00
KevinyhZou
953f40aa3b
Merge branch 'master' into bug_fix_csv_parse_by_tab_delimiter 2023-06-15 10:25:19 +08:00
Dmitry Kardymon
a91fc3ddb3 Add docs/ add more cases in test 2023-06-14 16:44:31 +00:00
Dmitry Kardymon
ed318d1035 Add input_format_csv_ignore_extra_columns setting (prototype) 2023-06-14 10:35:36 +00:00
Chang Chen
86694847c6 using Reader instead of typename CapnpType::Reader 2023-06-14 15:22:32 +08:00
Chang Chen
e281026e00 fix build issue on clang 15 2023-06-14 12:29:55 +08:00
kevinyhzou
f3b99156ac review fix 2023-06-14 10:48:21 +08:00
Kruglov Pavel
607f337d67
Merge pull request #50592 from Avogar/max-bytes-to-read-in-schema-inference
Add setting to limit the number of bytes to read in schema inference
2023-06-13 16:47:57 +02:00
Kruglov Pavel
8fdcd91c38
Merge pull request #49752 from Avogar/better-capnproto-3
Refactor CapnProto format to improve input/output performance
2023-06-13 16:20:38 +02:00
Kruglov Pavel
24d70a2afd
Fix 2023-06-12 13:37:59 +02:00
kevinyhzou
911f8ad8dc use whitespace or tab as field delimiter 2023-06-12 11:57:52 +08:00
avogar
47b0c2a862 Make better 2023-06-09 13:01:36 +00:00
kevinyhzou
48e1b21aab Add feature to support read csv by space & tab delimiter 2023-06-08 20:34:30 +08:00
avogar
cc036528fe Merge branch 'master' of github.com:ClickHouse/ClickHouse into better-capnproto-3 2023-06-08 11:16:13 +00:00
Kruglov Pavel
1baa6404e6
Merge branch 'master' into skip-trailing-empty-lines 2023-06-06 19:39:34 +02:00
avogar
df50833b70 Allow to skip trailing empty lines in CSV/TSV/CustomSeparated formats 2023-06-06 17:33:05 +00:00
Kruglov Pavel
af880a6f3b
Merge branch 'master' into max-bytes-to-read-in-schema-inference 2023-06-06 14:47:58 +02:00
Nikita Mikhaylov
e87348010d
Rework loading and removing of data parts for MergeTree tables. (#49474)
Co-authored-by: Sergei Trifonov <sergei@clickhouse.com>
2023-06-06 14:42:56 +02:00
avogar
33e51d4f3b Add setting to limit the number of bytes to read in schema inference 2023-06-05 15:22:04 +00:00
Kruglov Pavel
59f27014f7
Merge pull request #50474 from valentinalexeev/patch-1
Additional error information when JSON is too large
2023-06-05 13:31:21 +02:00
Alexey Gerasimchuk
9958731c27
Merge branch 'master' into ADQM-830 2023-06-05 07:46:47 +10:00
Valentin Alexeev
516cda94ee Use in.count() instead of pos 2023-06-02 18:42:35 +02:00
Valentin Alexeev
da4d55cdaf Additional error information when JSON is too large
If a parser fails on a large JSON, then output the last position processed to allow review.
2023-06-02 18:42:35 +02:00
Michael Kolupaev
b51064a508 Get rid of SeekableReadBufferFactory, add SeekableReadBuffer::readBigAt() instead 2023-06-01 18:48:30 -07:00
avogar
883350d5c2 Fix tests 2023-06-01 14:51:14 +00:00
Kruglov Pavel
8b34a30455
Fix style 2023-05-31 22:14:57 +02:00
Kruglov Pavel
898d1f34db
Merge branch 'master' into better-capnproto-3 2023-05-31 21:44:00 +02:00
avogar
c9626314f7 Better 2023-05-31 19:22:44 +00:00
Kruglov Pavel
2dd4701115
Merge branch 'master' into allow_empty 2023-05-30 16:04:12 +02:00
avogar
ea395e9554 Make better 2023-05-25 15:24:02 +00:00
Alexey Gerasimchuck
75791d7a63 Added input_format_csv_trim_whitespaces parameter 2023-05-25 07:51:32 +00:00
Kruglov Pavel
f76fc5e066 Fix special build 2023-05-24 17:19:04 +00:00
Kruglov Pavel
94ef08977a Fix special build 2023-05-24 17:19:04 +00:00
Kruglov Pavel
5f1ca61d09 Fix special builds 2023-05-24 17:19:04 +00:00
avogar
a89a8b8d50 Fix build 2023-05-24 17:19:04 +00:00
Kruglov Pavel
c2eada7ba7 Fix style 2023-05-24 17:19:04 +00:00
avogar
e66f6272d1 Refactor CapnProto format to improve input/output performance 2023-05-24 17:19:04 +00:00
Michael Kolupaev
6fd5d8e8ba Add setting output_format_parquet_compliant_nested_types to produce more compatible Parquet files 2023-05-19 18:39:50 +00:00
Yakov Olkhovskiy
0a44a69dc8 remove unnecessary header 2023-05-17 00:22:13 +00:00
Yakov Olkhovskiy
282297b677 binary encoding of IPv6 in protobuf 2023-05-16 23:46:01 +00:00
Kruglov Pavel
5ada385502
Merge branch 'master' into allow_empty 2023-05-16 12:21:31 +02:00
Kruglov Pavel
558eda4146
Merge pull request #49412 from azat/block-use-dense-hash-map
Switch Block::NameMap to google::dense_hash_map over HashMap
2023-05-15 12:22:55 +02:00
Alexey Milovidov
0ca36d4f89 Merge branch 'master' into clang-17 2023-05-14 01:57:40 +02:00
Alexey Milovidov
5a44dc26e7 Fixes for clang-17 2023-05-13 02:57:31 +02:00
Alexey Milovidov
f6144ee32b
Revert "Make Pretty formats even prettier." 2023-05-13 02:45:07 +03:00
Azat Khuzhin
2c40dd6a4c Switch Block::NameMap to google::dense_hash_map over HashMap
Since HashMap creates 2^8 elements by default, while dense_hash_map
should be good here.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-12 05:52:57 +02:00
Alexey Milovidov
ef16077c72
Merge branch 'master' into pretty-time-squashing 2023-05-06 18:20:49 +03:00
Alexey Milovidov
90b0de5677 Make Pretty prettier 2023-05-05 06:36:53 +02:00
Michael Kolupaev
3bd1489f18 Propagate input_format_parquet_preserve_order to parallelizeOutputAfterReading() 2023-05-05 04:20:27 +00:00
Michael Kolupaev
eb3b774ad0 Better control over Parquet row group size 2023-05-04 14:59:55 -07:00
Nikita Mikhaylov
954e3b724c
Speedup outdated parts loading (#49317) 2023-05-03 18:56:45 +02:00
Kruglov Pavel
bacba6e347
Fix typo 2023-04-26 12:18:12 +02:00
Alexey Milovidov
54d10f87f2 Consistency of the LineAsString format 2023-04-23 05:50:46 +02:00
robot-ch-test-poll1
f466c89621
Merge pull request #48911 from Avogar/parquet-metadata-format
Add ParquetMetadata input format to read Parquet file metadata
2023-04-21 03:46:26 +02:00
avogar
34cc7b635a Fix type name 2023-04-19 10:33:39 +00:00
avogar
8af9cf67fd Fix comments 2023-04-19 10:33:39 +00:00
avogar
c2f18281c8 Make better 2023-04-19 10:33:39 +00:00
avogar
bb6cf5252f Fix logical error with IPv4 in Protobuf, add support for Date32 2023-04-19 10:33:39 +00:00
Kruglov Pavel
9bc95bed85
Merge pull request #48898 from Avogar/pretty-json
Add PrettyJSONEachRow format to output pretty JSON
2023-04-19 12:27:24 +02:00
Kruglov Pavel
a5c52d3bc3
Merge branch 'master' into parquet-metadata-format 2023-04-18 21:51:14 +02:00
avogar
b277a5c943 Add ParquetMetadata input format to read Parquet file metadata 2023-04-18 16:46:26 +00:00
avogar
e356f92b77 Add PrettyJSONEachRow format to output pretty JSON 2023-04-18 13:28:59 +00:00
Michael Kolupaev
87be78e6de Better 2023-04-17 04:58:32 +00:00
Michael Kolupaev
e133633359 Parallel decoding with one row group per thread 2023-04-17 04:58:32 +00:00
Michael Kolupaev
683077890f Highly questionable refactoring (getInputMultistream() nonsense) 2023-04-17 04:58:32 +00:00
Michael Kolupaev
2d4fe85513 Something 2023-04-17 04:58:32 +00:00
Kruglov Pavel
f087f0e877
Update src/Formats/ReadSchemaUtils.cpp 2023-04-11 14:18:16 +02:00
robot-ch-test-poll2
bf003c7595
Merge pull request #48390 from Avogar/protobuf-tuple
Allow write/read unnamed tuple as nested Message in Protobuf format
2023-04-05 22:14:28 +02:00
Kruglov Pavel
bd318950b3
Fix special build 2023-04-05 13:35:12 +02:00
Kruglov Pavel
96a3307bda
Merge branch 'master' into fix-protobuf-abort 2023-04-05 11:57:18 +02:00
avogar
f46f098c78 Better 2023-04-05 09:55:49 +00:00
avogar
04be32216a Allow write/read unnamed tuple as nested Message in Protobuf format 2023-04-04 14:47:37 +00:00
avogar
4894f47d95 Fix tests 2023-04-04 13:34:02 +00:00
avogar
972c680b3c Fix typo 2023-04-03 16:27:09 +00:00
avogar
2cde63a25c Avoid abort in protobuf library in debug build 2023-04-03 16:25:22 +00:00
laimuxi
b869572a54 reformat code 2023-04-01 15:20:26 +08:00
laimuxi
3b756ef026 rollback 2023-03-31 21:58:20 +08:00
laimuxi
17efdbf625 change 2023-03-31 21:56:35 +08:00
avogar
35937adcaa Support more types in CapnProto format 2023-03-30 19:15:28 +00:00
Alexey Milovidov
637f6fdd51 Limit memory in fuzzers 2023-03-19 06:17:55 +01:00
Alexey Milovidov
465a89ba15 Limit memory in fuzzers 2023-03-19 05:55:53 +01:00
Alexey Milovidov
57a5a946c9 Fix error 2023-03-19 05:34:10 +01:00
Alexey Milovidov
ee98b555fb Limit memory in fuzzers 2023-03-19 05:11:32 +01:00
Alexey Milovidov
2a077f11f6 Merge branch 'master' into fuzzer-of-data-formats 2023-03-19 01:07:31 +01:00
Alexey Milovidov
2bffed06de Fix style 2023-03-17 18:35:19 +01:00
Alexey Milovidov
1abe5ea58e Add data type fuzzer 2023-03-17 04:44:14 +01:00
Alexey Milovidov
6275c472a7 Better exceptions 2023-03-17 03:14:49 +01:00
avogar
2cc47b5bb6 Allow reading/writing nested arrays in Protobuf with only root field name as column name 2023-03-16 14:43:37 +00:00
Alexey Milovidov
bb6b775884 Merge branch 'master' into fuzzer-of-data-formats 2023-03-15 12:42:00 +01:00
Alexey Milovidov
e443c4e682
Merge pull request #47538 from Avogar/proper-parquet-fix
Proper fix for bug in parquet, revert reverted #45878
2023-03-14 22:29:39 +03:00
Michael Kolupaev
d3a514d221 Compress marks in memory 2023-03-13 16:29:00 -07:00
Alexey Milovidov
f331b9b398 Fix errors and add tests 2023-03-13 23:49:28 +01:00
Alexey Milovidov
14647525f8 Merge branch 'fix-bson-bug' of github.com:Avogar/ClickHouse into fuzzer-of-data-formats 2023-03-13 22:45:00 +01:00
avogar
4213ec609f Proper fix for bug in parquet, revert reverted #45878 2023-03-13 18:22:09 +00:00
Alexey Milovidov
1fd24c212b Update comment 2023-03-13 07:42:58 +01:00
Alexey Milovidov
02f7ef4723 Update comment 2023-03-13 05:28:06 +01:00
Alexey Milovidov
43b938d303 Update the fuzzer 2023-03-13 05:21:48 +01:00
Alexey Milovidov
f33b651686 Add fuzzer for data formats 2023-03-13 04:51:50 +01:00
avogar
5a18acde90 Revert #45878 and add a test 2023-03-11 21:15:14 +00:00
Kruglov Pavel
f387e6013a
Merge pull request #46990 from Avogar/native-types-conversions
Allow types conversion in Native input format
2023-03-10 16:55:16 +01:00
Alexey Milovidov
6f35d46ac8
Update SchemaInferenceUtils.cpp 2023-03-10 05:01:06 +03:00
avogar
46979e383f Fix big numbers inference in CSV 2023-03-09 18:21:47 +00:00
Kruglov Pavel
fe973f3d6f
Merge branch 'master' into native-types-conversions 2023-03-09 13:03:25 +01:00
Kruglov Pavel
71b6d6c6ae
Merge pull request #47114 from Avogar/parquet-compression
Improve working with compression methods in Parquet/ORC/Arrow formats
2023-03-09 13:02:18 +01:00
Mike Kot
9920a52c51 use std::lerp, constexpr hex.h 2023-03-07 22:50:17 +00:00
Kruglov Pavel
69a1309ade
Merge branch 'master' into native-types-conversions 2023-03-07 20:06:17 +01:00
Kruglov Pavel
479cd9b90b
Merge pull request #46972 from Avogar/json-date-int-inference
Fix date and int inference from string in JSON
2023-03-06 20:40:38 +01:00
Kruglov Pavel
3de905bb7c
Merge pull request #46616 from Avogar/fix-ipv4-ipv6-formats
Fix IPv4/IPv6 serialization/deserialization in binary formats
2023-03-06 19:40:29 +01:00
avogar
5ab5902f38 Allow control compression in Parquet/ORC/Arrow output formats, support more compression for input formats 2023-03-01 21:27:46 +00:00
Kruglov Pavel
65f06fc9b1
Merge branch 'master' into json-date-int-inference 2023-02-28 14:31:57 +01:00
avogar
ab899bf2f3 Allow types conversion in Native input format 2023-02-27 19:28:19 +00:00
avogar
2e921e3d6b Fix date and int inference from string in JSON 2023-02-27 16:00:19 +00:00
Kruglov Pavel
443dedddca
Merge branch 'master' into use-parquet-2 2023-02-27 14:31:43 +01:00
Kruglov Pavel
47f9ca2166
Merge branch 'master' into fix-ipv4-ipv6-formats 2023-02-23 20:32:43 +01:00
avogar
eec6051a50 style 2023-02-23 16:16:08 +00:00
avogar
54622566df Add setting to change parquet version 2023-02-23 16:14:10 +00:00
Kruglov Pavel
ef0d6becba
Merge branch 'master' into null-as-default-all-formats 2023-02-21 16:52:39 +01:00
Kruglov Pavel
b0424c1021
Merge pull request #46171 from Avogar/insert-null-as-default
Use default of column type in `insert_null_as_default` if column DEFAULT values is not specified
2023-02-20 21:45:02 +01:00
Kruglov Pavel
9866ecfe8b
Merge branch 'master' into null-as-default-all-formats 2023-02-20 20:49:30 +01:00
avogar
8da3594cd8 Fix IPv4/IPv6 serialization/deserialization in binary formats 2023-02-20 17:42:56 +00:00
Alexey Milovidov
d8cda3dbb8 Remove PVS-Studio 2023-02-19 23:30:05 +01:00
Kruglov Pavel
9fd2226c4c
Update NativeReader.h 2023-02-15 15:13:04 +01:00
Geoff Genz
be8bf3a6a3
Merge branch 'master' into http_client_version 2023-02-13 08:43:59 -07:00
avogar
d1efd02480 Extend setting input_format_null_as_default for more formats 2023-02-10 16:41:09 +00:00
Geoff Genz
99c3ff53c5 Merge remote-tracking branch 'origin/master' into http_client_version
# Conflicts:
#	src/Interpreters/Context.cpp
#	src/Interpreters/Context.h
2023-02-10 04:35:53 -07:00
Geoff Genz
7ed8ed0284 Add support for client_protocol_version sent with HTTP 2023-02-10 03:47:06 -07:00
avogar
c3e8dd8984 Fix low cardinality case 2023-02-08 19:14:28 +00:00
Kruglov Pavel
4e2918cee3
Merge branch 'master' into parquet-fixed-binary 2023-02-08 12:31:13 +01:00
Antonio Andelic
a39e4e24c6
Merge branch 'master' into optimize_parquet_reader 2023-02-02 14:18:00 +01:00
Vladimir C
7c6281c446
Merge pull request #45581 from Avogar/fix-date-inference 2023-02-01 13:04:12 +01:00
liuneng
17fc22a21e add parquet max_block_size setting 2023-02-01 18:29:20 +08:00
Alexey Milovidov
04078dbed3 Remove trash 2023-01-29 22:43:36 +01:00
Kruglov Pavel
96700abbe1
Merge pull request #45678 from azat/formats/json-parse-tupels
Add ability to ignore unknown keys in JSON object for named tuples
2023-01-27 21:11:05 +01:00
Azat Khuzhin
1a8437f2c9 Add ability to ignore unknown keys in JSON object for named tuples
This can be useful in case your input JSON is complex, while you need
only few fields in it.

This behaviour is controlled by the
input_format_json_ignore_unknown_keys_in_named_tuple setting name, that
is turned OFF by default.

This will, almost, allow to parse gharchive dataset without jq. "almost"
because of two things:
- Tuple cannot be Nullable, so such keys with Tuple type in ClickHouse
  cannot be `null` in JSON
- You cannot use dot.dot notation to extract columns for file() engine,
  only tupleElement()

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-27 10:01:08 +01:00
Alexander Tokmakov
d1baa7300c reformat ParsingException 2023-01-24 23:21:29 +01:00
Alexander Tokmakov
dd57215934 Merge branch 'master' into exception_message_patterns4 2023-01-24 17:03:12 +01:00
Kruglov Pavel
23c12ac8ee
Merge branch 'master' into parquet-fixed-binary 2023-01-24 16:51:05 +01:00
avogar
7eeb2a0bc7 Change comment 2023-01-24 15:46:32 +00:00
avogar
159f49266e Don't infer Dates from 8 digit numbers 2023-01-24 15:45:27 +00:00
Kruglov Pavel
cd1cd904a7
Merge branch 'master' into tsv-csv-detect-header 2023-01-23 23:49:56 +01:00
Alexander Tokmakov
3f6594f4c6 forbid old ctor of Exception 2023-01-23 22:18:05 +01:00
Alexander Tokmakov
70d1adfe4b
Better formatting for exception messages (#45449)
* save format string for NetException

* format exceptions

* format exceptions 2

* format exceptions 3

* format exceptions 4

* format exceptions 5

* format exceptions 6

* fix

* format exceptions 7

* format exceptions 8

* Update MergeTreeIndexGin.cpp

* Update AggregateFunctionMap.cpp

* Update AggregateFunctionMap.cpp

* fix
2023-01-24 00:13:58 +03:00
Kruglov Pavel
9820beae68
Apply suggestions from code review
Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com>
2023-01-19 16:11:13 +01:00
avogar
5bf4704e7a Support FixedSizeBinary type in Parquet/Arrow 2023-01-16 21:01:31 +00:00
Kruglov Pavel
e9d6590926
Merge branch 'master' into tsv-csv-detect-header 2023-01-16 17:50:24 +01:00
avogar
87b934c472 Insert default values in case of missing tuple elements in JSONEachRow 2023-01-12 16:36:44 +00:00
avogar
b461935374 Better 2023-01-12 13:11:04 +00:00
Kruglov Pavel
05a11ff4a4
Merge branch 'master' into tsv-csv-detect-header 2023-01-12 12:35:18 +01:00
avogar
26cd56d113 Fix tests, make better 2023-01-11 22:52:15 +00:00
Kruglov Pavel
50eb9fca67
Merge pull request #44696 from Avogar/schema-inference-uint
Infer UInt64 in case of Int64 overflow
2023-01-11 14:24:42 +01:00
Alexey Milovidov
0d39d26a34 Don't fix parallel formatting 2023-01-09 06:15:20 +01:00
Anton Popov
1f32ffedf8
Merge pull request #43221 from ClickHouse/refactoring-ip-types
Replace domain IP types (IPv4, IPv6) with native
2023-01-07 12:01:21 +01:00
avogar
ee72799121 Fix tests, make better 2023-01-06 20:46:43 +00:00
Anton Popov
b25f875674
Merge pull request #44875 from ClickHouse/fix-another-one-cannot-read-all-data-for-lc-dict-error
Fix right offset for reading LowCardinality dictionary from remote fs
2023-01-06 15:24:36 +01:00
avogar
7fcdb08ec6 Detect header in CSV/TSV/CustomSeparated files automatically 2023-01-05 22:57:25 +00:00
Yakov Olkhovskiy
7a5a36cbed
Merge branch 'master' into refactoring-ip-types 2023-01-04 11:11:06 -05:00
avogar
1f3d75cbf2 Better 2023-01-04 14:58:17 +00:00
Kruglov Pavel
7062054d60
Merge branch 'master' into schema-inference-uint 2023-01-04 14:50:01 +01:00
Nikolai Kochetov
da26f62a9b Fix right offset for reading LowCardinality dictionary from remote fs in case if right mark was in the middle of compressed block. 2023-01-03 18:19:51 +00:00
Alexey Milovidov
e855d3519a
Merge branch 'master' into refactoring-ip-types 2023-01-02 21:58:53 +03:00
avogar
73fecae5ff Fix comments 2023-01-02 15:31:07 +00:00
Kruglov Pavel
0a43976977
Merge branch 'master' into validate-types 2023-01-02 16:10:14 +01:00
Kruglov Pavel
69b9842bc6
Merge branch 'master' into schema-inference-uint 2022-12-30 18:16:00 +01:00
Kruglov Pavel
4982d132fb
Merge branch 'master' into validate-types 2022-12-30 17:52:13 +01:00
Kruglov Pavel
894726bd8f
Merge branch 'master' into improve-streaming-engines 2022-12-29 22:59:45 +01:00
Kruglov Pavel
150a699dda
Merge pull request #44546 from Avogar/better-object-as-string-inference
Improve json object as string inference
2022-12-29 21:58:46 +01:00
avogar
1ce69371fb Infer UInt64 in case of Int64 overflow 2022-12-28 21:46:08 +00:00
Raúl Marín
5de11979ce
Unify query elapsed time measurements (#43455)
* Unify query elapsed time reporting

* add-test: Make shell tests executable

* Add some tests around query elapsed time

* Style and ubsan
2022-12-28 21:01:41 +01:00
avogar
411f98306a Merge branch 'master' of github.com:ClickHouse/ClickHouse into validate-types 2022-12-27 19:24:15 +00:00
Kruglov Pavel
819e7a3008
Merge pull request #44550 from Avogar/better-json-tuples-to-arrays-inference
Improve inferring arrays with nulls in JSON formats
2022-12-27 18:22:13 +01:00
Kruglov Pavel
ac162a2c49
Merge pull request #44522 from Avogar/zero-numbers
Infer numbers starting from zero as strings in TSV
2022-12-27 17:00:10 +01:00
avogar
798c3111ed Improve inferring arrays with nulls in JSON formats 2022-12-24 00:21:48 +00:00
avogar
331f4bfee1 Fix 2022-12-23 19:58:50 +00:00
avogar
f15bf1839a Add missed settings into additional cache info 2022-12-23 19:52:54 +00:00
avogar
8dfe90a6c1 Improve json object as string inference 2022-12-23 19:44:13 +00:00