avogar
a87a8e91cf
Slightly better inference of unnamed tuples in JSON formats
2023-12-11 14:46:12 +00:00
avogar
ee7af95bc0
Merge branch 'master' of github.com:ClickHouse/ClickHouse into schema-inference-union
2023-12-08 20:29:28 +00:00
Kruglov Pavel
c6fecfb1af
Merge pull request #56901 from KevinyhZou/Fix_allow_cr_end_of_csv_line
...
Fix allow cr end of line for csv
2023-11-29 20:57:58 +01:00
avogar
b493ce2385
Better JSON -> JSONEachRow fallback without catching exceptions
2023-11-29 14:19:38 +00:00
János Benjamin Antal
ab935e3dd7
Use the google proto files when importing protobuf schemas
2023-11-22 12:39:41 +00:00
avogar
7e392eec50
Better exception messages in input formats
2023-11-21 13:13:42 +00:00
kevinyhzou
3adc8fdf78
Fix ci
2023-11-21 11:22:12 +08:00
avogar
ffa90628f0
Make input format errors logger a bit better
2023-11-20 17:22:49 +00:00
avogar
081fa9f3de
Address comments
2023-11-20 15:53:28 +00:00
avogar
872556a5d4
Merge branch 'master' of github.com:ClickHouse/ClickHouse into schema-inference-union
2023-11-20 14:03:36 +00:00
avogar
6366819f12
Fix generating deep nested columns in CapnProto/Protobuf schemas
2023-11-17 16:52:20 +00:00
yariks5s
181231d500
init
2023-11-07 17:56:02 +00:00
kevinyhzou
2a50daf5dd
Allow cr at end of csv line
2023-11-06 12:21:42 +08:00
kevinyhzou
ef30e6723d
bug fix csv read while end of line is not crlf
2023-11-06 12:21:42 +08:00
Kruglov Pavel
754ab9fa6c
Merge pull request #55974 from Avogar/fix-protobuf-auto-schema
...
Fix autogenerated Protobuf schema with fields with underscore
2023-11-01 18:17:09 +01:00
Kruglov Pavel
bf77ce691c
Merge pull request #55982 from yariks5s/npy_input_format
...
New input format Npy
2023-11-01 14:26:22 +01:00
yariks5s
6c4bf59021
fix suggestions and enhance tests
2023-10-31 18:10:55 +00:00
yariks5s
9a2d89e3e4
removed getSize() and enhanced docs
2023-10-30 12:42:19 +00:00
yariks5s
e14a7f066a
fix typos
2023-10-28 01:46:59 +00:00
yariks5s
894724bfb3
suggested changes
2023-10-28 01:17:25 +00:00
yariks5s
23635352f1
fixed due to review
2023-10-27 15:43:03 +00:00
Kruglov Pavel
bb4b95e891
Merge branch 'master' into schema-inference-union
2023-10-27 14:53:58 +02:00
Kruglov Pavel
570b66f027
Merge branch 'master' into schema-inference-union
2023-10-26 19:26:00 +02:00
zvonand
0766c73aab
Rename date_time_overflow_mode -> date_time_overflow_behavior, moved it to format settings
2023-10-25 23:11:13 +02:00
李扬
465962df7f
Support orc filter push down (file + stripe + rowgroup level) (#55330)
...
* support orc filter push down
* update orc lib version
* replace setqueryinfo with setkeycondition
* fix issue https://github.com/ClickHouse/ClickHouse/issues/53536
* refactor source with key condition
* fix building error
* remove std::cout
* update orc
* update orc version
* fix bugs
* improve code
* upgrade orc lib
* fix code style
* change as requested
* add performance tests for orc filter push down
* add performance tests for orc filter push down
* fix all bugs
* fix default as null issue
* add uts for null as default issues
* upgrade orc lib
* fix failed orc lib uts and fix typo
* fix failed uts
* fix failed uts
* fix ast fuzzer tests
* fix bug of uint64 overflow in https://s3.amazonaws.com/clickhouse-test-reports/55330/de22fdcaea2e12c96f300e95f59beba84401712d/fuzzer_astfuzzerubsan/report.html
* fix asan fatal caused by reused column vector batch in native orc input format. refer to https://s3.amazonaws.com/clickhouse-test-reports/55330/be39d23af2d7e27f5ec7f168947cf75aeaabf674/stateless_tests__asan__[4_4].htm
* fix wrong performance tests
* disable 02892_orc_filter_pushdown on aarch64. https://s3.amazonaws.com/clickhouse-test-reports/55330/be39d23af2d7e27f5ec7f168947cf75aeaabf674/stateless_tests__aarch64_.html
* add some comments
* add some comments
* inline range::equals and range::less
* fix data race of key condition
* trigger ci
2023-10-24 12:08:17 -07:00
avogar
b2c72f95b2
Fix autogenerated Protobuf schema with fields with underscore
2023-10-24 13:08:06 +00:00
taiyang-li
9c186d18a8
retrigger ci
2023-10-24 16:13:53 +08:00
taiyang-li
a02c49e16f
allow skipping null values when serializing tuple to json objects
2023-10-24 11:47:46 +08:00
avogar
544b217d91
Fix style
2023-10-20 21:05:26 +00:00
Kruglov Pavel
6f61ccfe28
Merge branch 'master' into schema-inference-union
2023-10-20 22:54:11 +02:00
avogar
6934e27e8b
Add union mode for schema inference to infer union schema of files with different schemas
2023-10-20 20:46:41 +00:00
Raúl Marín
e500dc22e4
Respect default format when using http_write_exception_in_output_format
2023-10-17 14:14:58 +02:00
Michael Kolupaev
ce7eca0615
DWARF input format (#55450)
...
* Add ReadBufferFromFileBase::isRegularLocalFile()
* DWARF input format
* Review comments
* Changed things around ENABLE_EMBEDDED_COMPILER build setting
* Added 'ranges' column
* no-msan no-ubsan
2023-10-16 17:00:07 -07:00
yariks5s
9ae025d7e6
mid commit
2023-10-12 17:37:59 +00:00
Azat Khuzhin
2cbb069b68
Add ability to ignore data after semicolon in Values format
...
This is required for client, to handle comments in multiquery mode.
v0: separate context for input format
v2: cannot use separate context since params and stuff are changed in global context
v3: do not send this setting to the server (breaks queries for readonly profiles)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-10-12 14:55:26 +02:00
Robert Schulze
9d04d3c3ad
Merge remote-tracking branch 'rschu1ze/master' into better-use-mysql-types-in-show-columns
2023-10-11 15:04:03 +00:00
Robert Schulze
bd43b84bf8
Make use_mysql_types_in_show_columns affect only SHOW COLUMNS
2023-10-10 23:09:49 +00:00
Kruglov Pavel
5ded0005a3
Merge pull request #55064 from AVMusorin/system-drop-format-cache
...
Allow drop cache for Protobuf format
2023-10-10 14:23:11 +02:00
Aleksandr Musorin
8d0c961af0
Allow drop cache for protobuf format
...
Before it was impossible to update Protobuf schema without server
restart. With this commit, it is enough to send query `SYSTEM DROP
SCHEMA FORMAT CACHE [FOR Protobuf]`.
2023-10-09 10:41:15 +02:00
avogar
c68e008f87
Apply suggestions
2023-09-27 11:18:39 +00:00
avogar
6b3dbc4403
Apply suggestions
2023-09-26 16:41:35 +00:00
Kruglov Pavel
bea80ab5b7
Merge branch 'master' into json-object-as-tuple-inference
2023-09-26 15:23:08 +02:00
Kruglov Pavel
69a17bbef6
Merge pull request #52853 from Avogar/http-valid-json-on-exception
...
Output valid JSON/XML on exception during HTTP query execution
2023-09-26 14:25:55 +02:00
Kruglov Pavel
b6863a9f52
Fix comments
2023-09-26 14:13:34 +02:00
avogar
cabb3ddaae
Fix tests
2023-09-25 21:45:11 +00:00
avogar
95d50fd7de
Fix tests
2023-09-25 18:47:33 +00:00
avogar
4d4e3db84a
Fix style and build
2023-09-25 17:49:56 +00:00
avogar
9e75825515
Merge branch 'master' of github.com:ClickHouse/ClickHouse into json-object-as-tuple-inference
2023-09-25 17:24:36 +00:00
avogar
33a896ee6c
Small fixes
2023-09-25 15:52:17 +00:00
avogar
42ca897f2d
Better schema inference for JSON formats
2023-09-25 15:42:59 +00:00
robot-ch-test-poll2
37f732f622
Merge pull request #54808 from ClickHouse/eof
...
Prevent parquet schema inference reading the first 1 MB of the file unnecessarily
2023-09-20 09:49:42 +02:00
Michael Kolupaev
7271cfd187
Prevent parquet schema inference reading the first 1 MB of the file unnecessarily
2023-09-19 21:58:12 +00:00
avogar
8c29408f5e
Parse data in JSON format as JSONEachRow if failed to parse metadata
2023-09-19 11:53:40 +00:00
avogar
2bd747dbe4
Fix tests
2023-09-15 15:26:26 +00:00
Kruglov Pavel
2c407ab3c0
Merge branch 'master' into json-object-as-tuple-inference
2023-09-15 16:29:48 +02:00
Kruglov Pavel
dbd24b240c
Merge branch 'master' into http-valid-json-on-exception
2023-09-15 14:55:31 +02:00
Kruglov Pavel
6419f91cfc
Merge pull request #54585 from ClickHouse/Avogar-patch-1
...
Remove output_format_markdown_escape_special_characters from settings changes history
2023-09-14 15:45:05 +02:00
slvrtrn
c0961d9378
Merge remote-tracking branch 'origin' into simplified-prepared-statements-for-mysql
2023-09-13 19:33:11 +02:00
avogar
1480c8ad30
Place setting into separate struct
2023-09-13 13:19:05 +00:00
slvrtrn
dddea9219a
Address the review comments
2023-09-12 18:39:03 +02:00
robot-clickhouse
1c8ee76ba2
Merge pull request #54513 from Avogar/formats-with-names-no-header
...
Fix possible parsing error in WithNames formats with disabled input_format_with_names_use_header
2023-09-12 17:58:03 +02:00
slvrtrn
611a75a87f
Merge remote-tracking branch 'origin' into simplified-prepared-statements-for-mysql
2023-09-12 10:38:44 +02:00
avogar
803d8dcf85
Support NULL as default for nested types Array/Tuple/Map for input formats
2023-09-11 18:18:33 +00:00
avogar
18a8e58802
Enable setting by default, change logic for object/string/map inference from objects
2023-09-11 18:15:56 +00:00
avogar
b5cccc5f8d
Remove unused field
2023-09-11 14:58:02 +00:00
avogar
2d8f33bfa2
Fix parsing error in WithNames formats while reading subset of columns with disabled input_format_with_names_use_header
2023-09-11 14:55:37 +00:00
avogar
ba307c7466
Allow to infer named Tuples from JSON objects under a setting in JSON formats
2023-09-07 19:41:19 +00:00
Kruglov Pavel
de801ce563
Merge pull request #54293 from ClickHouse/pasha-returned-from-vacation
...
Code improvement for reading from archives
2023-09-05 21:09:09 +02:00
Robert Schulze
f4c86fe34e
Merge pull request #53860 from irenjj/feat_markdown
...
Add a setting to escape special characters in Markdown.
2023-09-05 15:44:11 +02:00
Antonio Andelic
88930a335c
Apply comments
2023-09-05 12:32:07 +00:00
robot-clickhouse
926c5636dd
Merge pull request #42599 from ClickHouse/build-fuzzer-protocol
...
libFuzzer: add CI fuzzers build, add tcp protocol fuzzer, fix other fuzzers.
2023-09-04 22:41:54 +02:00
slvrtrn
bb0eff9669
Revert format changes
2023-09-04 21:15:26 +02:00
Yakov Olkhovskiy
361b21b416
fix fuzzers, cmake refactor, add target fuzzers
2023-09-01 14:20:50 +00:00
Robert Schulze
aefb543734
Merge remote-tracking branch 'rschu1ze/master' into feat_markdown
2023-08-31 11:32:44 +00:00
slvrtrn
8378441248
Merge remote-tracking branch 'origin' into simplified-prepared-statements-for-mysql
2023-08-30 19:00:51 +02:00
Alexey Milovidov
bbef3ceeb0
Merge pull request #53902 from HarryLeeIBM/hlee-s390x-stripe-log
...
Fix StripeLog storage endian issue for s390x
2023-08-29 00:35:48 +03:00
HarryLeeIBM
dcecf52a68
Fix StripeLog storage endian issue for s390x
2023-08-28 11:35:04 -07:00
irenjj
51aa89eed8
Add a setting to automatically escape special characters in Markdown.
2023-08-28 00:10:33 +08:00
slvrtrn
055d2e3c3d
Merge remote-tracking branch 'origin' into simplified-prepared-statements-for-mysql
2023-08-25 21:27:47 +02:00
slvrtrn
734ffd916c
WIP prepared statements
2023-08-25 20:31:21 +02:00
Yakov Olkhovskiy
415a993c91
fix fuzzers build
2023-08-24 23:22:39 +00:00
Kruglov Pavel
f7e1abd774
Merge branch 'master' into cache-count
2023-08-23 22:31:49 +02:00
Kruglov Pavel
67c5c0203b
Merge branch 'master' into fast-count-from-files
2023-08-22 15:03:48 +02:00
Kruglov Pavel
c0bdd0e00b
Merge branch 'master' into cache-count
2023-08-22 14:42:22 +02:00
avogar
b4145aeddc
Cache number of rows in files for count in file/s3/url/hdfs/azure functions
2023-08-22 11:59:59 +00:00
Michael Kolupaev
2f4d433e69
Parquet filter pushdown
2023-08-21 14:15:52 -07:00
Michael Kolupaev
6009e1b293
Merge pull request #53324 from bigo-sg/ch_gluten_2583
...
Implement native orc input format without arrow to improve performance
2023-08-21 13:44:57 -07:00
Kruglov Pavel
88aee95122
Merge branch 'master' into fast-count-from-files
2023-08-21 14:46:33 +02:00
avogar
47304bf7aa
Optimize count from files in most input formats
2023-08-21 12:30:52 +00:00
Kruglov Pavel
c68456a20a
Merge pull request #52692 from Avogar/variable-number-of-volumns-more-formats
...
Allow variable number of columns in more formats, make it work with schema inference
2023-08-21 13:28:35 +02:00
taiyang-li
f723e8d43a
change as request
2023-08-21 12:09:02 +08:00
Michael Kolupaev
a1522e22ea
Merge pull request #53281 from Avogar/batch-small-parquet-row-groups
...
Optimize reading small row groups by batching them together in Parquet
2023-08-18 17:15:42 -07:00
avogar
bca91548ad
Add setting input_format_parquet_local_file_min_bytes_for_seek
2023-08-17 12:28:01 +00:00
Alexander Tokmakov
ba44d7260e
fix
2023-08-16 00:20:28 +02:00
avogar
7e863a2726
Address comments
2023-08-11 13:17:49 +00:00
avogar
3ad7e57059
Optimize reading small row groups by batching them together in Parquet
2023-08-11 13:17:45 +00:00
Kruglov Pavel
b6b0e9c6bc
Merge branch 'master' into variable-number-of-volumns-more-formats
2023-08-11 14:02:21 +02:00
Kruglov Pavel
00865a7dad
Merge branch 'master' into format-one
2023-08-11 13:54:58 +02:00
Kruglov Pavel
6600f87f86
Merge branch 'master' into http-valid-json-on-exception
2023-08-10 13:53:32 +02:00
Kruglov Pavel
33a39900ad
Merge branch 'master' into variable-number-of-volumns-more-formats
2023-08-09 19:51:17 +02:00
avogar
01a7c7560f
Add input format One
2023-08-09 11:25:32 +00:00
Alexey Milovidov
29221188ba
Fix error
2023-08-09 04:07:31 +02:00
Alexey Milovidov
5dd99db369
Add diagnostic info about file name during schema inference
2023-08-08 03:55:06 +02:00
Anton Popov
ff137773e7
Merge branch 'master' into formats-with-subcolumns
2023-08-02 15:24:56 +02:00
Alexey Milovidov
3af4fd4003
Merge branch 'master' into change-protocol-version
2023-08-02 15:44:08 +03:00
Anton Popov
aec0667f16
better check of version for sparse serialization
2023-08-01 23:39:52 +00:00
avogar
fa905ebd27
Clean up
2023-08-01 10:14:09 +00:00
avogar
a71cd56a90
Output valid JSON/XML on exception during HTTP query execution
2023-08-01 10:06:56 +00:00
Anton Popov
525da38316
increase min protocol version for sparse serialization
2023-07-31 18:58:58 +00:00
Kruglov Pavel
3e1c409e60
Merge branch 'master' into structure-to-schema
2023-07-28 11:32:16 +02:00
avogar
6d77d52dfe
Allow variable number of columns in TSV/CustomSeparated/JSONCompactEachRow, make schema inference work with variable number of columns
2023-07-27 18:02:29 +00:00
Kruglov Pavel
0cd2d7449b
Use for-range loop, add comment
2023-07-27 00:01:25 +02:00
Alexey Milovidov
bc86c26e4e
Update src/Formats/StructureToFormatSchemaUtils.cpp
...
Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com>
2023-07-26 23:37:20 +03:00
Kruglov Pavel
0d34e97dbe
Merge branch 'master' into formats-with-subcolumns
2023-07-26 13:30:35 +02:00
Michael Kolupaev
8184a289e5
Partially reimplement Parquet encoder to make it faster and parallelizable
2023-07-25 10:16:28 +00:00
Igor Nikonov
2d33661594
Merge branch 'master' into fix-potentially-bad-code
2023-07-22 22:48:07 +02:00
avogar
da6a31bb62
Fix tests and style
2023-07-19 13:26:09 +00:00
Kruglov Pavel
f0026af189
Revert "Revert "Improve CSVInputFormat to check and set default value to column if deserialize failed""
2023-07-19 14:51:11 +02:00
Kruglov Pavel
7b3564f96a
Revert "Improve CSVInputFormat to check and set default value to column if deserialize failed"
2023-07-19 14:44:59 +02:00
robot-ch-test-poll4
63d0616a22
Merge pull request #51716 from KevinyhZou/bug_fix_csv_field_type_not_match
...
Improve CSVInputFormat to check and set default value to column if deserialize failed
2023-07-19 14:41:05 +02:00
kevinyhzou
95424177d5
review fix
2023-07-19 18:26:54 +08:00
Alexey Milovidov
8cd2e7c7d6
Merge branch 'master' into fix-potentially-bad-code
2023-07-18 22:18:22 +02:00
avogar
b300781fd8
Make better, add tests
2023-07-18 17:48:39 +00:00
avogar
67f340b501
Merge branch 'master' of github.com:ClickHouse/ClickHouse into structure-to-schema
2023-07-18 13:52:15 +00:00
Kruglov Pavel
1dd05319b5
Merge branch 'master' into formats-with-subcolumns
2023-07-17 19:13:42 +02:00
kevinyhzou
355faa4251
ci fix
2023-07-17 20:08:32 +08:00
robot-clickhouse-ci-2
ac3cc1c2ff
Merge pull request #45671 from ClibMouse/feature/interval-kql-style-formatting
...
Implement KQL-style formatting for Interval
2023-07-16 04:06:54 +02:00
kevinyhzou
b2665031dc
review fix
2023-07-13 20:27:14 +08:00
kevinyhzou
ba57c84db3
bug fix csv input field type mismatch
2023-07-13 20:24:10 +08:00
Alexander Gololobov
9757e272b9
Check number of rows in the reader instead
2023-07-11 12:24:16 +02:00
ltrk2
2d2debe3ce
Introduce a separate setting for interval output formatting
2023-07-10 13:51:49 -04:00
ltrk2
b673aa8e6b
Use the dialect configuration
2023-07-10 13:51:49 -04:00
ltrk2
522b9ebf8c
Implement KQL-style formatting for Interval
2023-07-10 13:51:49 -04:00
Dmitry Kardymon
32f5a78302
Fix setting name
2023-07-06 07:32:46 +00:00
Dmitry Kardymon
24b5c9c204
Use one setting input_format_csv_allow_variable_number_of_colums and code in RowInput
2023-07-06 06:05:43 +00:00
avogar
98aa6b317f
Support reading subcolumns from file/s3/hdfs/url/azureBlobStorage table functions
2023-07-04 21:17:26 +00:00
Dmitry Kardymon
ab4142eb8f
Merge remote-tracking branch 'clickhouse/master' into ADQM-870
2023-07-04 08:23:31 +03:00
Alexey Milovidov
27f41869a9
Remove code that I don't like
2023-06-25 09:11:42 +02:00
avogar
03f820bc4a
Merge branch 'master' of github.com:ClickHouse/ClickHouse into structure-to-schema
2023-06-22 18:46:01 +00:00
avogar
4060beae49
Structure to CapnProto/Protobuf schema take 1
2023-06-22 18:00:00 +00:00
Michael Kolupaev
2498170253
Fix use-after-free in StorageURL when switching URLs
2023-06-22 16:24:12 +00:00
Dmitry Kardymon
30bea857fd
Merge remote-tracking branch 'origin/master' into ADQM-870
2023-06-19 07:19:07 +00:00
Kruglov Pavel
11f176dd19
Merge pull request #50712 from KevinyhZou/bug_fix_csv_parse_by_tab_delimiter
...
Support CSVInputFormat to read csv file by whitespace & tab delimiter
2023-06-16 13:16:22 +02:00
Dmitry Kardymon
806176d88e
Add input_format_csv_missing_as_default setting and tests
2023-06-15 11:23:08 +00:00
KevinyhZou
953f40aa3b
Merge branch 'master' into bug_fix_csv_parse_by_tab_delimiter
2023-06-15 10:25:19 +08:00
Dmitry Kardymon
a91fc3ddb3
Add docs/ add more cases in test
2023-06-14 16:44:31 +00:00
Dmitry Kardymon
ed318d1035
Add input_format_csv_ignore_extra_columns setting (prototype)
2023-06-14 10:35:36 +00:00
Chang Chen
86694847c6
using Reader instead of typename CapnpType::Reader
2023-06-14 15:22:32 +08:00
Chang Chen
e281026e00
fix build issue on clang 15
2023-06-14 12:29:55 +08:00
kevinyhzou
f3b99156ac
review fix
2023-06-14 10:48:21 +08:00
Kruglov Pavel
607f337d67
Merge pull request #50592 from Avogar/max-bytes-to-read-in-schema-inference
...
Add setting to limit the number of bytes to read in schema inference
2023-06-13 16:47:57 +02:00
Kruglov Pavel
8fdcd91c38
Merge pull request #49752 from Avogar/better-capnproto-3
...
Refactor CapnProto format to improve input/output performance
2023-06-13 16:20:38 +02:00
Kruglov Pavel
24d70a2afd
Fix
2023-06-12 13:37:59 +02:00
kevinyhzou
911f8ad8dc
use whitespace or tab as field delimiter
2023-06-12 11:57:52 +08:00
avogar
47b0c2a862
Make better
2023-06-09 13:01:36 +00:00
kevinyhzou
48e1b21aab
Add feature to support read csv by space & tab delimiter
2023-06-08 20:34:30 +08:00
avogar
cc036528fe
Merge branch 'master' of github.com:ClickHouse/ClickHouse into better-capnproto-3
2023-06-08 11:16:13 +00:00
Kruglov Pavel
1baa6404e6
Merge branch 'master' into skip-trailing-empty-lines
2023-06-06 19:39:34 +02:00
avogar
df50833b70
Allow to skip trailing empty lines in CSV/TSV/CustomSeparated formats
2023-06-06 17:33:05 +00:00
Kruglov Pavel
af880a6f3b
Merge branch 'master' into max-bytes-to-read-in-schema-inference
2023-06-06 14:47:58 +02:00
Nikita Mikhaylov
e87348010d
Rework loading and removing of data parts for MergeTree tables. (#49474)
...
Co-authored-by: Sergei Trifonov <sergei@clickhouse.com>
2023-06-06 14:42:56 +02:00
avogar
33e51d4f3b
Add setting to limit the number of bytes to read in schema inference
2023-06-05 15:22:04 +00:00
Kruglov Pavel
59f27014f7
Merge pull request #50474 from valentinalexeev/patch-1
...
Additional error information when JSON is too large
2023-06-05 13:31:21 +02:00
Alexey Gerasimchuk
9958731c27
Merge branch 'master' into ADQM-830
2023-06-05 07:46:47 +10:00
Valentin Alexeev
516cda94ee
Use in.count() instead of pos
2023-06-02 18:42:35 +02:00
Valentin Alexeev
da4d55cdaf
Additional error information when JSON is too large
...
If a parser fails on a large JSON, then output the last position processed to allow review.
2023-06-02 18:42:35 +02:00
Michael Kolupaev
b51064a508
Get rid of SeekableReadBufferFactory, add SeekableReadBuffer::readBigAt() instead
2023-06-01 18:48:30 -07:00
avogar
883350d5c2
Fix tests
2023-06-01 14:51:14 +00:00
Kruglov Pavel
8b34a30455
Fix style
2023-05-31 22:14:57 +02:00
Kruglov Pavel
898d1f34db
Merge branch 'master' into better-capnproto-3
2023-05-31 21:44:00 +02:00
avogar
c9626314f7
Better
2023-05-31 19:22:44 +00:00
Kruglov Pavel
2dd4701115
Merge branch 'master' into allow_empty
2023-05-30 16:04:12 +02:00
avogar
ea395e9554
Make better
2023-05-25 15:24:02 +00:00
Alexey Gerasimchuck
75791d7a63
Added input_format_csv_trim_whitespaces parameter
2023-05-25 07:51:32 +00:00
Kruglov Pavel
f76fc5e066
Fix special build
2023-05-24 17:19:04 +00:00
Kruglov Pavel
94ef08977a
Fix special build
2023-05-24 17:19:04 +00:00
Kruglov Pavel
5f1ca61d09
Fix special builds
2023-05-24 17:19:04 +00:00
avogar
a89a8b8d50
Fix build
2023-05-24 17:19:04 +00:00
Kruglov Pavel
c2eada7ba7
Fix style
2023-05-24 17:19:04 +00:00
avogar
e66f6272d1
Refactor CapnProto format to improve input/output performance
2023-05-24 17:19:04 +00:00
Michael Kolupaev
6fd5d8e8ba
Add setting output_format_parquet_compliant_nested_types to produce more compatible Parquet files
2023-05-19 18:39:50 +00:00
Yakov Olkhovskiy
0a44a69dc8
remove unnecessary header
2023-05-17 00:22:13 +00:00
Yakov Olkhovskiy
282297b677
binary encoding of IPv6 in protobuf
2023-05-16 23:46:01 +00:00
Kruglov Pavel
5ada385502
Merge branch 'master' into allow_empty
2023-05-16 12:21:31 +02:00
Kruglov Pavel
558eda4146
Merge pull request #49412 from azat/block-use-dense-hash-map
...
Switch Block::NameMap to google::dense_hash_map over HashMap
2023-05-15 12:22:55 +02:00
Alexey Milovidov
0ca36d4f89
Merge branch 'master' into clang-17
2023-05-14 01:57:40 +02:00
Alexey Milovidov
5a44dc26e7
Fixes for clang-17
2023-05-13 02:57:31 +02:00
Alexey Milovidov
f6144ee32b
Revert "Make Pretty formats even prettier."
2023-05-13 02:45:07 +03:00
Azat Khuzhin
2c40dd6a4c
Switch Block::NameMap to google::dense_hash_map over HashMap
...
HashMap creates 2^8 elements by default, while dense_hash_map
should be good here.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-12 05:52:57 +02:00
Alexey Milovidov
ef16077c72
Merge branch 'master' into pretty-time-squashing
2023-05-06 18:20:49 +03:00
Alexey Milovidov
90b0de5677
Make Pretty prettier
2023-05-05 06:36:53 +02:00
Michael Kolupaev
3bd1489f18
Propagate input_format_parquet_preserve_order to parallelizeOutputAfterReading()
2023-05-05 04:20:27 +00:00
Michael Kolupaev
eb3b774ad0
Better control over Parquet row group size
2023-05-04 14:59:55 -07:00
Nikita Mikhaylov
954e3b724c
Speedup outdated parts loading (#49317)
2023-05-03 18:56:45 +02:00
Kruglov Pavel
bacba6e347
Fix typo
2023-04-26 12:18:12 +02:00
Alexey Milovidov
54d10f87f2
Consistency of the LineAsString format
2023-04-23 05:50:46 +02:00
robot-ch-test-poll1
f466c89621
Merge pull request #48911 from Avogar/parquet-metadata-format
...
Add ParquetMetadata input format to read Parquet file metadata
2023-04-21 03:46:26 +02:00
avogar
34cc7b635a
Fix type name
2023-04-19 10:33:39 +00:00
avogar
8af9cf67fd
Fix comments
2023-04-19 10:33:39 +00:00
avogar
c2f18281c8
Make better
2023-04-19 10:33:39 +00:00
avogar
bb6cf5252f
Fix logical error with IPv4 in Protobuf, add support for Date32
2023-04-19 10:33:39 +00:00
Kruglov Pavel
9bc95bed85
Merge pull request #48898 from Avogar/pretty-json
...
Add PrettyJSONEachRow format to output pretty JSON
2023-04-19 12:27:24 +02:00
Kruglov Pavel
a5c52d3bc3
Merge branch 'master' into parquet-metadata-format
2023-04-18 21:51:14 +02:00
avogar
b277a5c943
Add ParquetMetadata input format to read Parquet file metadata
2023-04-18 16:46:26 +00:00
avogar
e356f92b77
Add PrettyJSONEachRow format to output pretty JSON
2023-04-18 13:28:59 +00:00
Michael Kolupaev
87be78e6de
Better
2023-04-17 04:58:32 +00:00
Michael Kolupaev
e133633359
Parallel decoding with one row group per thread
2023-04-17 04:58:32 +00:00
Michael Kolupaev
683077890f
Highly questionable refactoring (getInputMultistream() nonsense)
2023-04-17 04:58:32 +00:00
Michael Kolupaev
2d4fe85513
Something
2023-04-17 04:58:32 +00:00
Kruglov Pavel
f087f0e877
Update src/Formats/ReadSchemaUtils.cpp
2023-04-11 14:18:16 +02:00
robot-ch-test-poll2
bf003c7595
Merge pull request #48390 from Avogar/protobuf-tuple
...
Allow write/read unnamed tuple as nested Message in Protobuf format
2023-04-05 22:14:28 +02:00
Kruglov Pavel
bd318950b3
Fix special build
2023-04-05 13:35:12 +02:00
Kruglov Pavel
96a3307bda
Merge branch 'master' into fix-protobuf-abort
2023-04-05 11:57:18 +02:00
avogar
f46f098c78
Better
2023-04-05 09:55:49 +00:00
avogar
04be32216a
Allow write/read unnamed tuple as nested Message in Protobuf format
2023-04-04 14:47:37 +00:00
avogar
4894f47d95
Fix tests
2023-04-04 13:34:02 +00:00
avogar
972c680b3c
Fix typo
2023-04-03 16:27:09 +00:00
avogar
2cde63a25c
Avoid abort in protobuf library in debug build
2023-04-03 16:25:22 +00:00
laimuxi
b869572a54
reformat code
2023-04-01 15:20:26 +08:00
laimuxi
3b756ef026
rollback
2023-03-31 21:58:20 +08:00
laimuxi
17efdbf625
change
2023-03-31 21:56:35 +08:00
avogar
35937adcaa
Support more types in CapnProto format
2023-03-30 19:15:28 +00:00
Alexey Milovidov
637f6fdd51
Limit memory in fuzzers
2023-03-19 06:17:55 +01:00
Alexey Milovidov
465a89ba15
Limit memory in fuzzers
2023-03-19 05:55:53 +01:00
Alexey Milovidov
57a5a946c9
Fix error
2023-03-19 05:34:10 +01:00
Alexey Milovidov
ee98b555fb
Limit memory in fuzzers
2023-03-19 05:11:32 +01:00
Alexey Milovidov
2a077f11f6
Merge branch 'master' into fuzzer-of-data-formats
2023-03-19 01:07:31 +01:00
Alexey Milovidov
2bffed06de
Fix style
2023-03-17 18:35:19 +01:00
Alexey Milovidov
1abe5ea58e
Add data type fuzzer
2023-03-17 04:44:14 +01:00
Alexey Milovidov
6275c472a7
Better exceptions
2023-03-17 03:14:49 +01:00
avogar
2cc47b5bb6
Allow reading/writing nested arrays in Protobuf with only root field name as column name
2023-03-16 14:43:37 +00:00
Alexey Milovidov
bb6b775884
Merge branch 'master' into fuzzer-of-data-formats
2023-03-15 12:42:00 +01:00
Alexey Milovidov
e443c4e682
Merge pull request #47538 from Avogar/proper-parquet-fix
...
Proper fix for bug in parquet, revert reverted #45878
2023-03-14 22:29:39 +03:00
Michael Kolupaev
d3a514d221
Compress marks in memory
2023-03-13 16:29:00 -07:00
Alexey Milovidov
f331b9b398
Fix errors and add tests
2023-03-13 23:49:28 +01:00
Alexey Milovidov
14647525f8
Merge branch 'fix-bson-bug' of github.com:Avogar/ClickHouse into fuzzer-of-data-formats
2023-03-13 22:45:00 +01:00
avogar
4213ec609f
Proper fix for bug in parquet, revert reverted #45878
2023-03-13 18:22:09 +00:00
Alexey Milovidov
1fd24c212b
Update comment
2023-03-13 07:42:58 +01:00
Alexey Milovidov
02f7ef4723
Update comment
2023-03-13 05:28:06 +01:00
Alexey Milovidov
43b938d303
Update the fuzzer
2023-03-13 05:21:48 +01:00
Alexey Milovidov
f33b651686
Add fuzzer for data formats
2023-03-13 04:51:50 +01:00
avogar
5a18acde90
Revert #45878 and add a test
2023-03-11 21:15:14 +00:00
Kruglov Pavel
f387e6013a
Merge pull request #46990 from Avogar/native-types-conversions
...
Allow types conversion in Native input format
2023-03-10 16:55:16 +01:00
Alexey Milovidov
6f35d46ac8
Update SchemaInferenceUtils.cpp
2023-03-10 05:01:06 +03:00
avogar
46979e383f
Fix big numbers inference in CSV
2023-03-09 18:21:47 +00:00
Kruglov Pavel
fe973f3d6f
Merge branch 'master' into native-types-conversions
2023-03-09 13:03:25 +01:00
Kruglov Pavel
71b6d6c6ae
Merge pull request #47114 from Avogar/parquet-compression
...
Improve working with compression methods in Parquet/ORC/Arrow formats
2023-03-09 13:02:18 +01:00
Mike Kot
9920a52c51
use std::lerp, constexpr hex.h
2023-03-07 22:50:17 +00:00
Kruglov Pavel
69a1309ade
Merge branch 'master' into native-types-conversions
2023-03-07 20:06:17 +01:00
Kruglov Pavel
479cd9b90b
Merge pull request #46972 from Avogar/json-date-int-inference
...
Fix date and int inference from string in JSON
2023-03-06 20:40:38 +01:00