Nikita Mikhaylov
e87348010d
Rework loading and removing of data parts for MergeTree tables. ( #49474 )
...
Co-authored-by: Sergei Trifonov <sergei@clickhouse.com>
2023-06-06 14:42:56 +02:00
avogar
33e51d4f3b
Add setting to limit the number of bytes to read in schema inference
2023-06-05 15:22:04 +00:00
Alexey Gerasimchuk
9958731c27
Merge branch 'master' into ADQM-830
2023-06-05 07:46:47 +10:00
Michael Kolupaev
b51064a508
Get rid of SeekableReadBufferFactory, add SeekableReadBuffer::readBigAt() instead
2023-06-01 18:48:30 -07:00
Alexey Gerasimchuck
75791d7a63
Added input_format_csv_trim_whitespaces parameter
2023-05-25 07:51:32 +00:00
Michael Kolupaev
6fd5d8e8ba
Add setting output_format_parquet_compliant_nested_types to produce more compatible Parquet files
2023-05-19 18:39:50 +00:00
Alexey Milovidov
f6144ee32b
Revert "Make Pretty formats even prettier."
2023-05-13 02:45:07 +03:00
Alexey Milovidov
ef16077c72
Merge branch 'master' into pretty-time-squashing
2023-05-06 18:20:49 +03:00
Alexey Milovidov
90b0de5677
Make Pretty prettier
2023-05-05 06:36:53 +02:00
Michael Kolupaev
3bd1489f18
Propagate input_format_parquet_preserve_order to parallelizeOutputAfterReading()
2023-05-05 04:20:27 +00:00
Michael Kolupaev
eb3b774ad0
Better control over Parquet row group size
2023-05-04 14:59:55 -07:00
Nikita Mikhaylov
954e3b724c
Speedup outdated parts loading ( #49317 )
2023-05-03 18:56:45 +02:00
Michael Kolupaev
87be78e6de
Better
2023-04-17 04:58:32 +00:00
Michael Kolupaev
e133633359
Parallel decoding with one row group per thread
2023-04-17 04:58:32 +00:00
Michael Kolupaev
683077890f
Highly questionable refactoring (getInputMultistream() nonsense)
2023-04-17 04:58:32 +00:00
Michael Kolupaev
2d4fe85513
Something
2023-04-17 04:58:32 +00:00
Alexey Milovidov
bb6b775884
Merge branch 'master' into fuzzer-of-data-formats
2023-03-15 12:42:00 +01:00
Alexey Milovidov
f331b9b398
Fix errors and add tests
2023-03-13 23:49:28 +01:00
Alexey Milovidov
14647525f8
Merge branch 'fix-bson-bug' of github.com:Avogar/ClickHouse into fuzzer-of-data-formats
2023-03-13 22:45:00 +01:00
avogar
4213ec609f
Proper fix for bug in parquet, revert reverted #45878
2023-03-13 18:22:09 +00:00
Alexey Milovidov
f33b651686
Add fuzzer for data formats
2023-03-13 04:51:50 +01:00
avogar
5a18acde90
Revert #45878 and add a test
2023-03-11 21:15:14 +00:00
Kruglov Pavel
fe973f3d6f
Merge branch 'master' into native-types-conversions
2023-03-09 13:03:25 +01:00
Kruglov Pavel
69a1309ade
Merge branch 'master' into native-types-conversions
2023-03-07 20:06:17 +01:00
avogar
5ab5902f38
Allow control compression in Parquet/ORC/Arrow output formats, support more compression for input formats
2023-03-01 21:27:46 +00:00
avogar
ab899bf2f3
Allow types conversion in Native input format
2023-02-27 19:28:19 +00:00
Kruglov Pavel
443dedddca
Merge branch 'master' into use-parquet-2
2023-02-27 14:31:43 +01:00
avogar
54622566df
Add setting to change parquet version
2023-02-23 16:14:10 +00:00
Kruglov Pavel
9866ecfe8b
Merge branch 'master' into null-as-default-all-formats
2023-02-20 20:49:30 +01:00
Geoff Genz
be8bf3a6a3
Merge branch 'master' into http_client_version
2023-02-13 08:43:59 -07:00
avogar
d1efd02480
Extend setting input_format_null_as_default for more formats
2023-02-10 16:41:09 +00:00
Geoff Genz
99c3ff53c5
Merge remote-tracking branch 'origin/master' into http_client_version
...
# Conflicts:
# src/Interpreters/Context.cpp
# src/Interpreters/Context.h
2023-02-10 04:35:53 -07:00
Geoff Genz
7ed8ed0284
Add support for client_protocol_version sent with HTTP
2023-02-10 03:47:06 -07:00
Kruglov Pavel
4e2918cee3
Merge branch 'master' into parquet-fixed-binary
2023-02-08 12:31:13 +01:00
liuneng
17fc22a21e
add parquet max_block_size setting
2023-02-01 18:29:20 +08:00
Alexey Milovidov
04078dbed3
Remove trash
2023-01-29 22:43:36 +01:00
Azat Khuzhin
1a8437f2c9
Add ability to ignore unknown keys in JSON object for named tuples
...
This can be useful when your input JSON is complex but you need
only a few fields from it.
This behaviour is controlled by the
input_format_json_ignore_unknown_keys_in_named_tuple setting, which
is turned OFF by default.
This almost allows parsing the gharchive dataset without jq; "almost"
because of two things:
- Tuple cannot be Nullable, so keys that map to a Tuple type in ClickHouse
cannot be `null` in JSON
- You cannot use dot.dot notation to extract columns for the file() engine,
only tupleElement()
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-27 10:01:08 +01:00
Kruglov Pavel
23c12ac8ee
Merge branch 'master' into parquet-fixed-binary
2023-01-24 16:51:05 +01:00
Kruglov Pavel
cd1cd904a7
Merge branch 'master' into tsv-csv-detect-header
2023-01-23 23:49:56 +01:00
Alexander Tokmakov
70d1adfe4b
Better formatting for exception messages ( #45449 )
...
* save format string for NetException
* format exceptions
* format exceptions 2
* format exceptions 3
* format exceptions 4
* format exceptions 5
* format exceptions 6
* fix
* format exceptions 7
* format exceptions 8
* Update MergeTreeIndexGin.cpp
* Update AggregateFunctionMap.cpp
* Update AggregateFunctionMap.cpp
* fix
2023-01-24 00:13:58 +03:00
avogar
5bf4704e7a
Support FixedSizeBinary type in Parquet/Arrow
2023-01-16 21:01:31 +00:00
Kruglov Pavel
e9d6590926
Merge branch 'master' into tsv-csv-detect-header
2023-01-16 17:50:24 +01:00
avogar
87b934c472
Insert default values in case of missing tuple elements in JSONEachRow
2023-01-12 16:36:44 +00:00
Kruglov Pavel
05a11ff4a4
Merge branch 'master' into tsv-csv-detect-header
2023-01-12 12:35:18 +01:00
Alexey Milovidov
0d39d26a34
Don't fix parallel formatting
2023-01-09 06:15:20 +01:00
avogar
7fcdb08ec6
Detect header in CSV/TSV/CustomSeparated files automatically
2023-01-05 22:57:25 +00:00
Kruglov Pavel
0a43976977
Merge branch 'master' into validate-types
2023-01-02 16:10:14 +01:00
Kruglov Pavel
4982d132fb
Merge branch 'master' into validate-types
2022-12-30 17:52:13 +01:00
Kruglov Pavel
894726bd8f
Merge branch 'master' into improve-streaming-engines
2022-12-29 22:59:45 +01:00
Raúl Marín
5de11979ce
Unify query elapsed time measurements ( #43455 )
...
* Unify query elapsed time reporting
* add-test: Make shell tests executable
* Add some tests around query elapsed time
* Style and ubsan
2022-12-28 21:01:41 +01:00
avogar
4ab3e90382
Validate types in table function arguments/CAST function arguments/JSONAsObject schema inference
2022-12-21 21:21:30 +00:00
Kruglov Pavel
5e01a3d74e
Merge branch 'master' into improve-streaming-engines
2022-12-21 10:51:50 +01:00
Kruglov Pavel
c5b2e4cc23
Merge branch 'master' into improve-streaming-engines
2022-12-15 18:44:35 +01:00
avogar
739ad23b1f
Make better, fix bugs, improve error messages
2022-12-12 22:00:45 +00:00
avogar
cd4fa00d2c
Merge branch 'master' of github.com:ClickHouse/ClickHouse into refactor-schema-inference
2022-12-09 14:45:10 +00:00
avogar
d0f9bb2ec2
Allow to parse JSON objects into Strings
2022-12-08 18:58:18 +00:00
avogar
7375a7d429
Refactor and improve schema inference for text formats
2022-12-07 21:19:27 +00:00
Kruglov Pavel
c35b2a6495
Add a limit for string size in RowBinary format ( #43842 )
2022-12-02 13:57:11 +01:00
Anton Popov
fe5fff0347
Merge pull request #43329 from xiedeyantu/support_nested_column
...
s3 table function can support select nested column using {column_name}.{subcolumn_name}
2022-11-29 22:27:19 +01:00
xiedeyantu
304b6ebf3a
s3 table function can support select nested column using {column_name}.{subcolumn_name}
2022-11-23 23:36:12 +08:00
Kruglov Pavel
98d6b96c82
Merge pull request #42033 from mark-polokhov/BSONEachRow
...
Add BSONEachRow input/output format
2022-11-22 14:45:21 +01:00
Vitaly Baranov
ce81166c7e
Fix style.
2022-11-16 01:35:11 +01:00
Vitaly Baranov
8e99f5fea3
Move maskSensitiveInfoInQueryForLogging() to src/Parsers/
2022-11-14 18:55:19 +01:00
avogar
9e89af28c6
Refactor BSONEachRow format, fix bugs, support more data types, support parallel parsing and schema inference
2022-11-10 20:15:14 +00:00
Kruglov Pavel
b124875257
Merge branch 'master' into improve-streaming-engines
2022-11-03 13:22:06 +01:00
avogar
8e13d1f1ec
Improve and refactor Kafka/StorageMQ/NATS and data formats
2022-10-28 16:41:10 +00:00
Alexey Milovidov
f88ed8195b
Fix trash
2022-10-17 04:21:08 +02:00
Kruglov Pavel
6fc12dd922
Merge pull request #41703 from Avogar/json-object-each-row
...
Add setting to obtain object name as column value in JSONObjectEachRow format
2022-10-14 20:11:04 +02:00
Vitaly Baranov
f65d3ff95a
Fix parallel parsing: segmentator now checks max_block_size.
2022-09-30 22:34:03 +02:00
Kruglov Pavel
f1ac2d66be
Merge branch 'master' into json-object-each-row
2022-09-28 14:15:02 +02:00
avogar
76be0d2ee1
Infer Object type only when allow_experimental_object_type is enabled
2022-09-27 23:07:36 +00:00
avogar
d3d06251a3
Add setting to obtain object name as column value in JSONObjectEachRow format
2022-09-22 16:48:54 +00:00
Kruglov Pavel
22e11aef2d
Merge pull request #40910 from Avogar/new-json-formats
...
Add new JSON formats, add improvements and refactoring
2022-09-21 14:19:08 +02:00
avogar
868ce8bc16
Fix comments, make better naming, add docs, add setting output_format_json_quote_64bit_floats
2022-09-20 13:49:17 +00:00
Alexey Milovidov
da01982652
Merge pull request #41046 from azat/build/llvm-15
...
Switch to llvm/clang 15
2022-09-16 07:31:06 +03:00
Azat Khuzhin
e8d7403a38
Suppress warning in FormatFactory::getFormatFromFileDescriptor() for FreeBSD
...
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-09-10 21:38:35 +02:00
zhenjial
bd9fabc3f7
code optimization, add test
2022-09-09 23:27:42 +08:00
zhenjial
469ceaa156
code optimization
2022-09-09 00:47:43 +08:00
avogar
c380decbbb
Make better, add new settings
2022-09-08 16:07:20 +00:00
zhenjial
0f788d98f5
new implementation
2022-09-06 20:39:54 +08:00
avogar
b94e896c1c
Remove logs
2022-09-01 19:01:27 +00:00
avogar
afc34dca41
Add new JSON formats, add improvements and refactoring
2022-09-01 19:00:24 +00:00
avogar
612ffaffde
Make schema inference cache better, respect format settings that can change the schema
2022-08-19 16:39:13 +00:00
Nikolai Kochetov
5a85531ef7
Merge pull request #38286 from Avogar/schema-inference-cache
...
Add schema inference cache for s3/hdfs/file/url
2022-08-18 13:07:50 +02:00
avogar
8dd54c043d
Merge branch 'master' of github.com:ClickHouse/ClickHouse into schema-inference-cache
2022-08-17 11:47:40 +00:00
avogar
e1ff996ec3
Allow to specify structure hints in schema inference
2022-08-16 09:46:57 +00:00
Kruglov Pavel
6b2186bfeb
Merge branch 'master' into numbers-schema-inference
2022-08-02 19:34:53 +02:00
Kruglov Pavel
381ea139c2
Merge branch 'master' into schema-inference-cache
2022-07-27 11:35:36 +02:00
avogar
784ee11594
Add settings to skip fields with unsupported types in Protobuf/CapnProto schema inference
2022-07-20 11:16:25 +00:00
Kruglov Pavel
b38241b08a
Merge branch 'master' into schema-inference-cache
2022-07-14 12:29:54 +02:00
avogar
7cde9d3b40
Add new features in schema inference
2022-07-13 15:57:55 +00:00
avogar
5b0fd31c64
Put column names in quotes
2022-06-30 16:14:30 +00:00
avogar
9bb68bc6de
Add SQLInsert output format
2022-06-27 18:31:57 +00:00
avogar
5155262a16
Add some additional information to cache keys
2022-06-27 12:43:24 +00:00
Alexey Milovidov
5e9e5a4eaf
Merge pull request #37525 from Avogar/avro-structs
...
Support Maps and Records, allow to insert null as default in Avro format
2022-06-15 00:04:29 +03:00
Robert Schulze
1a0b5f33b3
More consistent use of platform macros
...
cmake/target.cmake defines macros for the supported platforms, this
commit changes predefined system macros to our own macros.
__linux__ --> OS_LINUX
__APPLE__ --> OS_DARWIN
__FreeBSD__ --> OS_FREEBSD
2022-06-10 10:22:31 +02:00
Kruglov Pavel
7cc87d9a65
Merge pull request #37537 from Avogar/skip-first-lines
...
Allow to skip some of the first lines in CSV/TSV formats
2022-05-31 14:26:21 +02:00
Kruglov Pavel
0615866aea
Merge pull request #37450 from Avogar/check-format-on-storage-creation
...
Check format name on storage creation
2022-05-30 14:23:20 +02:00
Alexey Milovidov
c50791dd3b
Fix clang-tidy-14, part 1
2022-05-27 22:52:14 +02:00
avogar
4c9812d4c1
Allow to skip some of the first rows in CSV/TSV formats
2022-05-25 15:00:11 +00:00