avogar
d37ad2e6de
Implement cache for schema inference for file/s3/hdfs/url
2022-06-21 13:02:48 +00:00
Alexey Milovidov
5e9e5a4eaf
Merge pull request #37525 from Avogar/avro-structs
...
Support Maps and Records, allow to insert null as default in Avro format
2022-06-15 00:04:29 +03:00
Robert Schulze
1a0b5f33b3
More consistent use of platform macros
...
cmake/target.cmake defines macros for the supported platforms, this
commit changes predefined system macros to our own macros.
__linux__ --> OS_LINUX
__APPLE__ --> OS_DARWIN
__FreeBSD__ --> OS_FREEBSD
2022-06-10 10:22:31 +02:00
Kruglov Pavel
6f17ba17ba
Revert "Revert "Fix possible segfault in schema inference""
2022-06-02 13:28:27 +02:00
Alexander Tokmakov
4baae59252
Revert "Fix possible segfault in schema inference"
2022-06-02 14:04:28 +03:00
avogar
4abfd54dd6
Fix possible segfault in schema inference
2022-06-01 16:53:37 +00:00
Kruglov Pavel
7cc87d9a65
Merge pull request #37537 from Avogar/skip-first-lines
...
Allow to skip some of the first lines in CSV/TSV formats
2022-05-31 14:26:21 +02:00
Kruglov Pavel
0615866aea
Merge pull request #37450 from Avogar/check-format-on-storage-creation
...
Check format name on storage creation
2022-05-30 14:23:20 +02:00
Alexey Milovidov
c50791dd3b
Fix clang-tidy-14, part 1
2022-05-27 22:52:14 +02:00
avogar
4c9812d4c1
Allow to skip some of the first rows in CSV/TSV formats
2022-05-25 15:00:11 +00:00
avogar
038a422aeb
Add setting to insert null as default
2022-05-25 12:56:59 +00:00
avogar
f782fa31c6
Merge branch 'master' of github.com:ClickHouse/ClickHouse into check-format-on-storage-creation
2022-05-25 08:42:54 +00:00
avogar
37b66c8a9e
Check format name on storage creation
2022-05-23 12:48:48 +00:00
Kruglov Pavel
f539fb835d
Merge branch 'master' into formats-with-names
2022-05-23 12:14:20 +02:00
Kruglov Pavel
ce48e8e102
Merge pull request #36975 from Avogar/json-columns-formats
...
Add columnar JSON formats
2022-05-23 12:11:28 +02:00
avogar
a4cf07708c
Fix comments
2022-05-20 14:57:27 +00:00
avogar
566d1b15fd
Merge branch 'master' of github.com:ClickHouse/ClickHouse into formats-with-names
2022-05-20 13:54:52 +00:00
avogar
44726122bb
Join JSON registration
2022-05-20 12:09:51 +00:00
avogar
a6a430c5ee
Merge branch 'master' of github.com:ClickHouse/ClickHouse into json-columns-formats
2022-05-20 11:08:30 +00:00
mergify[bot]
1ac4199e78
Merge branch 'master' into arrow-strings
2022-05-20 10:43:33 +00:00
avogar
cd6a29897e
Apply input_format_max_rows_to_read_for_schema_inference for all files in globs in total
2022-05-18 17:56:36 +00:00
avogar
a0369fb9a6
Allow to use String type instead of Binary in Arrow/Parquet/ORC formats
2022-05-18 14:51:21 +00:00
Kruglov Pavel
134821eff8
Fix build
2022-05-18 12:44:20 +02:00
avogar
12010a81b7
Make better
2022-05-18 09:25:26 +00:00
Robert Schulze
e3cfec5b09
Merge remote-tracking branch 'origin/master' into clangtidies
2022-05-16 10:12:50 +02:00
avogar
68bb07d166
Better naming
2022-05-13 18:39:19 +00:00
avogar
cef13c2c02
Allow to skip unknown columns in Native format
2022-05-13 14:27:15 +00:00
avogar
b17fec659a
Improve performance and memory usage for select of subset of columns for some formats
2022-05-13 13:51:28 +00:00
avogar
f6b16880bd
Merge branch 'master' of github.com:ClickHouse/ClickHouse into json-columns-formats
2022-05-10 12:57:18 +00:00
Anton Popov
e911900054
remove last mentions of data streams
2022-05-09 19:15:24 +00:00
avogar
04fdd75c56
Make JSONColumns formats mono block by default
2022-05-09 11:13:44 +00:00
Robert Schulze
f2b1748c48
Enable clang-tidy bugprone-suspicious-semicolon
...
Official docs:
Finds most instances of stray semicolons that unexpectedly alter the
meaning of the code.
2022-05-08 19:13:37 +02:00
avogar
62a7ba3f26
Add columnar JSON formats
2022-05-06 16:48:48 +00:00
Kruglov Pavel
77e55c344c
Merge pull request #36667 from Avogar/mysqldump-format
...
Add MySQLDump input format
2022-05-04 19:49:48 +02:00
Dmitry Novik
5ba7a55c18
Merge pull request #36650 from bigo-sg/hive_text_parallel_parsing
...
Parallel parsing of hive text format
2022-05-03 15:56:28 +02:00
Kruglov Pavel
d613f7eab0
Merge branch 'master' into mysqldump-format
2022-05-02 13:31:57 +02:00
Antonio Andelic
a1a22b0007
Merge pull request #35149 from ContentSquare/nullables_with_proto3
...
Nullables with proto3 using Google wrappers
2022-05-02 09:49:37 +02:00
Robert Schulze
89aa9ae00f
Fixed clang-tidy check "bugprone-branch-clone"
...
The check is currently *not* part of .clang-tidy. It complains about:
(1) "switch has multiple consecutive identical branches"
(2) "repeated branch in conditional chain"
About (1): Lots of findings in switches were about redundant
"[[fallthrough]]" in places where the compiler would not warn anyways. I
have cleaned these up.
About (2): In if-else_if-else chains, fixing the warning would usually
mean concatenating multiple if-conditions. As this would reduce
readability in most cases, I did not fix these places.
Because of (2), I also refrained from adding "bugprone-branch-clone" to
.clang-tidy.
2022-04-30 19:40:28 +02:00
Jakub Kuklis
a1f2dd6d34
Adding two settings in place of one, improvements to the test clarity
2022-04-29 10:01:51 +02:00
Jakub Kuklis
e73fa271a2
Minor improvements
2022-04-29 10:01:51 +02:00
Jakub Kuklis
5ca095c779
Pass the setting to buildFieldSerializer to fix undeclared
2022-04-29 10:01:51 +02:00
Jakub Kuklis
e705425374
Minor improvements
2022-04-29 10:01:51 +02:00
Jakub Kuklis
5c34585a00
Improve the test clarity
2022-04-29 10:01:51 +02:00
Jakub Kuklis
f19e473482
Remove local change
2022-04-29 10:01:51 +02:00
Jakub Kuklis
507ba1042c
Adding a setting to enable Google wrappers special treatment
2022-04-29 10:01:51 +02:00
Jakub Kuklis
6d5c1e2fc0
Adding a setting to enable special treatment of google wrappers
2022-04-29 10:01:50 +02:00
Jakub Kuklis
b7a8acc302
Alternative design for output, more messy, but the default value inside Google wrapper is not serialized
2022-04-29 10:01:50 +02:00
Jakub Kuklis
53e2454800
Corrected the behaviour for Proto Nullable output
2022-04-29 10:01:50 +02:00
Jakub Kuklis
10425c17b2
Write empty values for Google wrappers
2022-04-29 10:01:50 +02:00
Jakub Kuklis
ff49fad1f1
Another const keyword corrections for debug build
2022-04-29 10:01:50 +02:00
Jakub Kuklis
08ee7470f0
const keyword corrections for debug build
2022-04-29 10:01:50 +02:00
Jakub Kuklis
c0acc4dfa0
Fixing assert
2022-04-29 10:01:50 +02:00
Jakub Kuklis
7a78197746
Style corrections
2022-04-29 10:01:50 +02:00
Jakub Kuklis
ae1194bf9c
Nullables detection in protobuf using Google wrappers
2022-04-29 10:01:50 +02:00
Amos Bird
4a5e4274f0
base should not depend on Common
2022-04-29 10:26:35 +08:00
avogar
d295de1689
Fix comments and test
2022-04-28 14:59:35 +00:00
Kruglov Pavel
4d08587559
Merge branch 'master' into mysqldump-format
2022-04-28 15:58:18 +02:00
Vladimir C
1cbdc1ef3a
Merge pull request #36206 from vdimir/output-format-prometheus
2022-04-28 12:09:53 +02:00
vdimir
be0aa06958
Add output format Prometheus
2022-04-26 14:57:35 +00:00
avogar
b666b4e1c9
Fix possible heap-use-after-free in schema inference
2022-04-26 14:36:16 +00:00
avogar
33d845dade
Add MySQLDump input format
2022-04-26 10:42:56 +00:00
taiyang-li
b7cc344d62
remove useless code
2022-04-26 14:42:43 +08:00
taiyang-li
99dee35b6e
parallel parsing of hive text format
2022-04-26 14:33:10 +08:00
Kruglov Pavel
34c342fdd3
Merge pull request #36205 from Avogar/improve-globs
...
Some refactoring around schema inference with globs
2022-04-25 13:14:46 +02:00
avogar
80eacc8533
Merge branch 'master' of github.com:ClickHouse/ClickHouse into improve-json-schema-inference
2022-04-22 17:18:44 +00:00
avogar
a093181b4f
Fix comments
2022-04-21 11:48:17 +00:00
Kruglov Pavel
813e228fcc
Merge branch 'master' into improve-globs
2022-04-20 16:31:47 +02:00
avogar
f31f019252
Fix
2022-04-19 19:25:41 +00:00
avogar
1f252cedfe
Make better
2022-04-19 19:16:47 +00:00
Kruglov Pavel
ec4e1cb6d8
Merge pull request #36211 from Avogar/insert-select-all-formats
...
Allow insert select for files with formats without schema inference
2022-04-19 14:25:59 +02:00
avogar
ae88549c4f
Allow insert select for files with formats without schema inference
2022-04-13 20:02:52 +00:00
avogar
8b60aeb7bc
Improve schema inference for json objects
2022-04-13 19:13:40 +00:00
avogar
1c065f8c7a
Some refactoring around schema inference with globs
2022-04-13 17:02:48 +00:00
avogar
348cae0d16
Fix possible segfault in schema inference for JSON formats
2022-04-13 12:34:40 +00:00
avogar
d2017a63b1
Merge branch 'master' of github.com:ClickHouse/ClickHouse into improve-schema-inference
2022-04-07 11:36:40 +00:00
Kruglov Pavel
f3f8f27db5
Merge pull request #35735 from Avogar/allow-read-bools-as-numbers
...
Allow to infer and parse bools as numbers in JSON input formats
2022-04-07 13:20:49 +02:00
taiyang-li
2ef316801c
Merge branch 'master' into use_minmax_index
2022-04-07 10:53:25 +08:00
Kruglov Pavel
ec2213493f
Merge branch 'master' into allow-read-bools-as-numbers
2022-04-06 14:53:02 +02:00
Kruglov Pavel
9141066de3
Merge branch 'master' into improve-schema-inference
2022-04-06 13:51:07 +02:00
taiyang-li
acb9f1632e
support skip splits in orc and parquet
2022-04-06 16:40:22 +08:00
Maksim Kita
e6c9a36ac7
Merge pull request #35733 from kitaisreal/ipv6-invalid-insert-test
...
Added test for insert of invalid IPv6 value
2022-04-04 12:28:16 +02:00
mergify[bot]
1e43e26fa1
Merge branch 'master' into fix-order
2022-04-02 12:00:29 +00:00
avogar
ab2a963287
Merge branch 'master' of github.com:ClickHouse/ClickHouse into allow-read-bools-as-numbers
2022-03-31 14:09:43 +00:00
mergify[bot]
24ade25d61
Merge branch 'master' into improve-schema-inference
2022-03-31 13:42:47 +00:00
Kruglov Pavel
564a77c462
Fix build
2022-03-31 12:49:23 +02:00
Maksim Kita
371cdc956a
Added input format settings for parsing invalid IPv4, IPv6 addresses as default values
2022-03-30 12:54:19 +02:00
avogar
3fc36627b3
Allow to infer and parse bools as numbers in JSON input formats
2022-03-29 17:37:31 +00:00
avogar
ce97ccbfb9
Improve schema inference for JSONEachRow and TSKV formats
2022-03-29 14:47:51 +00:00
Kruglov Pavel
a2fd09e031
Fix style
2022-03-29 16:34:07 +02:00
Antonio Andelic
9990abb76a
Use compile-time check for Exception messages, fix wrong messages
2022-03-29 13:16:11 +00:00
mergify[bot]
343588de2c
Merge branch 'master' into improve-schema-inference
2022-03-29 13:06:00 +00:00
Anton Popov
9610139477
Merge pull request #35629 from CurtizJ/dynamic-columns-5
...
Support schema inference for type `Object` in format `JSONEachRow`
2022-03-29 14:17:09 +02:00
Anton Popov
d677635cd8
Merge pull request #35592 from CurtizJ/dynamic-columns-4
...
Add parallel parsing and schema inference for format `JSONAsObject`
2022-03-28 19:29:55 +02:00
Anton Popov
a6450be8b6
fix schema inference
2022-03-26 01:33:10 +00:00
Anton Popov
67195bfdd5
support schema inference for type Object in format JSONEachRow
2022-03-25 21:51:53 +00:00
Kruglov Pavel
d45143ffe0
Merge branch 'master' into improve-schema-inference
2022-03-25 12:05:40 +01:00
Kruglov Pavel
1823cac89d
Update src/Formats/EscapingRuleUtils.h
...
Co-authored-by: Vladimir C <vdimir@clickhouse.com>
2022-03-24 19:19:32 +01:00
Anton Popov
78100abc5f
add parallel parsing and schema inference for type Object
2022-03-24 17:51:35 +00:00
avogar
abc020a502
Clean up
2022-03-24 13:08:58 +00:00
avogar
557edbd172
Add some improvements and fixes in schema inference
2022-03-24 12:54:12 +00:00
Antonio Andelic
0c23cd7b94
Add support for case insensitive column matching in arrow
2022-03-22 10:55:10 +00:00
Antonio Andelic
29d2bf7d1a
Merge branch 'master' into case-insensitive-column-matching
2022-03-21 08:17:27 +00:00
Antonio Andelic
f75b054255
Allow case insensitive column matching
2022-03-21 07:47:37 +00:00
Kruglov Pavel
aa3c05e9d4
Merge pull request #35152 from rschu1ze/protobuf-batch-write
...
ProtobufList
2022-03-18 13:24:34 +01:00
Antonio Andelic
607f785e48
Revert "Merge pull request #35145 from bigo-sg/lower-column-name"
...
This reverts commit ebf72bf61d, reversing
changes made to f1b812bdc1.
2022-03-17 12:31:43 +00:00
Robert Schulze
6e1d7a31bc
Fix build + typo
2022-03-17 11:41:20 +01:00
Anton Popov
2ced42ed41
add experimental settings for Object type
2022-03-16 16:51:23 +00:00
Anton Popov
0ba78c3c3a
Merge remote-tracking branch 'upstream/master' into HEAD
2022-03-16 15:28:09 +00:00
Robert Schulze
0d2ece6d91
Merge branch 'ClickHouse:master' into protobuf-batch-write
2022-03-16 09:43:33 +01:00
avogar
e2d1e643f2
Fix possible segfault in JSONEachRow schema inference
2022-03-15 11:44:15 +00:00
Robert Schulze
23122cb327
Fix review comments
...
ParquetBlockOutputFormat.cpp:
- undo unrelated formatting
ProtobufSerializer.cpp:
- undef debug tracing
- simplify logic in writeRow()
ProtobufSchemas.cpp:
- restore original search in cache by message type
2022-03-15 11:27:17 +01:00
Maksim Kita
538f8cbaad
Fix clang-tidy warnings in Disks, Formats, Functions folders
2022-03-14 18:17:35 +00:00
Anton Popov
36ec379aeb
Merge remote-tracking branch 'upstream/master' into HEAD
2022-03-14 16:28:35 +00:00
Antonio Andelic
ebf72bf61d
Merge pull request #35145 from bigo-sg/lower-column-name
...
add setting to lower column case when reading parquet/orc file
2022-03-14 11:25:03 +01:00
Robert Schulze
514d4d2187
Implement ProtobufList - fixes ClickHouse#16436
...
Introduce IO format "ProtobufList" with protobuf schema
  // schemafile.proto
  message Envelope {
    message MessageType {
      uint32 colA = 1;
      string colB = 2;
    }
    repeated MessageType mt = 1;
  }
where "Envelope" is a hard-coded/expected top-level message and
"MessageType" is a message with user-provided name containing the table
fields to export/import, e.g.
SELECT * FROM db1.tab1 FORMAT ProtobufList SETTINGS format_schema =
'schemafile:MessageType'
As a result, the new format wraps a list of messages (one per row) into
a single, containing message. Compare that to the schema of the existing
IO formats "Protobuf" and "ProtobufSingle":
  message MessageType {
    uint32 colA = 1;
    string colB = 2;
  }
The new format does not save space compared to the existing formats, but
it is conceptually a bit more beautiful and also more convenient.
Implementation details:
- Created new files ProtobufList(Input|Output)Format which use the
existing ProtobufSerializer mechanism. The goal was to reuse as much
code as possible and avoid copypasta.
- I was torn between inheriting from I(Input|Output)Format vs.
IRow(Input|Output)Format for ProtobufList(Input|Output)Format. The
former is chunk-based which can be better for performance. Since the
ProtobufSerializer mechanism is row-based but data is generally passed
around in chunks, I decided for the latter to leverage the existing
chunk <--> row mapping code in IRow(Input|Output)Format.
- A new ProtobufSerializer called ProtobufSerializerEnvelope was
introduced (--> ProtobufSerializer.cpp). It represents the top-level
message which encloses the list of inner nested messages, i.e. the
rows.
- With the new format, parsing the schema file and matching the fields in
the schema file to table columns works as for the old formats. The only
difference is that parsing starts one level below the "Envelope" (-->
ProtobufSchemas.cpp). This is more natural than forcing customers to
have table columns start with "Envelope".
- Creation of the ProtobufSerializer tree also works like before. What
is different is that we finally add a ProtobufSerializerEnvelope as
new root of the tree. Its only purpose is to write/read the top-level
message for the first/last row to write/read.
Caveats:
- The low-level serialization code in ProtobufWriter uses an internal
buffer which is flushed to the output file only in endMessage().
In the existing "Protobuf" format, this happens once per row, in the
new format this happens only at the end of the serialization
since row-level messages now call start/endNestedMessage(). As a
future TODO, the buffer should also be flushed in
start/endNestedMessage() to reduce memory consumption.
2022-03-14 08:04:58 +01:00
zhanghuajie
53a8987b3b
fix build fail with gcc --fix warnings without disabling some parameters
2022-03-11 21:59:19 +08:00
shuchaome
46cb4483a6
Optimise by lowering schema on the beginning. Add a functional test.
2022-03-11 14:34:46 +08:00
shuchaome
56795b831d
add setting to lower column case when reading parquet/orc file
2022-03-09 16:07:02 +08:00
Anton Popov
0bc57da238
Merge remote-tracking branch 'upstream/master' into HEAD
2022-03-07 14:46:08 +00:00
Azat Khuzhin
c426eef07d
Fix generating USE_* for system.build_options
...
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-03-04 15:31:32 +03:00
Anton Popov
df3b07fe7c
Merge remote-tracking branch 'upstream/master' into HEAD
2022-03-03 22:25:28 +00:00
Maksim Kita
f1b1baf56e
Merge pull request #34982 from Cai-Yao/master
...
date_time_input_format = 'best_effort_us'
2022-03-03 09:22:57 +01:00
Maksim Kita
b1a956c5f1
clang-tidy check performance-move-const-arg fix
2022-03-02 18:15:27 +00:00
cwkyaoyao
72194bbaf3
Add date_time_input_format = best_effort_us
2022-03-02 16:00:06 +08:00
avogar
a7c6d11532
Fix schema inference for unquoted dates in CSV
2022-03-01 11:03:26 +00:00
Anton Popov
18940b8637
Merge remote-tracking branch 'upstream/master' into HEAD
2022-02-09 23:38:38 +03:00
Kruglov Pavel
15d85682e8
Fix style
2022-02-07 18:29:22 +03:00
avogar
a4c7ecde87
Make better
2022-02-07 17:51:26 +03:00
avogar
77b42bb9ff
Support UUID in MsgPack format
2022-02-07 17:11:44 +03:00
Anton Popov
836a348a9c
Merge remote-tracking branch 'upstream/master' into HEAD
2022-02-01 15:23:07 +03:00
Maksim Kita
5ef83deaa6
Update sort to pdqsort
2022-01-30 19:49:48 +00:00
Anton Popov
78b9f15abb
Merge remote-tracking branch 'upstream/master' into HEAD
2022-01-30 03:24:37 +03:00
Kruglov Pavel
7873b4475f
Merge branch 'master' into autodetect-format
2022-01-25 10:56:52 +03:00
avogar
a6740d2f9a
Detect format and schema for stdin in clickhouse-local
2022-01-25 10:25:37 +03:00
avogar
1f49acc164
Better naming
2022-01-24 16:28:36 +03:00
Anton Popov
e8ce091e68
Merge remote-tracking branch 'upstream/master' into HEAD
2022-01-21 20:11:18 +03:00
Kruglov Pavel
7bfb1231b9
Merge branch 'master' into formats-with-suffixes
2022-01-20 14:47:17 +03:00
Azat Khuzhin
91e3ceeea9
Remove unbundled capnp support
2022-01-20 10:01:58 +03:00
Azat Khuzhin
a30ef87d65
Remove unbundled msgpack support
2022-01-20 10:01:58 +03:00
Azat Khuzhin
788cb6b2b0
Remove unbundled protobuf support
2022-01-20 08:47:16 +03:00
Azat Khuzhin
1145e32af6
Remove unbundled snappy support
2022-01-20 08:47:16 +03:00
Azat Khuzhin
ab8cdb198f
Remove unbundled orc support
2022-01-20 08:47:16 +03:00
Azat Khuzhin
d1b2bd5fbe
Remove unbundled avro support
2022-01-20 08:47:16 +03:00
Azat Khuzhin
b4ad324a88
Remove unbundled parquet/arrow support
2022-01-20 08:47:16 +03:00
Kruglov Pavel
a7df9cd53a
Merge branch 'master' into formats-with-suffixes
2022-01-14 21:03:49 +03:00
avogar
253035a5df
Fix
2022-01-14 19:17:06 +03:00
Kruglov Pavel
d2e9f37bee
Merge branch 'master' into format-by-extention
2022-01-14 18:36:23 +03:00
avogar
89a181bd19
Make better
2022-01-14 18:16:18 +03:00
avogar
817a314263
Fix tests and style
2022-01-14 17:46:24 +03:00
Kruglov Pavel
5a908e8edd
Merge branch 'master' into formats-with-suffixes
2022-01-14 16:45:20 +03:00
Kseniia Sumarokova
5da673c3a5
Merge pull request #31104 from bigo-sg/hive_table
...
Implement hive table engine
2022-01-14 09:39:17 +03:00
avogar
2d7b1bfa5e
Detect format in S3/HDFS/URL table engines
2022-01-13 16:14:18 +03:00
Kruglov Pavel
305d58a762
Merge pull request #33524 from Avogar/stacktrace-in-client
...
Don't print exception twice in client in case of exception in parallel parsing
2022-01-13 15:50:42 +03:00
avogar
8390e9ad60
Detect format by file name in file/hdfs/s3/url table functions
2022-01-12 18:29:31 +03:00
taiyang-li
66813a3aa9
merge master
2022-01-12 16:56:29 +08:00
avogar
0ae0aa712b
Don't print exception twice in client in case of exception in parallel parsing
2022-01-11 18:37:07 +03:00
zhongyuankai
99279c1443
INTO OUTFILE / FROM INFILE: autodetect FORMAT by file extension
2022-01-11 21:26:14 +08:00
zhongyuankai
878e44eb97
auto format by file extension
2022-01-08 21:47:14 +08:00
taiyang-li
1e102bc1b2
merge master
2022-01-01 09:01:06 +08:00
alesapin
16c36d72b1
Merge pull request #33296 from ClickHouse/fix_clang_tidy_3
...
Fix clang tidy 3
2021-12-29 22:43:42 +03:00
avogar
97788b9c21
Allow to create new files on insert for File/S3/HDFS engines
2021-12-29 21:19:13 +03:00
Kruglov Pavel
489a30859f
Merge pull request #32455 from Avogar/schema-inference
...
Automatic schema inference for input formats
2021-12-29 21:03:48 +03:00
alesapin
34145c47da
Fix clang tidy
2021-12-29 18:36:42 +03:00
avogar
78b522fd51
Fix fasttest build
2021-12-29 12:21:01 +03:00
avogar
d718a2e220
Clean up
2021-12-29 12:21:01 +03:00
avogar
26abf7aa62
Remove code duplication, use simdjson and rapidjson instead of Poco
2021-12-29 12:21:01 +03:00
avogar
aaf9f85c67
Add more tests and fixes
2021-12-29 12:18:56 +03:00
avogar
dd994aa761
Add some tests and some fixes
2021-12-29 12:18:56 +03:00
avogar
8112a71233
Implement schema inference for most input formats
2021-12-29 12:18:56 +03:00
taiyang-li
9036b18c2f
merge master
2021-12-27 15:12:48 +08:00
Raúl Marín
cb22091b33
Merge remote-tracking branch 'blessed/master' into kill_scalar_github
2021-12-23 13:59:33 +01:00
Kruglov Pavel
a1455c0f2a
Merge pull request #32981 from Avogar/fix-csv-tuples
...
Fix tuple output in CSV format
2021-12-23 13:27:34 +03:00
Alexey Milovidov
df0e7d9ed3
Merge branch 'Issue81' of github.com:DevTeamBK/ClickHouse into merge-33050
2021-12-23 02:03:03 +03:00
Alexey Milovidov
b2d9e33882
Whitespaces
2021-12-23 02:02:36 +03:00
Raúl Marín
9bb88c26d8
Add existing progress to the record of the output format progress
2021-12-22 23:14:23 +01:00
Boris Kuschel
c62d9e2f2d
Out of Bounds Column Index
...
Signed-off-by: Boris Kuschel <Boris.Kuschel@ibm.com>
2021-12-21 22:43:47 -05:00
taiyang-li
2597925724
merge master
2021-12-21 15:55:39 +08:00
avogar
ba6a513db0
Fix tuple output in CSV format
2021-12-20 19:27:09 +03:00
kreuzerkrieg
f06c37d206
Stop reading incomplete stripes and skip rows.
2021-12-19 18:41:32 +02:00
Anton Popov
99ebabd822
Merge remote-tracking branch 'upstream/master' into HEAD
2021-12-17 19:02:29 +03:00
alesapin
6bd7e425c6
Merge pull request #22535 from CurtizJ/sparse-serialization
...
Sparse serialization and ColumnSparse
2021-12-17 15:26:17 +03:00
taiyang-li
d033fc4c24
merge master and fix conflict
2021-12-17 15:11:21 +08:00
alesapin
884801e1bd
Fixing
2021-12-14 19:08:08 +03:00
Anton Popov
16312e7e4a
Merge remote-tracking branch 'upstream/master' into HEAD
2021-12-14 18:58:17 +03:00
taiyang-li
ca3f7425a4
fix code
2021-12-14 17:37:31 +08:00
taiyang-li
8234d1176f
merge master
2021-12-14 10:39:21 +08:00
Alexey Milovidov
71926a3a97
Fix surprisingly bad code in function "file"
2021-12-13 07:57:54 +03:00
李扬
8675086104
Merge branch 'master' into hive_table
2021-12-12 09:01:46 -06:00
Vitaly Baranov
463ce1fcee
Merge pull request #27822 from filimonov/kafka_protobuf_issue26643
...
Test for issue #26643
2021-12-11 20:31:22 +03:00
Vitaly Baranov
abe9dd3368
Merge pull request #32531 from vitlibar/fix-nested-array-sizes-for-missing-columns
...
Improve handling nested structures with missing columns while reading protobuf
2021-12-11 11:08:34 +03:00
Vitaly Baranov
b5b195f4e2
Merge branch 'master' into kafka_protobuf_issue26643
2021-12-10 23:22:35 +03:00
Vitaly Baranov
82c2d8dd2c
Add synchronization to ProtobufSchemas.
2021-12-10 23:18:47 +03:00
Vitaly Baranov
73092942ea
Take into account nested structures while filling missing columns while reading protobuf.
2021-12-10 21:11:06 +03:00
Anton Popov
d8367334a3
Merge remote-tracking branch 'upstream/master' into HEAD
2021-12-08 18:26:19 +03:00
Kseniia Sumarokova
926fd568c7
Merge pull request #32113 from FrankChen021/url_http_header
...
Set Content-Type in HTTP packets issued from URL engine
2021-12-07 08:52:36 +03:00
Kseniia Sumarokova
eab6f0ba49
Update FormatFactory.cpp
2021-12-06 23:35:29 +03:00
Kruglov Pavel
cc71c537bc
Merge pull request #32204 from Avogar/skip-quoted-values
...
Improve skipping unknown fields with Quoted escaping rule in Template/CustomSeparated formats
2021-12-06 12:28:14 +03:00
Vitaly Baranov
d709782088
Merge pull request #31988 from vitlibar/fix-skipping-columns-while-writing-protobuf
...
Fix skipping columns while writing protobuf
2021-12-05 18:01:11 +03:00
Vitaly Baranov
2e0b480044
Improve error handling while serializing protobufs.
2021-12-04 21:42:45 +03:00
Vitaly Baranov
15e3dbe3f2
Fix skipping columns in Nested while writing protobuf.
2021-12-04 18:00:02 +03:00
frank chen
c319b6fa32
Fix style
...
Signed-off-by: frank chen <frank.chen021@outlook.com>
2021-12-03 22:09:04 +08:00
avogar
7549619b25
Improve skipping unknown fields with Quoted escaping rule in Template/CustomSeparated formats
2021-12-03 16:25:35 +03:00
frank chen
898db5b468
Resolve review comments
...
Signed-off-by: frank chen <frank.chen021@outlook.com>
2021-12-03 19:47:05 +08:00
Anton Popov
f6be3d16fd
Merge pull request #24820 from kssenii/versioning
...
Versioning of aggregate function states
2021-12-03 01:41:44 +03:00
taiyang-li
9ec8272186
refactor hive text input format
2021-12-02 16:14:25 +08:00
Anton Popov
6f4d9a53b2
Merge remote-tracking branch 'origin/sparse-serialization' into HEAD
2021-12-01 15:54:33 +03:00
Anton Popov
54f51444c0
Merge remote-tracking branch 'upstream/master' into HEAD
2021-12-01 15:49:02 +03:00
taiyang-li
4aeadf3967
fix build error
2021-12-01 14:13:48 +08:00
kssenii
71bfc72e37
Fix
2021-11-30 14:42:37 +00:00
taiyang-li
d213500a3e
remove blank at end of line
2021-11-30 18:23:24 +08:00
mergify[bot]
8d5460b469
Merge branch 'master' into feature-support-bool-type
2021-11-29 11:50:18 +00:00
kssenii
be3b4ca8fe
Merge branch 'master' of github.com:ClickHouse/ClickHouse into versioning
2021-11-27 09:44:31 +00:00
kssenii
515261f5dd
Better
2021-11-27 09:40:46 +00:00
taiyang-li
72f60cceb9
Merge branch 'master' into hive_table
2021-11-25 17:33:26 +08:00
Kseniia Sumarokova
93cf66df12
Merge pull request #30936 from kssenii/seekable-read-buffers
...
Reduce memory usage for some formats when reading with s3/url/hdfs
2021-11-25 11:19:24 +03:00
lgbo
996d7125c0
Merge branch 'master' into hive_table
2021-11-23 10:19:02 +08:00
Anton Popov
ccd78e3838
Merge remote-tracking branch 'upstream/master' into HEAD
2021-11-22 17:19:35 +03:00
Kruglov Pavel
cded91b013
Update verbosePrintString.h
2021-11-19 16:51:49 +03:00
taiyang-li
e8644807fe
merge master and solve conflict
2021-11-19 15:01:58 +08:00
MaxWk
f17d5b02e4
use bool representation
2021-11-19 14:30:22 +08:00
kssenii
1a9817f872
Correct merge
2021-11-18 07:56:10 +00:00
avogar
1ebcbf4748
Fix style
2021-11-16 17:10:30 +03:00
avogar
8e9783388b
Add formats CustomSeparatedWithNames/WithNamesAndTypes
2021-11-16 17:10:30 +03:00
avogar
73d1918410
tmp
2021-11-16 17:10:30 +03:00
kssenii
37f482d478
Merge branch 'master' of github.com:ClickHouse/ClickHouse into versioning
2021-11-15 07:31:11 +00:00
kssenii
f18dcd2287
Merge branch 'master' of github.com:ClickHouse/ClickHouse into seekable-read-buffers
2021-11-13 14:38:57 +03:00
cgp
18504f545a
move InputCreatorFunc to InputCreator
2021-11-12 00:34:59 +08:00
MaxWk
d42a454837
support some bool format
2021-11-11 16:01:32 +08:00
taiyang-li
deef4d4dbe
add option read_bool_as_uint8 when parsing csv
2021-11-11 11:49:54 +08:00
Anton Popov
a20922b2d3
Merge remote-tracking branch 'origin/sparse-serialization' into HEAD
2021-11-09 15:36:25 +03:00
Anton Popov
66973a2a28
Merge remote-tracking branch 'upstream/master' into HEAD
2021-11-08 21:27:45 +03:00
taiyang-li
36ca0b296b
implement hive table engine
2021-11-05 19:55:30 +08:00
Anton Popov
84e914e05a
minor fixes near serializations
2021-11-05 01:46:00 +03:00
kssenii
ec11179f91
Merge branch 'master' of github.com:ClickHouse/ClickHouse into seekable-read-buffers
2021-11-03 14:33:31 +03:00
kssenii
45ea820297
Reduce memory usage for some formats
2021-11-03 14:30:03 +03:00
Kruglov Pavel
327a34e9da
Merge pull request #30497 from Avogar/null-deserialization
...
Add custom null representation support for TSV/CSV input formats, fix Nullable(String) deserializing in some formats
2021-11-03 11:30:25 +03:00
avogar
42ab57f0e5
Set output_format_avro_rows_in_file default to 1
2021-11-02 14:06:10 +03:00
Kruglov Pavel
901ebcede6
Merge pull request #30351 from arenadata/ADQM-335
...
output_format_avro_rows_in_file
2021-11-02 12:25:27 +03:00
Kruglov Pavel
1f8535c02b
Merge branch 'master' into null-deserialization
2021-11-02 12:15:21 +03:00
Anton Popov
1628f50e51
Merge branch 'master' into sparse-serialization
2021-11-02 06:26:18 +03:00
Kruglov Pavel
9a1275cb10
Merge pull request #30178 from Avogar/tsv-csv
...
Refactor and improve TSV, CSV, JSONCompactEachRow, RowBinary formats. Fix bugs in formats
2021-11-02 00:38:30 +03:00
Anton Popov
d50137013c
Merge remote-tracking branch 'upstream/master' into HEAD
2021-11-01 16:55:53 +03:00
Vitaly Baranov
d29b73e301
Merge pull request #30689 from vitlibar/refactor-log-family
...
Refactoring of Log family
2021-10-31 18:50:08 +03:00
Vitaly Baranov
0e8c9b089f
Keep indices for StorageStripeLog in memory.
2021-10-31 03:52:41 +03:00
Anton Popov
0099dfd523
refactoring of SerializationInfo
2021-10-29 20:21:02 +03:00
Kruglov Pavel
7d4f211d5b
Merge branch 'master' into tsv-csv
2021-10-29 16:38:06 +03:00
Alexey Milovidov
8b4a6a2416
Remove cruft
2021-10-28 02:10:39 +03:00
avogar
d1ef96a5ef
Add test, avoid unnecessary allocations, use PeekableReadBuffer only in corner case
2021-10-27 17:29:15 +03:00
avogar
d5c5a3213b
Add custom null representation support for TSV/CSV input formats, fix bugs in deserializing NULLs in some cases
2021-10-21 16:52:27 +03:00
Ilya Golshtein
551a1065c1
output_format_avro_rows_in_file default is 1000000
2021-10-21 14:19:25 +03:00