Commit Graph

888 Commits

Author SHA1 Message Date
Anton Popov
86b29b7f1a fix serialization of Object inside other types 2022-09-08 15:16:39 +00:00
zhenjial
0f788d98f5 new implementation 2022-09-06 20:39:54 +08:00
avogar
b94e896c1c Remove logs 2022-09-01 19:01:27 +00:00
avogar
afc34dca41 Add new JSON formats, add improvements and refactoring 2022-09-01 19:00:24 +00:00
avogar
acf87c1d10 Fix nested JSON Objects schema inference 2022-08-31 14:10:29 +00:00
vdimir
0349c85017 Use getCompressedBytes in BufferingToFileTransform and TemporaryFileStream 2022-08-24 16:14:10 +00:00
vdimir
51c44424cc More metrics for temp files 2022-08-24 16:14:09 +00:00
avogar
29a887578b Fix 2022-08-23 11:42:57 +00:00
avogar
5ab87f1da4 Small refactoring 2022-08-19 16:42:23 +00:00
avogar
612ffaffde Make schema inference cache better, respect format settings that can change the schema 2022-08-19 16:39:13 +00:00
Nikolai Kochetov
5a85531ef7 Merge pull request #38286 from Avogar/schema-inference-cache
Add schema inference cache for s3/hdfs/file/url
2022-08-18 13:07:50 +02:00
avogar
8dd54c043d Merge branch 'master' of github.com:ClickHouse/ClickHouse into schema-inference-cache 2022-08-17 11:47:40 +00:00
avogar
e1ff996ec3 Allow to specify structure hints in schema inference 2022-08-16 09:46:57 +00:00
Kruglov Pavel
088e8cf9bd Merge branch 'master' into numbers-schema-inference 2022-08-09 14:00:36 +02:00
avogar
1304e3487c Add comments, remove unneeded stuff 2022-08-08 13:43:14 +00:00
avogar
9b1a267203 Refactor, remove TTL, add size limit, add system table and system query 2022-08-05 16:20:15 +00:00
Kruglov Pavel
6b2186bfeb Merge branch 'master' into numbers-schema-inference 2022-08-02 19:34:53 +02:00
Anton Popov
a333cc4146 Merge remote-tracking branch 'upstream/master' into HEAD 2022-08-02 12:57:43 +00:00
Kruglov Pavel
a0d51601bf Update EscapingRuleUtils.cpp 2022-08-01 13:07:48 +02:00
Alexey Milovidov
4828be7fc4 Fix double escaping in the metadata of FORMAT JSON 2022-07-30 23:56:41 +02:00
Anton Popov
57e4fb2e30 Merge remote-tracking branch 'upstream/master' into HEAD 2022-07-29 11:42:11 +00:00
Kruglov Pavel
381ea139c2 Merge branch 'master' into schema-inference-cache 2022-07-27 11:35:36 +02:00
Kruglov Pavel
5aae0a2e04 Fix style 2022-07-25 17:20:01 +02:00
Anton Popov
49627aa554 Merge remote-tracking branch 'upstream/master' into HEAD 2022-07-22 17:16:06 +00:00
avogar
794aa691bc Merge branch 'master' of github.com:ClickHouse/ClickHouse into fix-protobuf-capnp-empty-message 2022-07-21 17:04:37 +00:00
Anton Popov
e0d2c8fb37 fix json type with sparse columns 2022-07-21 14:47:19 +00:00
avogar
17a271ec30 Fix error codes 2022-07-20 14:33:46 +00:00
Kruglov Pavel
46da17ca8c Merge branch 'master' into numbers-schema-inference 2022-07-20 13:32:39 +02:00
Kruglov Pavel
3046cd6d29 Merge branch 'master' into schema-inference-cache 2022-07-20 13:30:42 +02:00
Kruglov Pavel
3fb3015375 Merge pull request #39340 from Avogar/better-exception-messages
Better exception messages in schema inference
2022-07-20 13:29:15 +02:00
avogar
784ee11594 Add settings to skip fields with unsupported types in Protobuf/CapnProto schema inference 2022-07-20 11:16:25 +00:00
Kruglov Pavel
88d59520a2 Fix 2022-07-19 15:20:56 +02:00
Kruglov Pavel
1513285166 Fix typo 2022-07-18 20:54:13 +02:00
Kruglov Pavel
24c9467641 Fix 2022-07-18 19:55:14 +02:00
avogar
3f81aadb60 Fix schema inference in case of empty messages in Protobuf/CapnProto formats 2022-07-18 17:53:33 +00:00
avogar
2367f40b70 Better exception messages in schema inference 2022-07-18 15:36:33 +00:00
Kruglov Pavel
857290b586 Fix style 2022-07-18 15:40:28 +02:00
Kruglov Pavel
0f6044e50f Fix style 2022-07-18 15:39:53 +02:00
avogar
9291d33080 Pass const std::string_view & by value, not by reference 2022-07-14 16:11:57 +00:00
Kruglov Pavel
b38241b08a Merge branch 'master' into schema-inference-cache 2022-07-14 12:29:54 +02:00
avogar
2b7c6b7ecd Remove logging 2022-07-13 15:59:04 +00:00
avogar
7cde9d3b40 Add new features in schema inference 2022-07-13 15:57:55 +00:00
avogar
5b0fd31c64 Put column names in quotes 2022-06-30 16:14:30 +00:00
avogar
ee54c4f9b7 Add some fixes and add settings in docs 2022-06-30 12:41:56 +00:00
mergify[bot]
9482c99ab8 Merge branch 'master' into sql-insert-format 2022-06-29 11:03:07 +00:00
Robert Schulze
c22038d48b More clang-tidy fixes 2022-06-28 11:50:05 +00:00
avogar
9bb68bc6de Add SQLInsert output format 2022-06-27 18:31:57 +00:00
avogar
b0c9d1a25d Fix style 2022-06-27 14:04:28 +00:00
avogar
5155262a16 Add some additional information to cache keys 2022-06-27 12:43:24 +00:00
avogar
d37ad2e6de Implement cache for schema inference for file/s3/hdfs/url 2022-06-21 13:02:48 +00:00
Alexey Milovidov
5e9e5a4eaf Merge pull request #37525 from Avogar/avro-structs
Support Maps and Records, allow to insert null as default in Avro format
2022-06-15 00:04:29 +03:00
Robert Schulze
1a0b5f33b3 More consistent use of platform macros
cmake/target.cmake defines macros for the supported platforms; this
commit replaces predefined system macros with our own macros.

__linux__ --> OS_LINUX
__APPLE__ --> OS_DARWIN
__FreeBSD__ --> OS_FREEBSD
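A minimal C++ sketch of what platform-specific code looks like after this change: checks test the project's own OS_* macros instead of the compiler-predefined ones. The fallback block that derives OS_* from __linux__/__APPLE__/__FreeBSD__ is an assumption added only so the sketch compiles outside the real CMake build, and platformName() is a hypothetical helper.

```cpp
#include <cassert>
#include <string>

// Assumption: in the real build these macros come from cmake/target.cmake.
// This fallback only exists so the sketch compiles on its own.
#if !defined(OS_LINUX) && !defined(OS_DARWIN) && !defined(OS_FREEBSD)
#  if defined(__linux__)
#    define OS_LINUX 1
#  elif defined(__APPLE__)
#    define OS_DARWIN 1
#  elif defined(__FreeBSD__)
#    define OS_FREEBSD 1
#  endif
#endif

// Platform-specific code tests the project's macros, not the compiler's.
std::string platformName()
{
#if defined(OS_LINUX)
    return "linux";
#elif defined(OS_DARWIN)
    return "darwin";
#elif defined(OS_FREEBSD)
    return "freebsd";
#else
    return "unknown";
#endif
}
```

Keeping a single set of project macros means a platform can be renamed or split in one place (the CMake file) rather than at every #ifdef.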
2022-06-10 10:22:31 +02:00
Kruglov Pavel
6f17ba17ba Revert "Revert "Fix possible segfault in schema inference"" 2022-06-02 13:28:27 +02:00
Alexander Tokmakov
4baae59252 Revert "Fix possible segfault in schema inference" 2022-06-02 14:04:28 +03:00
avogar
4abfd54dd6 Fix possible segfault in schema inference 2022-06-01 16:53:37 +00:00
Kruglov Pavel
7cc87d9a65 Merge pull request #37537 from Avogar/skip-first-lines
Allow to skip some of the first lines in CSV/TSV formats
2022-05-31 14:26:21 +02:00
Kruglov Pavel
0615866aea Merge pull request #37450 from Avogar/check-format-on-storage-creation
Check format name on storage creation
2022-05-30 14:23:20 +02:00
Alexey Milovidov
c50791dd3b Fix clang-tidy-14, part 1 2022-05-27 22:52:14 +02:00
avogar
4c9812d4c1 Allow to skip some of the first rows in CSV/TSV formats 2022-05-25 15:00:11 +00:00
avogar
038a422aeb Add setting to insert null as default 2022-05-25 12:56:59 +00:00
avogar
f782fa31c6 Merge branch 'master' of github.com:ClickHouse/ClickHouse into check-format-on-storage-creation 2022-05-25 08:42:54 +00:00
avogar
37b66c8a9e Check format name on storage creation 2022-05-23 12:48:48 +00:00
Kruglov Pavel
f539fb835d Merge branch 'master' into formats-with-names 2022-05-23 12:14:20 +02:00
Kruglov Pavel
ce48e8e102 Merge pull request #36975 from Avogar/json-columns-formats
Add columnar JSON formats
2022-05-23 12:11:28 +02:00
avogar
a4cf07708c Fix comments 2022-05-20 14:57:27 +00:00
avogar
566d1b15fd Merge branch 'master' of github.com:ClickHouse/ClickHouse into formats-with-names 2022-05-20 13:54:52 +00:00
avogar
44726122bb Join JSON registration 2022-05-20 12:09:51 +00:00
avogar
a6a430c5ee Merge branch 'master' of github.com:ClickHouse/ClickHouse into json-columns-formats 2022-05-20 11:08:30 +00:00
mergify[bot]
1ac4199e78 Merge branch 'master' into arrow-strings 2022-05-20 10:43:33 +00:00
avogar
cd6a29897e Apply input_format_max_rows_to_read_for_schema_inference for all files in globs in total 2022-05-18 17:56:36 +00:00
avogar
a0369fb9a6 Allow to use String type instead of Binary in Arrow/Parquet/ORC formats 2022-05-18 14:51:21 +00:00
Kruglov Pavel
134821eff8 Fix build 2022-05-18 12:44:20 +02:00
avogar
12010a81b7 Make better 2022-05-18 09:25:26 +00:00
Robert Schulze
e3cfec5b09 Merge remote-tracking branch 'origin/master' into clangtidies 2022-05-16 10:12:50 +02:00
avogar
68bb07d166 Better naming 2022-05-13 18:39:19 +00:00
avogar
cef13c2c02 Allow to skip unknown columns in Native format 2022-05-13 14:27:15 +00:00
avogar
b17fec659a Improve performance and memory usage for select of subset of columns for some formats 2022-05-13 13:51:28 +00:00
avogar
f6b16880bd Merge branch 'master' of github.com:ClickHouse/ClickHouse into json-columns-formats 2022-05-10 12:57:18 +00:00
Anton Popov
e911900054 remove last mentions of data streams 2022-05-09 19:15:24 +00:00
avogar
04fdd75c56 Make JSONColumns formats mono block by default 2022-05-09 11:13:44 +00:00
Robert Schulze
f2b1748c48 Enable clang-tidy bugprone-suspicious-semicolon
Official docs:

  Finds most instances of stray semicolons that unexpectedly alter the
  meaning of the code.
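For illustration, a hypothetical C++ function with the kind of stray semicolon this check is meant to catch, next to the corrected version. The function names are made up for the example.

```cpp
#include <cassert>
#include <vector>

// Buggy: the ';' right after the if-condition is an empty statement, so the
// indented increment below it runs for every element, not just positive ones.
int countPositiveBuggy(const std::vector<int> & values)
{
    int count = 0;
    for (int v : values)
    {
        if (v > 0);
            ++count;
    }
    return count;
}

// Fixed: without the stray ';' the increment is the body of the if.
int countPositive(const std::vector<int> & values)
{
    int count = 0;
    for (int v : values)
    {
        if (v > 0)
            ++count;
    }
    return count;
}
```

The indentation makes the buggy version look correct, which is why such semicolons are easy to miss in review.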
2022-05-08 19:13:37 +02:00
avogar
62a7ba3f26 Add columnar JSON formats 2022-05-06 16:48:48 +00:00
Kruglov Pavel
77e55c344c Merge pull request #36667 from Avogar/mysqldump-format
Add MySQLDump input format
2022-05-04 19:49:48 +02:00
Dmitry Novik
5ba7a55c18 Merge pull request #36650 from bigo-sg/hive_text_parallel_parsing
Parallel parsing of hive text format
2022-05-03 15:56:28 +02:00
Kruglov Pavel
d613f7eab0 Merge branch 'master' into mysqldump-format 2022-05-02 13:31:57 +02:00
Antonio Andelic
a1a22b0007 Merge pull request #35149 from ContentSquare/nullables_with_proto3
Nullables with proto3 using Google wrappers
2022-05-02 09:49:37 +02:00
Robert Schulze
89aa9ae00f Fixed clang-tidy check "bugprone-branch-clone"
The check is currently *not* part of .clang-tidy. It complains about:
(1) "switch has multiple consecutive identical branches"
(2) "repeated branch in conditional chain"

About (1): Lots of findings in switches were about redundant
"[[fallthrough]]" in places where the compiler would not warn anyway. I
have cleaned these up.

About (2): In if-else_if-else chains, fixing the warning would usually
mean concatenating multiple if-conditions. As this would reduce
readability in most cases, I did not fix these places.

Because of (2), I also refrained from adding "bugprone-branch-clone" to
.clang-tidy.
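A small, hypothetical C++ sketch of both kinds of findings: (1) consecutive case labels that share one body need no [[fallthrough]], because the compiler does not warn when an empty case falls through; (2) an if/else-if chain with a repeated branch that is deliberately left unmerged. The enum and functions are illustrative only.

```cpp
#include <cassert>
#include <string>

enum class Kind { Int8, Int16, Int32, Float32, Float64, String };

// (1) Int32 and Float32 share one body. Stacked case labels need no
// [[fallthrough]]: -Wimplicit-fallthrough only fires when statements fall
// through into the next label, so "case Kind::Int32: [[fallthrough]];" adds nothing.
int byteWidth(Kind kind)
{
    switch (kind)
    {
        case Kind::Int8:
            return 1;
        case Kind::Int16:
            return 2;
        case Kind::Int32:
        case Kind::Float32:
            return 4;
        case Kind::Float64:
            return 8;
        case Kind::String:
            return 0; // variable width
    }
    return 0;
}

// (2) Repeated branch in a conditional chain: the two "error" branches could be
// merged into "code < 0 || code > 1000", but in real code such merged
// conditions are often harder to read, so the chain is kept as-is.
std::string describe(int code)
{
    if (code == 0)
        return "ok";
    else if (code < 0)
        return "error";
    else if (code > 1000)
        return "error";
    return "warning";
}
```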
2022-04-30 19:40:28 +02:00
Jakub Kuklis
a1f2dd6d34 Adding two settings in place of one, improvements to the test clarity 2022-04-29 10:01:51 +02:00
Jakub Kuklis
e73fa271a2 Minor improvements 2022-04-29 10:01:51 +02:00
Jakub Kuklis
5ca095c779 Pass the setting to buildFieldSerializer to fix undeclared 2022-04-29 10:01:51 +02:00
Jakub Kuklis
e705425374 Minor improvements 2022-04-29 10:01:51 +02:00
Jakub Kuklis
5c34585a00 Improve the test clarity 2022-04-29 10:01:51 +02:00
Jakub Kuklis
f19e473482 Remove local change 2022-04-29 10:01:51 +02:00
Jakub Kuklis
507ba1042c Adding a setting to enable Google wrappers special treatment 2022-04-29 10:01:51 +02:00
Jakub Kuklis
6d5c1e2fc0 Adding a setting to enable special treatment of google wrappers 2022-04-29 10:01:50 +02:00
Jakub Kuklis
b7a8acc302 Alternative design for output, more messy, but the default value inside Google wrapper is not serialized 2022-04-29 10:01:50 +02:00
Jakub Kuklis
53e2454800 Corrected the behaviour for Proto Nullable output 2022-04-29 10:01:50 +02:00
Jakub Kuklis
10425c17b2 Write empty values for Google wrappers 2022-04-29 10:01:50 +02:00
Jakub Kuklis
ff49fad1f1 Another const keyword corrections for debug build 2022-04-29 10:01:50 +02:00
Jakub Kuklis
08ee7470f0 const keyword corrections for debug build 2022-04-29 10:01:50 +02:00
Jakub Kuklis
c0acc4dfa0 Fixing assert 2022-04-29 10:01:50 +02:00
Jakub Kuklis
7a78197746 Style corrections 2022-04-29 10:01:50 +02:00
Jakub Kuklis
ae1194bf9c Nullables detection in protobuf using Google wrappers 2022-04-29 10:01:50 +02:00
Amos Bird
4a5e4274f0 base should not depend on Common 2022-04-29 10:26:35 +08:00
avogar
d295de1689 Fix comments and test 2022-04-28 14:59:35 +00:00
Kruglov Pavel
4d08587559 Merge branch 'master' into mysqldump-format 2022-04-28 15:58:18 +02:00
Vladimir C
1cbdc1ef3a Merge pull request #36206 from vdimir/output-format-prometheus 2022-04-28 12:09:53 +02:00
vdimir
be0aa06958 Add output format Prometheus 2022-04-26 14:57:35 +00:00
avogar
b666b4e1c9 Fix possible heap-use-after-free in schema inference 2022-04-26 14:36:16 +00:00
avogar
33d845dade Add MySQLDump input format 2022-04-26 10:42:56 +00:00
taiyang-li
b7cc344d62 remove useless codes 2022-04-26 14:42:43 +08:00
taiyang-li
99dee35b6e parallel parsing of hive text format 2022-04-26 14:33:10 +08:00
Kruglov Pavel
34c342fdd3 Merge pull request #36205 from Avogar/improve-globs
Some refactoring around schema inference with globs
2022-04-25 13:14:46 +02:00
avogar
80eacc8533 Merge branch 'master' of github.com:ClickHouse/ClickHouse into improve-json-schema-inference 2022-04-22 17:18:44 +00:00
avogar
a093181b4f Fix comments 2022-04-21 11:48:17 +00:00
Kruglov Pavel
813e228fcc Merge branch 'master' into improve-globs 2022-04-20 16:31:47 +02:00
avogar
f31f019252 Fix 2022-04-19 19:25:41 +00:00
avogar
1f252cedfe Make better 2022-04-19 19:16:47 +00:00
Kruglov Pavel
ec4e1cb6d8 Merge pull request #36211 from Avogar/insert-select-all-formats
Allow insert select for files with formats without schema inference
2022-04-19 14:25:59 +02:00
avogar
ae88549c4f Allow insert select for files with formats without schema inference 2022-04-13 20:02:52 +00:00
avogar
8b60aeb7bc Improve schema inference for json objects 2022-04-13 19:13:40 +00:00
avogar
1c065f8c7a Some refactoring around schema inference with globs 2022-04-13 17:02:48 +00:00
avogar
348cae0d16 Fix possible segfault in schema inference for JSON formats 2022-04-13 12:34:40 +00:00
avogar
d2017a63b1 Merge branch 'master' of github.com:ClickHouse/ClickHouse into improve-schema-inference 2022-04-07 11:36:40 +00:00
Kruglov Pavel
f3f8f27db5 Merge pull request #35735 from Avogar/allow-read-bools-as-numbers
Allow to infer and parse bools as numbers in JSON input formats
2022-04-07 13:20:49 +02:00
taiyang-li
2ef316801c Merge branch 'master' into use_minmax_index 2022-04-07 10:53:25 +08:00
Kruglov Pavel
ec2213493f Merge branch 'master' into allow-read-bools-as-numbers 2022-04-06 14:53:02 +02:00
Kruglov Pavel
9141066de3 Merge branch 'master' into improve-schema-inference 2022-04-06 13:51:07 +02:00
taiyang-li
acb9f1632e support skip splits in orc and parquet 2022-04-06 16:40:22 +08:00
Maksim Kita
e6c9a36ac7 Merge pull request #35733 from kitaisreal/ipv6-invalid-insert-test
Added test for insert of invalid IPv6 value
2022-04-04 12:28:16 +02:00
mergify[bot]
1e43e26fa1 Merge branch 'master' into fix-order 2022-04-02 12:00:29 +00:00
avogar
ab2a963287 Merge branch 'master' of github.com:ClickHouse/ClickHouse into allow-read-bools-as-numbers 2022-03-31 14:09:43 +00:00
mergify[bot]
24ade25d61 Merge branch 'master' into improve-schema-inference 2022-03-31 13:42:47 +00:00
Kruglov Pavel
564a77c462 Fix build 2022-03-31 12:49:23 +02:00
Maksim Kita
371cdc956a Added input format settings for parsing invalid IPv4, IPv6 addresses as default values 2022-03-30 12:54:19 +02:00
avogar
3fc36627b3 Allow to infer and parse bools as numbers in JSON input formats 2022-03-29 17:37:31 +00:00
avogar
ce97ccbfb9 Improve schema inference for JSONEachRow and TSKV formats 2022-03-29 14:47:51 +00:00
Kruglov Pavel
a2fd09e031 Fix style 2022-03-29 16:34:07 +02:00
Antonio Andelic
9990abb76a Use compile-time check for Exception messages, fix wrong messages 2022-03-29 13:16:11 +00:00
mergify[bot]
343588de2c Merge branch 'master' into improve-schema-inference 2022-03-29 13:06:00 +00:00
Anton Popov
9610139477 Merge pull request #35629 from CurtizJ/dynamic-columns-5
Support schema inference for type `Object` in format `JSONEachRow`
2022-03-29 14:17:09 +02:00
Anton Popov
d677635cd8 Merge pull request #35592 from CurtizJ/dynamic-columns-4
Add parallel parsing and schema inference for format `JSONAsObject`
2022-03-28 19:29:55 +02:00
Anton Popov
a6450be8b6 fix schema inference 2022-03-26 01:33:10 +00:00
Anton Popov
67195bfdd5 support schema inference for type Object in format JSONEachRow 2022-03-25 21:51:53 +00:00
Kruglov Pavel
d45143ffe0 Merge branch 'master' into improve-schema-inference 2022-03-25 12:05:40 +01:00
Kruglov Pavel
1823cac89d Update src/Formats/EscapingRuleUtils.h
Co-authored-by: Vladimir C <vdimir@clickhouse.com>
2022-03-24 19:19:32 +01:00
Anton Popov
78100abc5f add parallel parsing and schema inference for type Object 2022-03-24 17:51:35 +00:00
avogar
abc020a502 Clean up 2022-03-24 13:08:58 +00:00
avogar
557edbd172 Add some improvements and fixes in schema inference 2022-03-24 12:54:12 +00:00
Antonio Andelic
0c23cd7b94 Add support for case insensitive column matching in arrow 2022-03-22 10:55:10 +00:00
Antonio Andelic
29d2bf7d1a Merge branch 'master' into case-insensitive-column-matching 2022-03-21 08:17:27 +00:00
Antonio Andelic
f75b054255 Allow case insensitive column matching 2022-03-21 07:47:37 +00:00
Kruglov Pavel
aa3c05e9d4 Merge pull request #35152 from rschu1ze/protobuf-batch-write
ProtobufList
2022-03-18 13:24:34 +01:00
Antonio Andelic
607f785e48 Revert "Merge pull request #35145 from bigo-sg/lower-column-name"
This reverts commit ebf72bf61d, reversing
changes made to f1b812bdc1.
2022-03-17 12:31:43 +00:00
Robert Schulze
6e1d7a31bc Fix build + typo 2022-03-17 11:41:20 +01:00
Anton Popov
2ced42ed41 add experimental settings for Object type 2022-03-16 16:51:23 +00:00
Anton Popov
0ba78c3c3a Merge remote-tracking branch 'upstream/master' into HEAD 2022-03-16 15:28:09 +00:00
Robert Schulze
0d2ece6d91 Merge branch 'ClickHouse:master' into protobuf-batch-write 2022-03-16 09:43:33 +01:00
avogar
e2d1e643f2 Fix possible segfault in JSONEachRow schema inference 2022-03-15 11:44:15 +00:00
Robert Schulze
23122cb327 Fix review comments
ParquetBlockOutputFormat.cpp:
- undo unrelated formatting

ProtobufSerializer.cpp:
- undef debug tracing
- simplify logic in writeRow()

ProtobufSchemas.cpp:
- restore original search in cache by message type
2022-03-15 11:27:17 +01:00
Maksim Kita
538f8cbaad Fix clang-tidy warnings in Disks, Formats, Functions folders 2022-03-14 18:17:35 +00:00
Anton Popov
36ec379aeb Merge remote-tracking branch 'upstream/master' into HEAD 2022-03-14 16:28:35 +00:00
Antonio Andelic
ebf72bf61d Merge pull request #35145 from bigo-sg/lower-column-name
add setting to lower column case when reading parquet/orc file
2022-03-14 11:25:03 +01:00
Robert Schulze
514d4d2187 Implement ProtobufList - fixes ClickHouse#16436
Introduce IO format "ProtobufList" with protobuf schema

    // schemafile.proto
    message Envelope {
      message MessageType {
        uint32 colA = 1;
        string colB = 2;
      }
      repeated MessageType mt = 1;
    }

where "Envelope" is a hard-coded/expected top-level message and
"MessageType" is a message with user-provided name containing the table
fields to export/import, e.g.

    SELECT * FROM db1.tab1 FORMAT ProtobufList SETTINGS format_schema =
    'schemafile:MessageType'

As a result, the new format wraps a list of messages (one per row) into
a single, containing message. Compare that to the schema of the existing
IO formats "Protobuf" and "ProtobufSingle":

    message MessageType {
      uint32 colA = 1;
      string colB = 2;
    }
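To make the comparison concrete, here is a sketch of the standard protobuf wire encoding for one row under each schema, hand-written in C++ without the protobuf library. It only models the two example fields (colA as a varint, colB as a length-delimited string), and the function names are hypothetical. A standalone MessageType row is just its fields; inside Envelope, each row becomes one length-delimited occurrence of field 1, i.e. the same bytes preceded by a field key and a length.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Base-128 varint: 7 bits per byte, least significant group first,
// high bit set on every byte except the last.
void writeVarint(Bytes & out, uint64_t value)
{
    while (value >= 0x80)
    {
        out.push_back(static_cast<uint8_t>(value | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<uint8_t>(value));
}

// Field key = (field_number << 3) | wire_type (0 = varint, 2 = length-delimited).
void writeKey(Bytes & out, uint32_t field, uint32_t wire_type)
{
    writeVarint(out, (static_cast<uint64_t>(field) << 3) | wire_type);
}

// message MessageType { uint32 colA = 1; string colB = 2; }
Bytes encodeRow(uint32_t colA, const std::string & colB)
{
    Bytes out;
    writeKey(out, 1, 0);
    writeVarint(out, colA);
    writeKey(out, 2, 2);
    writeVarint(out, colB.size());
    out.insert(out.end(), colB.begin(), colB.end());
    return out;
}

// message Envelope { repeated MessageType mt = 1; }
Bytes encodeEnvelope(const std::vector<Bytes> & rows)
{
    Bytes out;
    for (const auto & row : rows)
    {
        writeKey(out, 1, 2);           // key of field "mt"
        writeVarint(out, row.size());  // length of the nested message
        out.insert(out.end(), row.begin(), row.end());
    }
    return out;
}
```

For example, the row (colA = 150, colB = "hi") encodes to the 7 bytes 08 96 01 12 02 68 69, and inside Envelope it takes 9 bytes (0A 07 followed by those 7).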

The new format does not save space compared to the existing formats, but
it is conceptually a bit more beautiful and also more convenient.

Implementation details:

- Created new files ProtobufList(Input|Output)Format which use the
  existing ProtobufSerializer mechanism. The goal was to reuse as much
  code as possible and avoid copypasta.

- I was torn between inheriting from I(Input|Output)Format vs.
  IRow(Input|Output)Format for ProtobufList(Input|Output)Format. The
  former is chunk-based, which can be better for performance. Since the
  ProtobufSerializer mechanism is row-based but data is generally passed
  around in chunks, I decided on the latter to leverage the existing
  chunk <--> row mapping code in IRow(Input|Output)Format.

- A new ProtobufSerializer called ProtobufSerializerEnvelope was
  introduced (--> ProtobufSerializer.cpp). It represents the top-level
  message which encloses the list of inner nested messages, i.e. the
  rows.

- With the new format, parsing the schema file and matching the fields in
  the schema file to table columns works as for the old formats. The only
  difference is that parsing starts one level below the "Envelope" (-->
  ProtobufSchema.cpp). This is more natural than forcing customers to
  have table columns start with "Envelope".

- Creation of the ProtobufSerializer tree also works like before. What
  is different is that we finally add a ProtobufSerializerEnvelope as
  the new root of the tree. Its only purpose is to write/read the top-level
  message for the first/last row to write/read.
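The serializer-tree change described in these bullets can be sketched in C++ as follows. Every type here (Writer, RowSerializer, MessageSerializer, EnvelopeSerializer, writeChunk) is a hypothetical stand-in, not ClickHouse's real ProtobufSerializer or IRowOutputFormat API; the Writer only records calls so the order is visible. The sketch shows a row-based serializer driven chunk by chunk, with an envelope root that opens the top-level message before the first row and closes it after the last one.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the low-level writer: records the calls made.
struct Writer
{
    std::vector<std::string> events;
};

struct RowSerializer
{
    virtual ~RowSerializer() = default;
    virtual void writeRow(Writer & w, int row) = 0;
};

// Writes one row as a nested message, i.e. one element of the repeated field.
struct MessageSerializer : RowSerializer
{
    void writeRow(Writer & w, int row) override
    {
        w.events.push_back("startNested");
        w.events.push_back("row " + std::to_string(row));
        w.events.push_back("endNested");
    }
};

// New root of the tree: opens the top-level message before the first row
// and closes it once, after the last row.
struct EnvelopeSerializer : RowSerializer
{
    explicit EnvelopeSerializer(std::unique_ptr<RowSerializer> inner_) : inner(std::move(inner_)) {}

    void writeRow(Writer & w, int row) override
    {
        if (!started)
        {
            w.events.push_back("startMessage");
            started = true;
        }
        inner->writeRow(w, row);
    }

    void finalize(Writer & w)
    {
        // Assumption: an empty result still produces an (empty) envelope.
        if (!started)
            w.events.push_back("startMessage");
        w.events.push_back("endMessage");
    }

    std::unique_ptr<RowSerializer> inner;
    bool started = false;
};

// Chunk-to-row mapping: a row-based format walks each chunk row by row.
void writeChunk(EnvelopeSerializer & root, Writer & w, const std::vector<int> & chunk)
{
    for (int row : chunk)
        root.writeRow(w, row);
}
```

Writing two chunks {1, 2} and {3} and then calling finalize produces a single startMessage/endMessage pair around three nested messages, which is the shape the Envelope schema expects.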

Caveats:

- The low-level serialization code in ProtobufWriter uses an internal
  buffer which is flushed to the output file only in endMessage().
  In the existing "Protobuf" format, this happens once per row; in the
  new format, it happens only at the end of the serialization,
  since row-level messages now call start/endNestedMessage(). As a
  future TODO, the buffer should also be flushed in
  start/endNestedMessage() to reduce memory consumption.
2022-03-14 08:04:58 +01:00
zhanghuajie
53a8987b3b fix build fail with gcc --fix warnings without disabling some parameters 2022-03-11 21:59:19 +08:00
shuchaome
46cb4483a6 Optimise by lowering schema on the beginning. Add a functional test. 2022-03-11 14:34:46 +08:00
shuchaome
56795b831d add setting to lower column case when reading parquet/orc file 2022-03-09 16:07:02 +08:00
Anton Popov
0bc57da238 Merge remote-tracking branch 'upstream/master' into HEAD 2022-03-07 14:46:08 +00:00
Azat Khuzhin
c426eef07d Fix generating USE_* for system.build_options
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-03-04 15:31:32 +03:00
Anton Popov
df3b07fe7c Merge remote-tracking branch 'upstream/master' into HEAD 2022-03-03 22:25:28 +00:00
Maksim Kita
f1b1baf56e Merge pull request #34982 from Cai-Yao/master
date_time_input_format = 'best_effort_us'
2022-03-03 09:22:57 +01:00
Maksim Kita
b1a956c5f1 clang-tidy check performance-move-const-arg fix 2022-03-02 18:15:27 +00:00
cwkyaoyao
72194bbaf3 Add date_time_input_format = best_effort_us 2022-03-02 16:00:06 +08:00
avogar
a7c6d11532 Fix schema inference for unquoted dates in CSV 2022-03-01 11:03:26 +00:00
Anton Popov
18940b8637 Merge remote-tracking branch 'upstream/master' into HEAD 2022-02-09 23:38:38 +03:00
Kruglov Pavel
15d85682e8 Fix style 2022-02-07 18:29:22 +03:00
avogar
a4c7ecde87 Make better 2022-02-07 17:51:26 +03:00
avogar
77b42bb9ff Support UUID in MsgPack format 2022-02-07 17:11:44 +03:00
Anton Popov
836a348a9c Merge remote-tracking branch 'upstream/master' into HEAD 2022-02-01 15:23:07 +03:00
Maksim Kita
5ef83deaa6 Update sort to pdqsort 2022-01-30 19:49:48 +00:00
Anton Popov
78b9f15abb Merge remote-tracking branch 'upstream/master' into HEAD 2022-01-30 03:24:37 +03:00
Kruglov Pavel
7873b4475f Merge branch 'master' into autodetect-format 2022-01-25 10:56:52 +03:00
avogar
a6740d2f9a Detect format and schema for stdin in clickhouse-local 2022-01-25 10:25:37 +03:00
avogar
1f49acc164 Better naming 2022-01-24 16:28:36 +03:00
Anton Popov
e8ce091e68 Merge remote-tracking branch 'upstream/master' into HEAD 2022-01-21 20:11:18 +03:00
Kruglov Pavel
7bfb1231b9 Merge branch 'master' into formats-with-suffixes 2022-01-20 14:47:17 +03:00
Azat Khuzhin
91e3ceeea9 Remove unbundled capnp support 2022-01-20 10:01:58 +03:00
Azat Khuzhin
a30ef87d65 Remove unbundled msgpack support 2022-01-20 10:01:58 +03:00
Azat Khuzhin
788cb6b2b0 Remove unbundled protobuf support 2022-01-20 08:47:16 +03:00
Azat Khuzhin
1145e32af6 Remove unbundled snappy support 2022-01-20 08:47:16 +03:00
Azat Khuzhin
ab8cdb198f Remove unbundled orc support 2022-01-20 08:47:16 +03:00
Azat Khuzhin
d1b2bd5fbe Remove unbundled avro support 2022-01-20 08:47:16 +03:00
Azat Khuzhin
b4ad324a88 Remove unbundled parquet/arrow support 2022-01-20 08:47:16 +03:00
Kruglov Pavel
a7df9cd53a Merge branch 'master' into formats-with-suffixes 2022-01-14 21:03:49 +03:00
avogar
253035a5df Fix 2022-01-14 19:17:06 +03:00
Kruglov Pavel
d2e9f37bee Merge branch 'master' into format-by-extention 2022-01-14 18:36:23 +03:00
avogar
89a181bd19 Make better 2022-01-14 18:16:18 +03:00
avogar
817a314263 Fix tests and style 2022-01-14 17:46:24 +03:00
Kruglov Pavel
5a908e8edd Merge branch 'master' into formats-with-suffixes 2022-01-14 16:45:20 +03:00
Kseniia Sumarokova
5da673c3a5 Merge pull request #31104 from bigo-sg/hive_table
Implement hive table engine
2022-01-14 09:39:17 +03:00
avogar
2d7b1bfa5e Detect format in S3/HDFS/URL table engines 2022-01-13 16:14:18 +03:00
Kruglov Pavel
305d58a762 Merge pull request #33524 from Avogar/stacktrace-in-client
Don't print exception twice in client in case of exception in parallel parsing
2022-01-13 15:50:42 +03:00
avogar
8390e9ad60 Detect format by file name in file/hdfs/s3/url table functions 2022-01-12 18:29:31 +03:00
taiyang-li
66813a3aa9 merge master 2022-01-12 16:56:29 +08:00
avogar
0ae0aa712b Don't print exception twice in client in case of exception in parallel parsing 2022-01-11 18:37:07 +03:00
zhongyuankai
99279c1443 INTO OUTFILE / FROM INFILE: autodetect FORMAT by file extension 2022-01-11 21:26:14 +08:00
zhongyuankai
878e44eb97 auto format by file extension 2022-01-08 21:47:14 +08:00
taiyang-li
1e102bc1b2 merge master 2022-01-01 09:01:06 +08:00
alesapin
16c36d72b1 Merge pull request #33296 from ClickHouse/fix_clang_tidy_3
Fix clang tidy 3
2021-12-29 22:43:42 +03:00
avogar
97788b9c21 Allow to create new files on insert for File/S3/HDFS engines 2021-12-29 21:19:13 +03:00
Kruglov Pavel
489a30859f Merge pull request #32455 from Avogar/schema-inference
Automatic schema inference for input formats
2021-12-29 21:03:48 +03:00
alesapin
34145c47da Fix clang tidy 2021-12-29 18:36:42 +03:00
avogar
78b522fd51 Fix fasttest build 2021-12-29 12:21:01 +03:00
avogar
d718a2e220 Clean up 2021-12-29 12:21:01 +03:00
avogar
26abf7aa62 Remove code duplication, use simdjson and rapidjson instead of Poco 2021-12-29 12:21:01 +03:00
avogar
aaf9f85c67 Add more tests and fixes 2021-12-29 12:18:56 +03:00
avogar
dd994aa761 Add some tests and some fixes 2021-12-29 12:18:56 +03:00
avogar
8112a71233 Implement schema inference for most input formats 2021-12-29 12:18:56 +03:00
taiyang-li
9036b18c2f merge master 2021-12-27 15:12:48 +08:00
Raúl Marín
cb22091b33 Merge remote-tracking branch 'blessed/master' into kill_scalar_github 2021-12-23 13:59:33 +01:00
Kruglov Pavel
a1455c0f2a Merge pull request #32981 from Avogar/fix-csv-tuples
Fix tuple output in CSV format
2021-12-23 13:27:34 +03:00
Alexey Milovidov
df0e7d9ed3 Merge branch 'Issue81' of github.com:DevTeamBK/ClickHouse into merge-33050 2021-12-23 02:03:03 +03:00
Alexey Milovidov
b2d9e33882 Whitespaces 2021-12-23 02:02:36 +03:00
Raúl Marín
9bb88c26d8 Add existing progress to the record of the output format progress 2021-12-22 23:14:23 +01:00
Boris Kuschel
c62d9e2f2d Out of Bounds Column Index
Signed-off-by: Boris Kuschel <Boris.Kuschel@ibm.com>
2021-12-21 22:43:47 -05:00
taiyang-li
2597925724 merge master 2021-12-21 15:55:39 +08:00
avogar
ba6a513db0 Fix tuple output in CSV format 2021-12-20 19:27:09 +03:00
kreuzerkrieg
f06c37d206 Stop reading incomplete stripes and skip rows. 2021-12-19 18:41:32 +02:00
Anton Popov
99ebabd822 Merge remote-tracking branch 'upstream/master' into HEAD 2021-12-17 19:02:29 +03:00
alesapin
6bd7e425c6 Merge pull request #22535 from CurtizJ/sparse-serialization
Sparse serialization and ColumnSparse
2021-12-17 15:26:17 +03:00
taiyang-li
d033fc4c24 merge master and fix conflict 2021-12-17 15:11:21 +08:00
alesapin
884801e1bd Fixing 2021-12-14 19:08:08 +03:00
Anton Popov
16312e7e4a Merge remote-tracking branch 'upstream/master' into HEAD 2021-12-14 18:58:17 +03:00
taiyang-li
ca3f7425a4 fix code 2021-12-14 17:37:31 +08:00
taiyang-li
8234d1176f merge master 2021-12-14 10:39:21 +08:00
Alexey Milovidov
71926a3a97 Fix surprisingly bad code in function "file" 2021-12-13 07:57:54 +03:00
李扬
8675086104 Merge branch 'master' into hive_table 2021-12-12 09:01:46 -06:00
Vitaly Baranov
463ce1fcee Merge pull request #27822 from filimonov/kafka_protobuf_issue26643
Test for issue #26643
2021-12-11 20:31:22 +03:00
Vitaly Baranov
abe9dd3368 Merge pull request #32531 from vitlibar/fix-nested-array-sizes-for-missing-columns
Improve handling nested structures with missing columns while reading protobuf
2021-12-11 11:08:34 +03:00
Vitaly Baranov
b5b195f4e2 Merge branch 'master' into kafka_protobuf_issue26643 2021-12-10 23:22:35 +03:00
Vitaly Baranov
82c2d8dd2c Add synchronization to ProtobufSchemas. 2021-12-10 23:18:47 +03:00
Vitaly Baranov
73092942ea Take into account nested structures while filling missing columns while reading protobuf. 2021-12-10 21:11:06 +03:00
Anton Popov
d8367334a3 Merge remote-tracking branch 'upstream/master' into HEAD 2021-12-08 18:26:19 +03:00
Kseniia Sumarokova
926fd568c7 Merge pull request #32113 from FrankChen021/url_http_header
Set Content-Type in HTTP packets issued from URL engine
2021-12-07 08:52:36 +03:00
Kseniia Sumarokova
eab6f0ba49 Update FormatFactory.cpp 2021-12-06 23:35:29 +03:00
Kruglov Pavel
cc71c537bc Merge pull request #32204 from Avogar/skip-quoted-values
Improve skipping unknown fields with Quoted escaping rule in Template/CustomSeparated formats
2021-12-06 12:28:14 +03:00
Vitaly Baranov
d709782088 Merge pull request #31988 from vitlibar/fix-skipping-columns-while-writing-protobuf
Fix skipping columns while writing protobuf
2021-12-05 18:01:11 +03:00
Vitaly Baranov
2e0b480044 Improve error handling while serializing protobufs. 2021-12-04 21:42:45 +03:00
Vitaly Baranov
15e3dbe3f2 Fix skipping columns in Nested while writing protobuf. 2021-12-04 18:00:02 +03:00
frank chen
c319b6fa32 Fix style
Signed-off-by: frank chen <frank.chen021@outlook.com>
2021-12-03 22:09:04 +08:00
avogar
7549619b25 Improve skipping unknown fields with Quoted escaping rule in Template/CustomSeparated formats 2021-12-03 16:25:35 +03:00
frank chen
898db5b468 Resolve review comments
Signed-off-by: frank chen <frank.chen021@outlook.com>
2021-12-03 19:47:05 +08:00
Anton Popov
f6be3d16fd Merge pull request #24820 from kssenii/versioning
Versioning of aggregate function states
2021-12-03 01:41:44 +03:00
taiyang-li
9ec8272186 refactor hive text input format 2021-12-02 16:14:25 +08:00
Anton Popov
6f4d9a53b2 Merge remote-tracking branch 'origin/sparse-serialization' into HEAD 2021-12-01 15:54:33 +03:00
Anton Popov
54f51444c0 Merge remote-tracking branch 'upstream/master' into HEAD 2021-12-01 15:49:02 +03:00
taiyang-li
4aeadf3967 fix build error 2021-12-01 14:13:48 +08:00
kssenii
71bfc72e37 Fix 2021-11-30 14:42:37 +00:00
taiyang-li
d213500a3e remove blank at end of line 2021-11-30 18:23:24 +08:00
mergify[bot]
8d5460b469 Merge branch 'master' into feature-support-bool-type 2021-11-29 11:50:18 +00:00
kssenii
be3b4ca8fe Merge branch 'master' of github.com:ClickHouse/ClickHouse into versioning 2021-11-27 09:44:31 +00:00
kssenii
515261f5dd Better 2021-11-27 09:40:46 +00:00
taiyang-li
72f60cceb9 Merge branch 'master' into hive_table 2021-11-25 17:33:26 +08:00
Kseniia Sumarokova
93cf66df12 Merge pull request #30936 from kssenii/seekable-read-buffers
Reduce memory usage for some formats when reading with s3/url/hdfs
2021-11-25 11:19:24 +03:00
lgbo
996d7125c0 Merge branch 'master' into hive_table 2021-11-23 10:19:02 +08:00
Anton Popov
ccd78e3838 Merge remote-tracking branch 'upstream/master' into HEAD 2021-11-22 17:19:35 +03:00
Kruglov Pavel
cded91b013 Update verbosePrintString.h 2021-11-19 16:51:49 +03:00
taiyang-li
e8644807fe merge master and solve conflict 2021-11-19 15:01:58 +08:00
MaxWk
f17d5b02e4 use bool representation 2021-11-19 14:30:22 +08:00
kssenii
1a9817f872 Correct merge 2021-11-18 07:56:10 +00:00
avogar
1ebcbf4748 Fix style 2021-11-16 17:10:30 +03:00
avogar
8e9783388b Add formats CustomSeparatedWithNames/WithNamesAndTypes 2021-11-16 17:10:30 +03:00
avogar
73d1918410 tmp 2021-11-16 17:10:30 +03:00
kssenii
37f482d478 Merge branch 'master' of github.com:ClickHouse/ClickHouse into versioning 2021-11-15 07:31:11 +00:00
kssenii
f18dcd2287 Merge branch 'master' of github.com:ClickHouse/ClickHouse into seekable-read-buffers 2021-11-13 14:38:57 +03:00
cgp
18504f545a move InputCreatorFunc to InputCreator 2021-11-12 00:34:59 +08:00
MaxWk
d42a454837 support some bool format 2021-11-11 16:01:32 +08:00
taiyang-li
deef4d4dbe add options read_bool_as_uint8 when parse csv 2021-11-11 11:49:54 +08:00
Anton Popov
a20922b2d3 Merge remote-tracking branch 'origin/sparse-serialization' into HEAD 2021-11-09 15:36:25 +03:00
Anton Popov
66973a2a28 Merge remote-tracking branch 'upstream/master' into HEAD 2021-11-08 21:27:45 +03:00
taiyang-li
36ca0b296b implement hive table engine 2021-11-05 19:55:30 +08:00
Anton Popov
84e914e05a minor fixes near serializations 2021-11-05 01:46:00 +03:00
kssenii
ec11179f91 Merge branch 'master' of github.com:ClickHouse/ClickHouse into seekable-read-buffers 2021-11-03 14:33:31 +03:00
kssenii
45ea820297 Reduce memory usage for some formats 2021-11-03 14:30:03 +03:00
Kruglov Pavel
327a34e9da
Merge pull request #30497 from Avogar/null-deserialization
Add custom null representation support for TSV/CSV input formats, fix Nullable(String) deserializing in some formats
2021-11-03 11:30:25 +03:00
avogar
42ab57f0e5 Set output_format_avro_rows_in_file default to 1 2021-11-02 14:06:10 +03:00
Kruglov Pavel
901ebcede6
Merge pull request #30351 from arenadata/ADQM-335
output_format_avro_rows_in_file
2021-11-02 12:25:27 +03:00
Kruglov Pavel
1f8535c02b
Merge branch 'master' into null-deserialization 2021-11-02 12:15:21 +03:00
Anton Popov
1628f50e51
Merge branch 'master' into sparse-serialization 2021-11-02 06:26:18 +03:00
Kruglov Pavel
9a1275cb10
Merge pull request #30178 from Avogar/tsv-csv
Refactor and improve TSV, CSV, JSONCompactEachRow, RowBinary formats. Fix bugs in formats
2021-11-02 00:38:30 +03:00
Anton Popov
d50137013c Merge remote-tracking branch 'upstream/master' into HEAD 2021-11-01 16:55:53 +03:00
Vitaly Baranov
d29b73e301
Merge pull request #30689 from vitlibar/refactor-log-family
Refactoring of Log family
2021-10-31 18:50:08 +03:00
Vitaly Baranov
0e8c9b089f Keep indices for StorageStripeLog in memory. 2021-10-31 03:52:41 +03:00
Anton Popov
0099dfd523 refactoring of SerializationInfo 2021-10-29 20:21:02 +03:00
Kruglov Pavel
7d4f211d5b
Merge branch 'master' into tsv-csv 2021-10-29 16:38:06 +03:00
Alexey Milovidov
8b4a6a2416 Remove cruft 2021-10-28 02:10:39 +03:00
avogar
d1ef96a5ef Add test, avoid unnecessary allocations, use PeekableReadBuffer only in corner case 2021-10-27 17:29:15 +03:00
avogar
d5c5a3213b Add custom null representation support for TSV/CSV input formats, fix bugs in deserializing NULLs in some cases 2021-10-21 16:52:27 +03:00
Ilya Golshtein
551a1065c1 output_format_avro_rows_in_file default is 1000000 2021-10-21 14:19:25 +03:00
Ilya Golshtein
82f33151e7 output_format_avro_rows_in_file fixes per code review 2021-10-21 02:53:39 +03:00
avogar
872cca550a Make better 2021-10-20 15:47:20 +03:00
mergify[bot]
0a4360c43e
Merge branch 'master' into tsv-csv 2021-10-20 11:57:06 +00:00
avogar
7007286088 Fix WithNamesAndTypes parallel parsing, add new tests, small refactoring 2021-10-20 14:48:54 +03:00
Nikolai Kochetov
a92dc0a826 Update obsolete comments. 2021-10-19 12:58:10 +03:00
Kruglov Pavel
5052ec3ab0
Merge branch 'master' into tsv-csv 2021-10-19 12:03:52 +03:00
Kruglov Pavel
1e2ceeb2e7
Merge pull request #29291 from Avogar/capnproto
Add CapnProto output format, refactor CapnProto input format
2021-10-19 11:54:55 +03:00
Ilya Golshtein
d90302aa3b output_format_avro_rows_in_file 2021-10-18 19:01:06 +03:00
Anton Popov
d71ffc355a Merge remote-tracking branch 'upstream/master' into HEAD 2021-10-18 15:18:22 +03:00
Kruglov Pavel
dbc2f3408e
Merge branch 'master' into tsv-csv 2021-10-18 14:38:22 +03:00
Kruglov Pavel
6350957709
Fix special build 2021-10-18 14:30:02 +03:00
Nikolai Kochetov
bfcbf5abe0 Merge branch 'master' into removing-data-streams-folder 2021-10-17 10:42:37 +03:00
Nikolai Kochetov
a08c98d760 Move some files. 2021-10-16 17:03:50 +03:00
Azat Khuzhin
50231460af Use forward declaration for Buffer<> in generic headers
- changes in ReadHelpers.h -- recompiles 1000 modules
- changes in FormatFactor.h -- recompiles 100 modules
2021-10-16 12:03:24 +03:00
Nikolai Kochetov
fd14faeae2 Remove DataStreams folder. 2021-10-15 23:18:20 +03:00
avogar
2da8180613 Add space after comma 2021-10-14 21:39:09 +03:00
avogar
8729201208 Remove redundant move 2021-10-14 21:36:57 +03:00
avogar
89c1a04ef4 Fix comments 2021-10-14 21:35:56 +03:00
Kruglov Pavel
9ec6930c15 Better exception handling 2021-10-14 16:43:23 +03:00
Kruglov Pavel
95790b8a1c Update CapnProtoUtils.cpp 2021-10-14 16:43:23 +03:00
Kruglov Pavel
9ddcdbba39 Add INCORRECT_DATA error code 2021-10-14 16:43:23 +03:00
avogar
f88a2ad653 Handle exception when cannot extract value from struct, add test for it 2021-10-14 16:43:23 +03:00
avogar
ed8818a773 Fix style, better check in enum comparison 2021-10-14 16:43:22 +03:00
Kruglov Pavel
1cd938fbba Fix typo 2021-10-14 16:43:22 +03:00
avogar
ce22f534c4 Add CapnProto output format, refactor CapnProto input format 2021-10-14 16:43:22 +03:00
avogar
324dfd4f81 Refactor and improve TSV, CSV and JSONCompactEachRow formats, fix some bugs in formats 2021-10-14 13:32:49 +03:00
Nikolai Kochetov
ab28c6c855 Remove BlockInputStream interfaces. 2021-10-14 13:25:43 +03:00
Nikolai Kochetov
2957971ee3 Remove some last streams. 2021-10-13 21:22:02 +03:00
Nikolai Kochetov
a5fa5c7ea3 Move formats to Impl 2021-10-13 13:01:08 +03:00
Nikolai Kochetov
88b1807434 Fix special build. 2021-10-12 10:33:45 +03:00
Nikolai Kochetov
1e1d5d7fea Fix style. 2021-10-11 22:21:04 +03:00
Nikolai Kochetov
ec18340351 Remove streams from formats. 2021-10-11 19:11:50 +03:00
Nikolai Kochetov
1f6d5482b1 Fix some tests. 2021-10-08 21:33:51 +03:00
Nikolai Kochetov
c6bce1a4cf Update Native. 2021-10-08 20:21:19 +03:00
Alexey Milovidov
fe6b7c77c7 Rename "common" to "base" 2021-10-02 10:13:14 +03:00
Alexey Milovidov
cd7f9d981c Remove ya.make 2021-09-25 04:22:54 +03:00
Anton Popov
ea4fd19e28
Merge pull request #29087 from CurtizJ/asyn-inserts-follow-up
Minor enhancements in async inserts
2021-09-21 13:38:52 +03:00
PHO
3c4b1ea9c5 New setting: output_format_csv_null_representation
This is the same as output_format_tsv_null_representation but is for CSV output.
2021-09-17 17:58:23 +09:00
Anton Popov
99175f7acc minor enhancements in async inserts 2021-09-16 20:55:34 +03:00
Vitaly Stoyan
9bbdd39efc initial commit 2021-09-15 18:07:18 +03:00
Anton Popov
ee7c0d4cc1 dynamic columns: fix several cases of parsing json 2021-09-10 00:18:02 +03:00
Anton Popov
4c388e3d84 Merge remote-tracking branch 'origin/sparse-serialization' into HEAD 2021-09-09 14:10:16 +03:00
Nikita Mikhaylov
fb66ab75be
Merge pull request #25633 from Avogar/json-as-string
Allow data in square brackets in JSONAsString format
2021-08-30 14:06:28 +03:00
Dmitrii Kovalkov
9871ad70ff Exclude fuzzers 2021-08-30 11:12:25 +03:00
mergify[bot]
401b2f3b8f
Merge branch 'master' into json-as-string 2021-08-26 15:03:59 +00:00
Nikolai Kochetov
5842d3573d Fix throw without exception in MySQL source. 2021-08-23 15:49:41 +03:00
Anton Popov
61239343e3 Merge remote-tracking branch 'origin/sparse-serialization' into HEAD 2021-08-20 16:33:30 +03:00
Alexey Milovidov
8adaef7c8e Make text format for Decimal tuneable 2021-08-16 11:03:23 +03:00
Nikolai Kochetov
ad00aaa18c
Merge pull request #27575 from kitaisreal/removed-some-data-streams
Removed some data streams
2021-08-13 12:59:00 +03:00
alexey-milovidov
36ab47769b
Merge pull request #27609 from Algunenano/refactor_mysql_format
Refactor mysql format check
2021-08-13 03:02:49 +03:00
mergify[bot]
38d97ec52a
Merge branch 'master' into json-as-string 2021-08-12 17:18:38 +00:00