Commit Graph

791 Commits

Author SHA1 Message Date
Jakub Kuklis
10425c17b2 Write empty values for Google wrappers 2022-04-29 10:01:50 +02:00
Jakub Kuklis
ff49fad1f1 Another const keyword corrections for debug build 2022-04-29 10:01:50 +02:00
Jakub Kuklis
08ee7470f0 const keyword corrections for debug build 2022-04-29 10:01:50 +02:00
Jakub Kuklis
c0acc4dfa0 Fixing assert 2022-04-29 10:01:50 +02:00
Jakub Kuklis
7a78197746 Style corrections 2022-04-29 10:01:50 +02:00
Jakub Kuklis
ae1194bf9c Nullables detection in protobuf using Google wrappers 2022-04-29 10:01:50 +02:00
Amos Bird
4a5e4274f0
base should not depend on Common 2022-04-29 10:26:35 +08:00
avogar
d295de1689 Fix comments and test 2022-04-28 14:59:35 +00:00
Kruglov Pavel
4d08587559
Merge branch 'master' into mysqldump-format 2022-04-28 15:58:18 +02:00
Vladimir C
1cbdc1ef3a
Merge pull request #36206 from vdimir/output-format-prometheus 2022-04-28 12:09:53 +02:00
vdimir
be0aa06958
Add output format Prometheus 2022-04-26 14:57:35 +00:00
avogar
b666b4e1c9 Fix possible heap-use-after-free in schema inference 2022-04-26 14:36:16 +00:00
avogar
33d845dade Add MySQLDump input format 2022-04-26 10:42:56 +00:00
taiyang-li
b7cc344d62 remove useless codes 2022-04-26 14:42:43 +08:00
taiyang-li
99dee35b6e parallel parsing of hive text format 2022-04-26 14:33:10 +08:00
Kruglov Pavel
34c342fdd3
Merge pull request #36205 from Avogar/improve-globs
Some refactoring around schema inference with globs
2022-04-25 13:14:46 +02:00
avogar
80eacc8533 Merge branch 'master' of github.com:ClickHouse/ClickHouse into improve-json-schema-inference 2022-04-22 17:18:44 +00:00
avogar
a093181b4f Fix comments 2022-04-21 11:48:17 +00:00
Kruglov Pavel
813e228fcc
Merge branch 'master' into improve-globs 2022-04-20 16:31:47 +02:00
avogar
f31f019252 Fix 2022-04-19 19:25:41 +00:00
avogar
1f252cedfe Make better 2022-04-19 19:16:47 +00:00
Kruglov Pavel
ec4e1cb6d8
Merge pull request #36211 from Avogar/insert-select-all-formats
Allow insert select for files with formats without schema inference
2022-04-19 14:25:59 +02:00
avogar
ae88549c4f Allow insert select for files with formats without schema inference 2022-04-13 20:02:52 +00:00
avogar
8b60aeb7bc Improve schema inference for json objects 2022-04-13 19:13:40 +00:00
avogar
1c065f8c7a Some refactoring around schema inference with globs 2022-04-13 17:02:48 +00:00
avogar
348cae0d16 Fix possible segfault in schema inference for JSON formats 2022-04-13 12:34:40 +00:00
avogar
d2017a63b1 Merge branch 'master' of github.com:ClickHouse/ClickHouse into improve-schema-inference 2022-04-07 11:36:40 +00:00
Kruglov Pavel
f3f8f27db5
Merge pull request #35735 from Avogar/allow-read-bools-as-numbers
Allow to infer and parse bools as numbers in JSON input formats
2022-04-07 13:20:49 +02:00
taiyang-li
2ef316801c Merge branch 'master' into use_minmax_index 2022-04-07 10:53:25 +08:00
Kruglov Pavel
ec2213493f
Merge branch 'master' into allow-read-bools-as-numbers 2022-04-06 14:53:02 +02:00
Kruglov Pavel
9141066de3
Merge branch 'master' into improve-schema-inference 2022-04-06 13:51:07 +02:00
taiyang-li
acb9f1632e support skip splits in orc and parquet 2022-04-06 16:40:22 +08:00
Maksim Kita
e6c9a36ac7
Merge pull request #35733 from kitaisreal/ipv6-invalid-insert-test
Added test for insert of invalid IPv6 value
2022-04-04 12:28:16 +02:00
mergify[bot]
1e43e26fa1
Merge branch 'master' into fix-order 2022-04-02 12:00:29 +00:00
avogar
ab2a963287 Merge branch 'master' of github.com:ClickHouse/ClickHouse into allow-read-bools-as-numbers 2022-03-31 14:09:43 +00:00
mergify[bot]
24ade25d61
Merge branch 'master' into improve-schema-inference 2022-03-31 13:42:47 +00:00
Kruglov Pavel
564a77c462
Fix build 2022-03-31 12:49:23 +02:00
Maksim Kita
371cdc956a Added input format settings for parsing invalid IPv4, IPv6 addresses as default values 2022-03-30 12:54:19 +02:00
avogar
3fc36627b3 Allow to infer and parse bools as numbers in JSON input formats 2022-03-29 17:37:31 +00:00
avogar
ce97ccbfb9 Improve schema inference for JSONEachRow and TSKV formats 2022-03-29 14:47:51 +00:00
Kruglov Pavel
a2fd09e031
Fix style 2022-03-29 16:34:07 +02:00
Antonio Andelic
9990abb76a Use compile-time check for Exception messages, fix wrong messages 2022-03-29 13:16:11 +00:00
mergify[bot]
343588de2c
Merge branch 'master' into improve-schema-inference 2022-03-29 13:06:00 +00:00
Anton Popov
9610139477
Merge pull request #35629 from CurtizJ/dynamic-columns-5
Support schema inference for type `Object` in format `JSONEachRow`
2022-03-29 14:17:09 +02:00
Anton Popov
d677635cd8
Merge pull request #35592 from CurtizJ/dynamic-columns-4
Add parallel parsing and schema inference for format `JSONAsObject`
2022-03-28 19:29:55 +02:00
Anton Popov
a6450be8b6 fix schema inference 2022-03-26 01:33:10 +00:00
Anton Popov
67195bfdd5 support schema inference for type Object in format JSONEachRow 2022-03-25 21:51:53 +00:00
Kruglov Pavel
d45143ffe0
Merge branch 'master' into improve-schema-inference 2022-03-25 12:05:40 +01:00
Kruglov Pavel
1823cac89d
Update src/Formats/EscapingRuleUtils.h
Co-authored-by: Vladimir C <vdimir@clickhouse.com>
2022-03-24 19:19:32 +01:00
Anton Popov
78100abc5f add parallel parsing and schema inference for type Object 2022-03-24 17:51:35 +00:00
avogar
abc020a502 Clean up 2022-03-24 13:08:58 +00:00
avogar
557edbd172 Add some improvements and fixes in schema inference 2022-03-24 12:54:12 +00:00
Antonio Andelic
0c23cd7b94 Add support for case insensitive column matching in arrow 2022-03-22 10:55:10 +00:00
Antonio Andelic
29d2bf7d1a Merge branch 'master' into case-insensitive-column-matching 2022-03-21 08:17:27 +00:00
Antonio Andelic
f75b054255 Allow case insensitive column matching 2022-03-21 07:47:37 +00:00
Kruglov Pavel
aa3c05e9d4
Merge pull request #35152 from rschu1ze/protobuf-batch-write
ProtobufList
2022-03-18 13:24:34 +01:00
Antonio Andelic
607f785e48 Revert "Merge pull request #35145 from bigo-sg/lower-column-name"
This reverts commit ebf72bf61d, reversing
changes made to f1b812bdc1.
2022-03-17 12:31:43 +00:00
Robert Schulze
6e1d7a31bc
Fix build + typo 2022-03-17 11:41:20 +01:00
Anton Popov
2ced42ed41 add experimental settings for Object type 2022-03-16 16:51:23 +00:00
Anton Popov
0ba78c3c3a Merge remote-tracking branch 'upstream/master' into HEAD 2022-03-16 15:28:09 +00:00
Robert Schulze
0d2ece6d91
Merge branch 'ClickHouse:master' into protobuf-batch-write 2022-03-16 09:43:33 +01:00
avogar
e2d1e643f2 Fix possible segfault in JSONEachRow schema inference 2022-03-15 11:44:15 +00:00
Robert Schulze
23122cb327
Fix review comments
ParquetBlockOutputFormat.cpp:
- undo unrelated formatting

ProtobufSerializer.cpp:
- undef debug tracing
- simplify logic in writeRow()

ProtobufSchemas.cpp:
- restore original search in cache by message type
2022-03-15 11:27:17 +01:00
Maksim Kita
538f8cbaad Fix clang-tidy warnings in Disks, Formats, Functions folders 2022-03-14 18:17:35 +00:00
Anton Popov
36ec379aeb Merge remote-tracking branch 'upstream/master' into HEAD 2022-03-14 16:28:35 +00:00
Antonio Andelic
ebf72bf61d
Merge pull request #35145 from bigo-sg/lower-column-name
add setting to lower column case when reading parquet/orc file
2022-03-14 11:25:03 +01:00
Robert Schulze
514d4d2187
Implement ProtobufList - fixes ClickHouse#16436
Introduce IO format "ProtobufList" with protobuf schema

    // schemafile.proto
    message Envelope {
      message MessageType {
        uint32 colA = 1;
        string colB = 2;
      }
      repeated MessageType mt = 1;
    }

where "Envelope" is a hard-coded/expected top-level message and
"MessageType" is a message with user-provided name containing the table
fields to export/import, e.g.

    SELECT * FROM db1.tab1 FORMAT ProtobufList SETTINGS format_schema =
    'schemafile:MessageType'

As a result, the new format wraps a list of messages (one per row) into
a single, containing message. Compare that to the schema of the existing
IO formats "Protobuf" and "ProtobufSingle":

    message MessageType {
      uint32 colA = 1;
      string colB = 2;
    }

The new format does not save space compared to the existing formats, but
it is conceptually a bit more beautiful and also more convenient.

Implementation details:

- Created new files ProtobufList(Input|Output)Format which use the
  existing ProtobufSerializer mechanism. The goal was to reuse as much
  code as possible and avoid copypasta.

- I was torn between inheriting from I(Input|Output)Format vs.
  IRow(Input|Output)Format for ProtobufList(Input|Output)Format. The
  former is chunk-based, which can be better for performance. Since the
  ProtobufSerializer mechanism is row-based but data is generally passed
  around in chunks, I chose the latter to leverage the existing
  chunk <--> row mapping code in IRow(Input|Output)Format.

- A new ProtobufSerializer called ProtobufSerializerEnvelope was
  introduced (--> ProtobufSerializer.cpp). It represents the top-level
  message which encloses the list of inner nested messages, i.e. the
  rows.

- With the new format, parsing the schema file and matching the fields in
  the schema file to table columns works as it does for the old formats.
  The only difference is that parsing starts one level below the
  "Envelope" (--> ProtobufSchemas.cpp). This is more natural than forcing
  customers to have table columns start with "Envelope".

- Creation of the ProtobufSerializer tree also works like before. What
  is different is that we finally add a ProtobufSerializerEnvelope as
  the new root of the tree. Its only purpose is to write/read the
  top-level message for the first/last row to write/read.

Caveats:

- The low-level serialization code in ProtobufWriter uses an internal
  buffer which is flushed to the output file only in endMessage().
  In the existing "Protobuf" format, this happens once per row; in the
  new format, it happens only at the end of the serialization,
  since row-level messages now call start/endNestedMessage(). As a
  future TODO, the buffer should also be flushed in
  start/endNestedMessage() to reduce memory consumption.
2022-03-14 08:04:58 +01:00
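To make the wrapping described in the ProtobufList commit message above concrete, here is a minimal Python sketch that hand-encodes the example Envelope schema using the standard protobuf wire format (varint keys, length-delimited nested messages). It is not ClickHouse code: all function names are hypothetical, it always writes both fields (a proto3 encoder would normally omit zero or empty values), and it only illustrates the body of the Envelope message, not necessarily the exact bytes ClickHouse emits.

```python
# Illustrative sketch only: hand-rolled protobuf wire encoding for the
# example schema in the commit message (Envelope { repeated MessageType mt = 1; }).
# Function names are hypothetical and do not exist in ClickHouse.

WIRE_VARINT = 0  # wire type for uint32
WIRE_LEN = 2     # wire type for strings and nested messages


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_key(field_number: int, wire_type: int) -> bytes:
    """Encode a field key: (field_number << 3) | wire_type, as a varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_message_type(col_a: int, col_b: str) -> bytes:
    """Encode one row as MessageType { uint32 colA = 1; string colB = 2; }."""
    payload = col_b.encode("utf-8")
    return (
        encode_key(1, WIRE_VARINT) + encode_varint(col_a)
        + encode_key(2, WIRE_LEN) + encode_varint(len(payload)) + payload
    )


def encode_envelope(rows) -> bytes:
    """Wrap every row as one length-delimited entry of Envelope field 1 (mt)."""
    out = bytearray()
    for col_a, col_b in rows:
        inner = encode_message_type(col_a, col_b)
        out += encode_key(1, WIRE_LEN) + encode_varint(len(inner)) + inner
    return bytes(out)


if __name__ == "__main__":
    print(encode_envelope([(1, "a"), (300, "bc")]).hex())
```

Each row becomes one length-delimited occurrence of field 1 (`mt`) inside the single Envelope message, which is why a reader can parse the whole list back as one containing message.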
zhanghuajie
53a8987b3b fix build fail with gcc --fix warnings without disabling some parameters 2022-03-11 21:59:19 +08:00
shuchaome
46cb4483a6 Optimise by lowering schema on the beginning. Add a functional test. 2022-03-11 14:34:46 +08:00
shuchaome
56795b831d add setting to lower column case when reading parquet/orc file 2022-03-09 16:07:02 +08:00
Anton Popov
0bc57da238 Merge remote-tracking branch 'upstream/master' into HEAD 2022-03-07 14:46:08 +00:00
Azat Khuzhin
c426eef07d Fix generating USE_* for system.build_options
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-03-04 15:31:32 +03:00
Anton Popov
df3b07fe7c Merge remote-tracking branch 'upstream/master' into HEAD 2022-03-03 22:25:28 +00:00
Maksim Kita
f1b1baf56e
Merge pull request #34982 from Cai-Yao/master
date_time_input_format = 'best_effort_us'
2022-03-03 09:22:57 +01:00
Maksim Kita
b1a956c5f1 clang-tidy check performance-move-const-arg fix 2022-03-02 18:15:27 +00:00
cwkyaoyao
72194bbaf3 Add date_time_input_format = best_effort_us 2022-03-02 16:00:06 +08:00
avogar
a7c6d11532 Fix schema inference for unquoted dates in CSV 2022-03-01 11:03:26 +00:00
Anton Popov
18940b8637 Merge remote-tracking branch 'upstream/master' into HEAD 2022-02-09 23:38:38 +03:00
Kruglov Pavel
15d85682e8
Fix style 2022-02-07 18:29:22 +03:00
avogar
a4c7ecde87 Make better 2022-02-07 17:51:26 +03:00
avogar
77b42bb9ff Support UUID in MsgPack format 2022-02-07 17:11:44 +03:00
Anton Popov
836a348a9c Merge remote-tracking branch 'upstream/master' into HEAD 2022-02-01 15:23:07 +03:00
Maksim Kita
5ef83deaa6 Update sort to pdqsort 2022-01-30 19:49:48 +00:00
Anton Popov
78b9f15abb Merge remote-tracking branch 'upstream/master' into HEAD 2022-01-30 03:24:37 +03:00
Kruglov Pavel
7873b4475f
Merge branch 'master' into autodetect-format 2022-01-25 10:56:52 +03:00
avogar
a6740d2f9a Detect format and schema for stdin in clickhouse-local 2022-01-25 10:25:37 +03:00
avogar
1f49acc164 Better naming 2022-01-24 16:28:36 +03:00
Anton Popov
e8ce091e68 Merge remote-tracking branch 'upstream/master' into HEAD 2022-01-21 20:11:18 +03:00
Kruglov Pavel
7bfb1231b9
Merge branch 'master' into formats-with-suffixes 2022-01-20 14:47:17 +03:00
Azat Khuzhin
91e3ceeea9 Remove unbundled capnp support 2022-01-20 10:01:58 +03:00
Azat Khuzhin
a30ef87d65 Remove unbundled msgpack support 2022-01-20 10:01:58 +03:00
Azat Khuzhin
788cb6b2b0 Remove unbundled protobuf support 2022-01-20 08:47:16 +03:00
Azat Khuzhin
1145e32af6 Remove unbundled snappy support 2022-01-20 08:47:16 +03:00
Azat Khuzhin
ab8cdb198f Remove unbundled orc support 2022-01-20 08:47:16 +03:00
Azat Khuzhin
d1b2bd5fbe Remove unbundled avro support 2022-01-20 08:47:16 +03:00
Azat Khuzhin
b4ad324a88 Remove unbundled parquet/arrow support 2022-01-20 08:47:16 +03:00
Kruglov Pavel
a7df9cd53a
Merge branch 'master' into formats-with-suffixes 2022-01-14 21:03:49 +03:00
avogar
253035a5df Fix 2022-01-14 19:17:06 +03:00
Kruglov Pavel
d2e9f37bee
Merge branch 'master' into format-by-extention 2022-01-14 18:36:23 +03:00
avogar
89a181bd19 Make better 2022-01-14 18:16:18 +03:00
avogar
817a314263 Fix tests and style 2022-01-14 17:46:24 +03:00
Kruglov Pavel
5a908e8edd
Merge branch 'master' into formats-with-suffixes 2022-01-14 16:45:20 +03:00
Kseniia Sumarokova
5da673c3a5
Merge pull request #31104 from bigo-sg/hive_table
Implement hive table engine
2022-01-14 09:39:17 +03:00
avogar
2d7b1bfa5e Detect format in S3/HDFS/URL table engines 2022-01-13 16:14:18 +03:00
Kruglov Pavel
305d58a762
Merge pull request #33524 from Avogar/stacktrace-in-client
Don't print exception twice in client in case of exception in parallel parsing
2022-01-13 15:50:42 +03:00
avogar
8390e9ad60 Detect format by file name in file/hdfs/s3/url table functions 2022-01-12 18:29:31 +03:00
taiyang-li
66813a3aa9 merge master 2022-01-12 16:56:29 +08:00
avogar
0ae0aa712b Don't print exception twice in client in case of exception in parallel parsing 2022-01-11 18:37:07 +03:00
zhongyuankai
99279c1443 INTO OUTFILE / FROM INFILE: autodetect FORMAT by file extension 2022-01-11 21:26:14 +08:00
zhongyuankai
878e44eb97 auto format by file extension 2022-01-08 21:47:14 +08:00
taiyang-li
1e102bc1b2 merge master 2022-01-01 09:01:06 +08:00
alesapin
16c36d72b1
Merge pull request #33296 from ClickHouse/fix_clang_tidy_3
Fix clang tidy 3
2021-12-29 22:43:42 +03:00
avogar
97788b9c21 Allow to create new files on insert for File/S3/HDFS engines 2021-12-29 21:19:13 +03:00
Kruglov Pavel
489a30859f
Merge pull request #32455 from Avogar/schema-inference
Automatic schema inference for input formats
2021-12-29 21:03:48 +03:00
alesapin
34145c47da Fix clang tidy 2021-12-29 18:36:42 +03:00
avogar
78b522fd51 Fix fasttest build 2021-12-29 12:21:01 +03:00
avogar
d718a2e220 Clean up 2021-12-29 12:21:01 +03:00
avogar
26abf7aa62 Remove code duplication, use simdjson and rapidjson instead of Poco 2021-12-29 12:21:01 +03:00
avogar
aaf9f85c67 Add more tests and fixes 2021-12-29 12:18:56 +03:00
avogar
dd994aa761 Add some tests and some fixes 2021-12-29 12:18:56 +03:00
avogar
8112a71233 Implement schema inference for most input formats 2021-12-29 12:18:56 +03:00
taiyang-li
9036b18c2f merge master 2021-12-27 15:12:48 +08:00
Raúl Marín
cb22091b33 Merge remote-tracking branch 'blessed/master' into kill_scalar_github 2021-12-23 13:59:33 +01:00
Kruglov Pavel
a1455c0f2a
Merge pull request #32981 from Avogar/fix-csv-tuples
Fix tuple output in CSV format
2021-12-23 13:27:34 +03:00
Alexey Milovidov
df0e7d9ed3 Merge branch 'Issue81' of github.com:DevTeamBK/ClickHouse into merge-33050 2021-12-23 02:03:03 +03:00
Alexey Milovidov
b2d9e33882 Whitespaces 2021-12-23 02:02:36 +03:00
Raúl Marín
9bb88c26d8 Add existing progress to the record of the output format progress 2021-12-22 23:14:23 +01:00
Boris Kuschel
c62d9e2f2d Out of Bounds Column Index
Signed-off-by: Boris Kuschel <Boris.Kuschel@ibm.com>
2021-12-21 22:43:47 -05:00
taiyang-li
2597925724 merge master 2021-12-21 15:55:39 +08:00
avogar
ba6a513db0 Fix tuple output in CSV format 2021-12-20 19:27:09 +03:00
kreuzerkrieg
f06c37d206 Stop reading incomplete stripes and skip rows. 2021-12-19 18:41:32 +02:00
Anton Popov
99ebabd822 Merge remote-tracking branch 'upstream/master' into HEAD 2021-12-17 19:02:29 +03:00
alesapin
6bd7e425c6
Merge pull request #22535 from CurtizJ/sparse-serialization
Sparse serialization and ColumnSparse
2021-12-17 15:26:17 +03:00
taiyang-li
d033fc4c24 merge master and fix conflict 2021-12-17 15:11:21 +08:00
alesapin
884801e1bd Fixing 2021-12-14 19:08:08 +03:00
Anton Popov
16312e7e4a Merge remote-tracking branch 'upstream/master' into HEAD 2021-12-14 18:58:17 +03:00
taiyang-li
ca3f7425a4 fix code 2021-12-14 17:37:31 +08:00
taiyang-li
8234d1176f merge master 2021-12-14 10:39:21 +08:00
Alexey Milovidov
71926a3a97 Fix surprisingly bad code in function "file" 2021-12-13 07:57:54 +03:00
李扬
8675086104
Merge branch 'master' into hive_table 2021-12-12 09:01:46 -06:00
Vitaly Baranov
463ce1fcee
Merge pull request #27822 from filimonov/kafka_protobuf_issue26643
Test for issue #26643
2021-12-11 20:31:22 +03:00
Vitaly Baranov
abe9dd3368
Merge pull request #32531 from vitlibar/fix-nested-array-sizes-for-missing-columns
Improve handling nested structures with missing columns while reading protobuf
2021-12-11 11:08:34 +03:00
Vitaly Baranov
b5b195f4e2
Merge branch 'master' into kafka_protobuf_issue26643 2021-12-10 23:22:35 +03:00
Vitaly Baranov
82c2d8dd2c Add synchronization to ProtobufSchemas. 2021-12-10 23:18:47 +03:00
Vitaly Baranov
73092942ea Take into account nested structures while filling missing columns while reading protobuf. 2021-12-10 21:11:06 +03:00
Anton Popov
d8367334a3 Merge remote-tracking branch 'upstream/master' into HEAD 2021-12-08 18:26:19 +03:00
Kseniia Sumarokova
926fd568c7
Merge pull request #32113 from FrankChen021/url_http_header
Set Content-Type in HTTP packets issued from URL engine
2021-12-07 08:52:36 +03:00
Kseniia Sumarokova
eab6f0ba49
Update FormatFactory.cpp 2021-12-06 23:35:29 +03:00
Kruglov Pavel
cc71c537bc
Merge pull request #32204 from Avogar/skip-quoted-values
Improve skipping unknown fields with Quoted escaping rule in Template/CustomSeparated formats
2021-12-06 12:28:14 +03:00
Vitaly Baranov
d709782088
Merge pull request #31988 from vitlibar/fix-skipping-columns-while-writing-protobuf
Fix skipping columns while writing protobuf
2021-12-05 18:01:11 +03:00
Vitaly Baranov
2e0b480044 Improve error handling while serializing protobufs. 2021-12-04 21:42:45 +03:00
Vitaly Baranov
15e3dbe3f2 Fix skipping columns in Nested while writing protobuf. 2021-12-04 18:00:02 +03:00
frank chen
c319b6fa32 Fix style
Signed-off-by: frank chen <frank.chen021@outlook.com>
2021-12-03 22:09:04 +08:00
avogar
7549619b25 Improve skipping unknown fields with Quoted escaping rule in Template/CustomSeparated formats 2021-12-03 16:25:35 +03:00
frank chen
898db5b468 Resolve review comments
Signed-off-by: frank chen <frank.chen021@outlook.com>
2021-12-03 19:47:05 +08:00
Anton Popov
f6be3d16fd
Merge pull request #24820 from kssenii/versioning
Versioning of aggregate function states
2021-12-03 01:41:44 +03:00
taiyang-li
9ec8272186 refactor hive text input format 2021-12-02 16:14:25 +08:00
Anton Popov
6f4d9a53b2 Merge remote-tracking branch 'origin/sparse-serialization' into HEAD 2021-12-01 15:54:33 +03:00
Anton Popov
54f51444c0 Merge remote-tracking branch 'upstream/master' into HEAD 2021-12-01 15:49:02 +03:00
taiyang-li
4aeadf3967 fix build error 2021-12-01 14:13:48 +08:00
kssenii
71bfc72e37 Fix 2021-11-30 14:42:37 +00:00
taiyang-li
d213500a3e remove blank at end of line 2021-11-30 18:23:24 +08:00
mergify[bot]
8d5460b469
Merge branch 'master' into feature-support-bool-type 2021-11-29 11:50:18 +00:00
kssenii
be3b4ca8fe Merge branch 'master' of github.com:ClickHouse/ClickHouse into versioning 2021-11-27 09:44:31 +00:00
kssenii
515261f5dd Better 2021-11-27 09:40:46 +00:00
taiyang-li
72f60cceb9
Merge branch 'master' into hive_table 2021-11-25 17:33:26 +08:00
Kseniia Sumarokova
93cf66df12
Merge pull request #30936 from kssenii/seekable-read-buffers
Reduce memory usage for some formats when reading with s3/url/hdfs
2021-11-25 11:19:24 +03:00
lgbo
996d7125c0
Merge branch 'master' into hive_table 2021-11-23 10:19:02 +08:00
Anton Popov
ccd78e3838 Merge remote-tracking branch 'upstream/master' into HEAD 2021-11-22 17:19:35 +03:00
Kruglov Pavel
cded91b013
Update verbosePrintString.h 2021-11-19 16:51:49 +03:00
taiyang-li
e8644807fe merge master and solve conflict 2021-11-19 15:01:58 +08:00
MaxWk
f17d5b02e4 use bool representation 2021-11-19 14:30:22 +08:00
kssenii
1a9817f872 Correct merge 2021-11-18 07:56:10 +00:00
avogar
1ebcbf4748 Fix style 2021-11-16 17:10:30 +03:00
avogar
8e9783388b Add formats CustomSeparatedWithNames/WithNamesAndTypes 2021-11-16 17:10:30 +03:00
avogar
73d1918410 tmp 2021-11-16 17:10:30 +03:00
kssenii
37f482d478 Merge branch 'master' of github.com:ClickHouse/ClickHouse into versioning 2021-11-15 07:31:11 +00:00
kssenii
f18dcd2287 Merge branch 'master' of github.com:ClickHouse/ClickHouse into seekable-read-buffers 2021-11-13 14:38:57 +03:00
cgp
18504f545a move InputCreatorFunc to InputCreator 2021-11-12 00:34:59 +08:00
MaxWk
d42a454837 support some bool format 2021-11-11 16:01:32 +08:00
taiyang-li
deef4d4dbe add options read_bool_as_uint8 when parse csv 2021-11-11 11:49:54 +08:00
Anton Popov
a20922b2d3 Merge remote-tracking branch 'origin/sparse-serialization' into HEAD 2021-11-09 15:36:25 +03:00
Anton Popov
66973a2a28 Merge remote-tracking branch 'upstream/master' into HEAD 2021-11-08 21:27:45 +03:00
taiyang-li
36ca0b296b implement hive table engine 2021-11-05 19:55:30 +08:00
Anton Popov
84e914e05a minor fixes near serializations 2021-11-05 01:46:00 +03:00
kssenii
ec11179f91 Merge branch 'master' of github.com:ClickHouse/ClickHouse into seekable-read-buffers 2021-11-03 14:33:31 +03:00
kssenii
45ea820297 Reduce memory usage for some formats 2021-11-03 14:30:03 +03:00
Kruglov Pavel
327a34e9da
Merge pull request #30497 from Avogar/null-deserialization
Add custom null representation support for TSV/CSV input formats, fix Nullable(String) deserializing in some formats
2021-11-03 11:30:25 +03:00
avogar
42ab57f0e5 Set output_format_avro_rows_in_file default to 1 2021-11-02 14:06:10 +03:00
Kruglov Pavel
901ebcede6
Merge pull request #30351 from arenadata/ADQM-335
output_format_avro_rows_in_file
2021-11-02 12:25:27 +03:00
Kruglov Pavel
1f8535c02b
Merge branch 'master' into null-deserialization 2021-11-02 12:15:21 +03:00
Anton Popov
1628f50e51
Merge branch 'master' into sparse-serialization 2021-11-02 06:26:18 +03:00
Kruglov Pavel
9a1275cb10
Merge pull request #30178 from Avogar/tsv-csv
Refactor and improve TSV, CSV, JSONCompactEachRow, RowBinary formats. Fix bugs in formats
2021-11-02 00:38:30 +03:00
Anton Popov
d50137013c Merge remote-tracking branch 'upstream/master' into HEAD 2021-11-01 16:55:53 +03:00
Vitaly Baranov
d29b73e301
Merge pull request #30689 from vitlibar/refactor-log-family
Refactoring of Log family
2021-10-31 18:50:08 +03:00
Vitaly Baranov
0e8c9b089f Keep indices for StorageStripeLog in memory. 2021-10-31 03:52:41 +03:00
Anton Popov
0099dfd523 refactoring of SerializationInfo 2021-10-29 20:21:02 +03:00
Kruglov Pavel
7d4f211d5b
Merge branch 'master' into tsv-csv 2021-10-29 16:38:06 +03:00
Alexey Milovidov
8b4a6a2416 Remove cruft 2021-10-28 02:10:39 +03:00
avogar
d1ef96a5ef Add test, avoid unnecessary allocations, use PeekableReadBuffer only in corner case 2021-10-27 17:29:15 +03:00
avogar
d5c5a3213b Add custom null representation support for TSV/CSV input formats, fix bugs in deserializing NULLs in some cases 2021-10-21 16:52:27 +03:00
Ilya Golshtein
551a1065c1 output_format_avro_rows_in_file default is 1000000 2021-10-21 14:19:25 +03:00
Ilya Golshtein
82f33151e7 output_format_avro_rows_in_file fixes per code review 2021-10-21 02:53:39 +03:00
avogar
872cca550a Make better 2021-10-20 15:47:20 +03:00
mergify[bot]
0a4360c43e
Merge branch 'master' into tsv-csv 2021-10-20 11:57:06 +00:00
avogar
7007286088 Fix WithNamesAndTypes parallel parsing, add new tests, small refactoring 2021-10-20 14:48:54 +03:00
Nikolai Kochetov
a92dc0a826 Update obsolete comments. 2021-10-19 12:58:10 +03:00
Kruglov Pavel
5052ec3ab0
Merge branch 'master' into tsv-csv 2021-10-19 12:03:52 +03:00
Kruglov Pavel
1e2ceeb2e7
Merge pull request #29291 from Avogar/capnproto
Add CapnProto output format, refactor CapnProto input format
2021-10-19 11:54:55 +03:00
Ilya Golshtein
d90302aa3b output_format_avro_rows_in_file 2021-10-18 19:01:06 +03:00
Anton Popov
d71ffc355a Merge remote-tracking branch 'upstream/master' into HEAD 2021-10-18 15:18:22 +03:00
Kruglov Pavel
dbc2f3408e
Merge branch 'master' into tsv-csv 2021-10-18 14:38:22 +03:00
Kruglov Pavel
6350957709
Fix special build 2021-10-18 14:30:02 +03:00
Nikolai Kochetov
bfcbf5abe0 Merge branch 'master' into removing-data-streams-folder 2021-10-17 10:42:37 +03:00
Nikolai Kochetov
a08c98d760 Move some files. 2021-10-16 17:03:50 +03:00
Azat Khuzhin
50231460af Use forward declaration for Buffer<> in generic headers
- changes in ReadHelpers.h -- recompiles 1000 modules
- changes in FormatFactory.h -- recompiles 100 modules
2021-10-16 12:03:24 +03:00
Nikolai Kochetov
fd14faeae2 Remove DataStreams folder. 2021-10-15 23:18:20 +03:00
avogar
2da8180613 Add space after comma 2021-10-14 21:39:09 +03:00
avogar
8729201208 Remove redundant move 2021-10-14 21:36:57 +03:00
avogar
89c1a04ef4 Fix comments 2021-10-14 21:35:56 +03:00
Kruglov Pavel
9ec6930c15 Better exception handling 2021-10-14 16:43:23 +03:00
Kruglov Pavel
95790b8a1c Update CapnProtoUtils.cpp 2021-10-14 16:43:23 +03:00
Kruglov Pavel
9ddcdbba39 Add INCORRECT_DATA error code 2021-10-14 16:43:23 +03:00
avogar
f88a2ad653 Handle exception when cannot extract value from struct, add test for it 2021-10-14 16:43:23 +03:00
avogar
ed8818a773 Fix style, better check in enum comparison 2021-10-14 16:43:22 +03:00
Kruglov Pavel
1cd938fbba Fix typo 2021-10-14 16:43:22 +03:00
avogar
ce22f534c4 Add CapnProto output format, refactor CapnProto input format 2021-10-14 16:43:22 +03:00
avogar
324dfd4f81 Refactor and improve TSV, CSV and JSONCompactEachRow formats, fix some bugs in formats 2021-10-14 13:32:49 +03:00
Nikolai Kochetov
ab28c6c855 Remove BlockInputStream interfaces. 2021-10-14 13:25:43 +03:00
Nikolai Kochetov
2957971ee3 Remove some last streams. 2021-10-13 21:22:02 +03:00
Nikolai Kochetov
a5fa5c7ea3 Move formats to Impl 2021-10-13 13:01:08 +03:00
Nikolai Kochetov
88b1807434 Fix special build. 2021-10-12 10:33:45 +03:00
Nikolai Kochetov
1e1d5d7fea Fix style. 2021-10-11 22:21:04 +03:00
Nikolai Kochetov
ec18340351 Remove streams from formats. 2021-10-11 19:11:50 +03:00
Nikolai Kochetov
1f6d5482b1 Fix some tests. 2021-10-08 21:33:51 +03:00
Nikolai Kochetov
c6bce1a4cf Update Native. 2021-10-08 20:21:19 +03:00
Alexey Milovidov
fe6b7c77c7 Rename "common" to "base" 2021-10-02 10:13:14 +03:00
Alexey Milovidov
cd7f9d981c Remove ya.make 2021-09-25 04:22:54 +03:00
Anton Popov
ea4fd19e28
Merge pull request #29087 from CurtizJ/asyn-inserts-follow-up
Minor enhancements in async inserts
2021-09-21 13:38:52 +03:00
PHO
3c4b1ea9c5 New setting: output_format_csv_null_representation
This is the same as output_format_tsv_null_representation but is for CSV output.
2021-09-17 17:58:23 +09:00
Anton Popov
99175f7acc minor enhancements in async inserts 2021-09-16 20:55:34 +03:00
Vitaly Stoyan
9bbdd39efc initial commit 2021-09-15 18:07:18 +03:00
Anton Popov
ee7c0d4cc1 dynamic columns: fix several cases of parsing json 2021-09-10 00:18:02 +03:00
Anton Popov
4c388e3d84 Merge remote-tracking branch 'origin/sparse-serialization' into HEAD 2021-09-09 14:10:16 +03:00
Nikita Mikhaylov
fb66ab75be
Merge pull request #25633 from Avogar/json-as-string
Allow data in square brackets in JSONAsString format
2021-08-30 14:06:28 +03:00
Dmitrii Kovalkov
9871ad70ff Exclude fuzzers 2021-08-30 11:12:25 +03:00
mergify[bot]
401b2f3b8f
Merge branch 'master' into json-as-string 2021-08-26 15:03:59 +00:00
Nikolai Kochetov
5842d3573d Fix throw without exception in MySQL source. 2021-08-23 15:49:41 +03:00
Anton Popov
61239343e3 Merge remote-tracking branch 'origin/sparse-serialization' into HEAD 2021-08-20 16:33:30 +03:00
Alexey Milovidov
8adaef7c8e Make text format for Decimal tuneable 2021-08-16 11:03:23 +03:00
Nikolai Kochetov
ad00aaa18c
Merge pull request #27575 from kitaisreal/removed-some-data-streams
Removed some data streams
2021-08-13 12:59:00 +03:00
alexey-milovidov
36ab47769b
Merge pull request #27609 from Algunenano/refactor_mysql_format
Refactor mysql format check
2021-08-13 03:02:49 +03:00
mergify[bot]
38d97ec52a
Merge branch 'master' into json-as-string 2021-08-12 17:18:38 +00:00
Nikita Mikhaylov
8c06abee73
Merge pull request #25902 from Avogar/arrow-nested
Refactor ArrowColumnToCHColumn, support inserting Nested as Array(Struct) in Arrow/ORC/Parquet
2021-08-12 20:02:01 +03:00
Raúl Marín
a451bf6eac Remove unused code 2021-08-12 11:30:01 +02:00
Raúl Marín
f6788fc660 Mysql handler: Move format check to the handler 2021-08-12 11:29:50 +02:00
Maksim Kita
124a87684f Removed some data streams 2021-08-11 23:39:01 +03:00
mergify[bot]
0de123e0e2
Merge branch 'master' into fix_01176 2021-08-11 03:07:40 +00:00
Raúl Marín
65bb4ff744 Unify mysql output format checks 2021-08-09 14:29:35 +02:00
Raúl Marín
b1ff4ca81a Fix 01176_mysql_client_interactive and work with mariadb client 2021-08-06 18:03:27 +02:00
Nikolai Kochetov
13f95f3fdf Streams -> Processors for dicts, part 3. 2021-08-06 11:41:45 +03:00
Nikolai Kochetov
8546df13c2 Streams -> Processors for dicts, part 2. 2021-08-05 21:08:52 +03:00
mergify[bot]
3201d90105
Merge branch 'master' into json-as-string 2021-08-05 14:18:35 +00:00
Pavel Kruglov
e4c5d7e3b1 Support inserting nested as Array of structs, add some refactoring 2021-08-05 14:10:27 +03:00
Anton Popov
e36736b50c Merge remote-tracking branch 'origin/sparse-serialization' into HEAD 2021-08-02 22:52:02 +03:00
kssenii
58b3a3f3fc Merge branch 'master' of https://github.com/ClickHouse/ClickHouse into versioning 2021-07-29 19:56:27 +00:00
Vitaly Stoyan
c1f71b2e6e
Merge branch 'ClickHouse:master' into arcadia_arrow2 2021-07-28 19:57:25 +03:00
Vitaly Stoyan
4e269ce2e4 initial commit 2021-07-28 14:09:17 +03:00
Alexander Tokmakov
fc9ab2cda7 Merge branch 'master' into rename_materialize_mysql 2021-07-27 22:38:40 +03:00
Alexander Tokmakov
63ab38ee09 rename MaterializeMySQL to MaterializedMySQL 2021-07-26 21:17:28 +03:00
Alexey Milovidov
3f4cfb67bd Fix conversion between DateTime and String in Protobuf 2021-07-24 18:29:19 +03:00
Anton Popov
f99374cca6 Merge remote-tracking branch 'origin/sparse-serialization' into HEAD 2021-07-20 18:20:21 +03:00
Nikita Mikhaylov
4d3f828beb
Merge pull request #26314 from kssenii/fix-hdfs-crash
Fix hdfs crash
2021-07-20 15:01:56 +03:00
Vitaly Baranov
4f1926550b
Merge pull request #26429 from vitlibar/remove-mysql-wire-context
Remove MySQLWireContext
2021-07-19 12:21:24 +03:00
Alexey Milovidov
c648e8356b Remove even more code 2021-07-17 21:58:51 +03:00
Alexey Milovidov
5fab6e22cc Remove more code 2021-07-17 21:58:25 +03:00
Kseniia Sumarokova
e844cec16f
Merge branch 'master' into fix-hdfs-crash 2021-07-16 22:21:30 +03:00
Vitaly Baranov
0f8b196682 Remove MySQLWireContext. 2021-07-16 22:21:20 +03:00
kssenii
1993d8e0f4 Fix hdfs crash 2021-07-14 15:13:18 +03:00
Ilya Golshtein
16532658c2 Avro string for ClickHouse string 2021-07-13 20:03:00 +03:00
Alexander Tokmakov
1a470fb777 fix sequence_id in MySQL protocol 2021-07-07 20:03:28 +03:00
Anton Popov
072e65b728 Merge remote-tracking branch 'origin/sparse-serialization' into HEAD 2021-07-07 17:20:38 +03:00
Kseniia Sumarokova
9d02af7d8b
Merge pull request #25676 from sand6255/MaterializeMySQL-Support-Enum-Data-Type
MaterializeMySQL: support ENUM data type
2021-06-28 19:15:00 +03:00
Kostiantyn Storozhuk
ca4783d854 Fixed typo and casting 2021-06-28 15:22:13 +08:00
Storozhuk Kostiantyn
4a3145f586 Materialize my sql support enum data type
* Implemented Enum for MaterializeMySQL
2021-06-25 16:42:25 +08:00
Pavel Kruglov
92e6df7b89 Allow data in square brackets in JSONAsString format 2021-06-23 16:17:34 +03:00
Kseniia Sumarokova
41c65a9965
Merge pull request #25528 from kssenii/something
Fix convertion of datetime with timezone for mysql, odbc, etc..
2021-06-21 22:55:22 +03:00
kssenii
c0732ddc12 Fix datetime with timezone 2021-06-21 08:22:12 +00:00
mergify[bot]
c723dd7d40
Merge branch 'master' into arrow 2021-06-18 12:18:57 +00:00
Maksim Kita
c2ecdb7269 Fixed build issues 2021-06-16 23:28:41 +03:00
Maksim Kita
67e9b85951 Merge ext into common 2021-06-16 23:28:41 +03:00
Kruglov Pavel
bf36f5a977
Merge pull request #25000 from vitlibar/fix-protobuf-serialization-of-splitted-nested-messages
Fix serialization of splitted nested messages in Protobuf format.
2021-06-16 14:04:14 +03:00
Pavel Kruglov
c8b37977da Fix bugs, support dictionary for Arrow format 2021-06-15 16:15:27 +03:00
Anton Popov
6a5daca135 dynamic subcolumns: new format and several fixes 2021-06-08 12:33:04 +03:00
Vitaly Baranov
c015ec7be9 Fix serialization of splitted nested messages in Protobuf format. 2021-06-05 14:20:39 +03:00
Nikolai Kochetov
dbaa6ffc62 Rename ContextConstPtr to ContextPtr. 2021-06-01 15:20:52 +03:00
kssenii
32095a2b74 Merge branch 'master' of https://github.com/ClickHouse/ClickHouse into versioning 2021-06-01 08:01:06 +00:00
kssenii
054fe1cf2f Fix 2021-05-31 14:24:35 +03:00
kssenii
d18609467b First version 2021-05-30 13:57:30 +00:00
kssenii
69816e6eff Fix checks 2021-05-30 15:44:58 +03:00
kssenii
2a631aaf08 Final fixes 2021-05-29 00:34:44 +03:00
kssenii
0d393c0006 Fix tests 2021-05-27 17:21:19 +03:00
kssenii
f66c67a979 Fixes 2021-05-27 15:42:46 +03:00
kssenii
866b29fb5a Return list fds with Poco, more canonical 2021-05-23 10:56:13 +03:00
kssenii
85cc7a8923 Remove last presence of Poco::Path in src 2021-05-17 17:44:10 +03:00
kssenii
649dd23b8b Poco::resolve 2021-05-17 09:48:02 +03:00
kssenii
3b1bf2bae6 Poco::Path substitution 2021-05-16 23:38:23 +03:00
Alexey Milovidov
7dfb7664f7 Messing with the code (removed trash) 2021-05-07 21:16:27 +03:00
Alexey Milovidov
962a7113f6 Remove code that I do not like (crazy templates) 2021-05-04 20:26:09 +03:00
Alexey Milovidov
052b532025 Untangle UUID 2021-05-04 02:16:45 +03:00
Alexey Milovidov
6ca37b9512 Untangle UUID 2021-05-04 01:59:38 +03:00
Maksim Kita
318c4bb80d Add examples folder filter to ya.make.in 2021-04-30 11:25:52 +03:00
Nikita Mikhaylov
9f55424250 move to examples everywhere 2021-04-27 01:51:42 +03:00
Alexander Kuzmenkov
35459a0228
Update FormatFactory.cpp 2021-04-22 21:48:06 +03:00
TCeason
b4bf53dfc9 add some comment and modify a parameter type 2021-04-19 15:34:27 +08:00
TCeason
63403c709c modify settings name to external_xxx and rewrite Storage MySQL max_block_size 2021-04-19 10:51:50 +08:00
TCeason
87aa904440 Modify according to review opinion 2021-04-19 10:51:50 +08:00
TCeason
472c131420 Add MySQL read history data bytes judgment 2021-04-19 10:51:50 +08:00
Alexey Milovidov
07b610cf70 Remove useless files 2021-04-13 23:04:13 +03:00
Ivan
495c6e03aa
Replace all Context references with std::weak_ptr (#22297)
* Replace all Context references with std::weak_ptr

* Fix shared context captured by value

* Fix build

* Fix Context with named sessions

* Fix copy context

* Fix gcc build

* Merge with master and fix build

* Fix gcc-9 build
2021-04-11 02:33:54 +03:00
Alexander Kuzmenkov
e44b3822e3
Merge pull request #21850 from fastio/handle_errors_for_kafka_engine
Handle errors for Kafka engine
2021-04-09 22:59:40 +03:00
alexey-milovidov
5d672d4529 Update FormatFactory.cpp 2021-04-06 22:23:16 +03:00
Nikita Mikhailov
37f48d13b4 add test 2021-04-06 22:23:16 +03:00
kssenii
dc42d5189d Merge branch 'master' of https://github.com/ClickHouse/ClickHouse into replicas-shards-for-mysql-and-postgres 2021-04-05 15:36:35 +00:00
alexey-milovidov
687d1e9b54
Merge pull request #22528 from kitaisreal/format-settings-null-as-default-default-value-fix
FormatSettings null_as_default default value fix
2021-04-03 12:49:08 +03:00
Maksim Kita
5ba6c7b731 FormatSettings null_as_default default value fix 2021-04-03 00:05:40 +03:00
kssenii
18dd8dd79b Typo 2021-03-31 17:05:41 +00:00
Peng Jian
26b5482b4d remove the flag in the parser 2021-03-31 22:25:51 +08:00
Peng Jian
909d5ad2b5 Handle errors for Kafka engine 2021-03-31 17:15:57 +08:00
kssenii
ce05087b1b Merge branch 'master' of https://github.com/ClickHouse/ClickHouse into replicas-shards-for-mysql-and-postgres 2021-03-30 17:27:10 +00:00
kssenii
ef537b802f Better comments 2021-03-27 21:10:44 +00:00
kssenii
95e8a8b9f0 Support shards 2021-03-27 14:40:07 +00:00
kssenii
ae868208c2 Use pool with failover in mysql storage 2021-03-27 14:39:45 +00:00
Anton Popov
6a15431be7 Merge remote-tracking branch 'upstream/master' into HEAD 2021-03-25 15:57:35 +03:00
Alexey Milovidov
3f67f4f47b Saturation for DateTime 2021-03-15 23:40:33 +03:00
Alexey Milovidov
671395e8c8 Most likely improve performance 2021-03-15 22:23:27 +03:00
Anton Popov
bc417cf54a refactoring of serializations 2021-03-09 17:46:52 +03:00
Vitaly Baranov
2480e4ee3d Better tests for protobuf format #2. 2021-02-24 21:06:29 +03:00
Vitaly Baranov
2eecaee08d Better tests for protobuf format. 2021-02-20 23:13:32 +03:00
Vitaly Baranov
3cbb325913
Merge pull request #20506 from vitlibar/refactor-protobuf-format-io
Improved serialization in Protobuf format.
2021-02-18 11:31:37 +03:00
Vitaly Baranov
18e036d19b Improved serialization for data types combined of Arrays and Tuples.
Improved matching enum data types to protobuf enum type.
Fixed serialization of the Map data type.
Omitted values are now set by default.
2021-02-17 20:50:09 +03:00
Nikita Mikhailov
d615b8e516 more checks
(cherry picked from commit b45168ecaf37d0061edfd12c67a8c5300d45d2e3)
2021-02-15 16:11:16 +03:00
Nikita Mikhaylov
a77b740a7c
Merge pull request #20286 from nikitamikhaylov/json-import-bugfix
Error from allocator on JSON import
2021-02-15 12:40:09 +03:00
Nikita Mikhaylov
3174c57562
Update src/Formats/JSONEachRowUtils.cpp
Co-authored-by: tavplubix <tavplubix@gmail.com>
2021-02-12 15:29:19 +03:00
Nikita Mikhailov
47f62e899b style 2021-02-10 17:52:28 +03:00
Nikita Mikhailov
6c9f5e4991 try 2021-02-10 17:16:27 +03:00
Alexey Milovidov
905793a7e4 Disable excessive squashing of blocks for StorageMemory #13052 2021-02-07 04:57:17 +03:00
Alexey Milovidov
bd0ec1b9f4 Remove useless header 2021-02-02 06:03:30 +03:00
Maksim Kita
a45459e095 Fixed tests 2021-01-27 16:25:27 +03:00
Maksim Kita
b745c64459 Added Nullable support for DirectDictionary 2021-01-27 16:25:27 +03:00