ClickHouse

mirror of https://github.com/ClickHouse/ClickHouse.git synced 2024-11-21 23:21:59 +00:00

Author	SHA1	Message	Date
Kruglov Pavel	f3f8f27db5	Merge pull request #35735 from Avogar/allow-read-bools-as-numbers Allow to infer and parse bools as numbers in JSON input formats	2022-04-07 13:20:49 +02:00
taiyang-li	2ef316801c	Merge branch 'master' into use_minmax_index	2022-04-07 10:53:25 +08:00
Kruglov Pavel	ec2213493f	Merge branch 'master' into allow-read-bools-as-numbers	2022-04-06 14:53:02 +02:00
taiyang-li	acb9f1632e	support skip splits in orc and parquet	2022-04-06 16:40:22 +08:00
Maksim Kita	e6c9a36ac7	Merge pull request #35733 from kitaisreal/ipv6-invalid-insert-test Added test for insert of invalid IPv6 value	2022-04-04 12:28:16 +02:00
mergify[bot]	1e43e26fa1	Merge branch 'master' into fix-order	2022-04-02 12:00:29 +00:00
avogar	ab2a963287	Merge branch 'master' of github.com:ClickHouse/ClickHouse into allow-read-bools-as-numbers	2022-03-31 14:09:43 +00:00
Maksim Kita	371cdc956a	Added input format settings for parsing invalid IPv4, IPv6 addresses as default values	2022-03-30 12:54:19 +02:00
avogar	3fc36627b3	Allow to infer and parse bools as numbers in JSON input formats	2022-03-29 17:37:31 +00:00
avogar	ce97ccbfb9	Improve schema inference for JSONEachRow and TSKV formats	2022-03-29 14:47:51 +00:00
Antonio Andelic	9990abb76a	Use compile-time check for Exception messages, fix wrong messages	2022-03-29 13:16:11 +00:00
Anton Popov	9610139477	Merge pull request #35629 from CurtizJ/dynamic-columns-5 Support schema inference for type `Object` in format `JSONEachRow`	2022-03-29 14:17:09 +02:00
Anton Popov	d677635cd8	Merge pull request #35592 from CurtizJ/dynamic-columns-4 Add parallel parsing and schema inference for format `JSONAsObject`	2022-03-28 19:29:55 +02:00
Anton Popov	a6450be8b6	fix schema inference	2022-03-26 01:33:10 +00:00
Anton Popov	67195bfdd5	support schema inference for type Object in format JSONEachRow	2022-03-25 21:51:53 +00:00
Anton Popov	78100abc5f	add parallel parsing and schema inference for type Object	2022-03-24 17:51:35 +00:00
Antonio Andelic	0c23cd7b94	Add support for case insensitive column matching in arrow	2022-03-22 10:55:10 +00:00
Antonio Andelic	29d2bf7d1a	Merge branch 'master' into case-insensitive-column-matching	2022-03-21 08:17:27 +00:00
Antonio Andelic	f75b054255	Allow case insensitive column matching	2022-03-21 07:47:37 +00:00
Kruglov Pavel	aa3c05e9d4	Merge pull request #35152 from rschu1ze/protobuf-batch-write ProtobufList	2022-03-18 13:24:34 +01:00
Antonio Andelic	607f785e48	Revert "Merge pull request #35145 from bigo-sg/lower-column-name" This reverts commit `ebf72bf61d`, reversing changes made to `f1b812bdc1`.	2022-03-17 12:31:43 +00:00
Robert Schulze	6e1d7a31bc	Fix build + typo	2022-03-17 11:41:20 +01:00
Anton Popov	2ced42ed41	add experimental settings for Object type	2022-03-16 16:51:23 +00:00
Anton Popov	0ba78c3c3a	Merge remote-tracking branch 'upstream/master' into HEAD	2022-03-16 15:28:09 +00:00
Robert Schulze	0d2ece6d91	Merge branch 'ClickHouse:master' into protobuf-batch-write	2022-03-16 09:43:33 +01:00
avogar	e2d1e643f2	Fix possible segfault in JSONEachRow schema inference	2022-03-15 11:44:15 +00:00
Robert Schulze	23122cb327	Fix review comments ParquetBlockOutputFormat.cpp: - undo unrelated formatting ProtobufSerializer.cpp: - undef debug tracing - simplify logic in writeRow() ProtobufSchemas.cpp: - restore original search in cache by message type	2022-03-15 11:27:17 +01:00
Maksim Kita	538f8cbaad	Fix clang-tidy warnings in Disks, Formats, Functions folders	2022-03-14 18:17:35 +00:00
Anton Popov	36ec379aeb	Merge remote-tracking branch 'upstream/master' into HEAD	2022-03-14 16:28:35 +00:00
Antonio Andelic	ebf72bf61d	Merge pull request #35145 from bigo-sg/lower-column-name add setting to lower column case when reading parquet/orc file	2022-03-14 11:25:03 +01:00
Robert Schulze	514d4d2187	Implement ProtobufList - fixes ClickHouse#16436 Introduce IO format "ProtobufList" with protobuf schema // schemafile.proto message Envelope { message MessageType { uint32 colA = 1; string colB = 2; } repeated MessageType mt = 1; } where "Envelope" is a hard-coded/expected top-level message and "MessageType" is a message with user-provided name containing the table fields to export/import, e.g. SELECT * FROM db1.tab1 FORMAT ProtobufList SETTINGS format_schema = 'schemafile:MessageType' As a result, the new format wraps a list of messages (one per row) into a single, containing message. Compare that to the schema of the existing IO formats "Protobuf" and "ProtobufSingle": message MessageType { uint32 colA = 1; string colB = 2; } The new format does not save space compared to the existing formats, but it is conceptually a bit more beautiful and also more convenient. Implementation details: - Created new files ProtobufList(Input|Output)Format which use the existing ProtobufSerializer mechanism. The goal was to reuse as much code as possible and avoid copypasta. - I was torn between inheriting from I(Input|Output)Format vs. IRow(Input|Output)Format for ProtobufList(Input|Output)Format. The former is chunk-based which can be better for performance. Since the ProtobufSerializer mechanism is row-based but data is generally passed around in chunks, I decided for the latter to leverage the existing chunk <--> row mapping code in IRow(Input|Output)Format. - A new ProtobufSerializer called ProtobufSerializerEnvelope was introduced (--> ProtobufSerializer.cpp). It represents the top-level message which encloses the list of inner nested messages, i.e. the rows. - With the new format, parsing the schema file and matching the fields in the schema file to table columns works like for the old formats. The only difference is that parsing starts one level below the "Envelope" (--> ProtobufSchema.cpp). This is more natural than forcing customers to have table columns start with "Envelope". - Creation of the ProtobufSerializer tree also works like before. What is different is that we finally add a ProtobufSerializerEnvelope as new root of the tree. Its only purpose is to write/read the top-level message for the first/last row to write/read. Caveats: - The low-level serialization code in ProtobufWriter uses an internal buffer which is flushed to the output file only in endMessage(). In the existing "Protobuf" format, this happens once per row, in the new format this happens only at the end of the serialization since row-level messages now call start/endNestedMessage(). As a future TODO, the buffer should be flushed also in start/endNestedMessage() to reduce memory consumption.	2022-03-14 08:04:58 +01:00
zhanghuajie	53a8987b3b	fix build fail with gcc --fix warnings without disabling some parameters	2022-03-11 21:59:19 +08:00
shuchaome	46cb4483a6	Optimise by lowering schema on the beginning. Add a functional test.	2022-03-11 14:34:46 +08:00
shuchaome	56795b831d	add setting to lower column case when reading parquet/orc file	2022-03-09 16:07:02 +08:00
Anton Popov	0bc57da238	Merge remote-tracking branch 'upstream/master' into HEAD	2022-03-07 14:46:08 +00:00
Azat Khuzhin	c426eef07d	Fix generating USE_* for system.build_options Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2022-03-04 15:31:32 +03:00
Anton Popov	df3b07fe7c	Merge remote-tracking branch 'upstream/master' into HEAD	2022-03-03 22:25:28 +00:00
Maksim Kita	f1b1baf56e	Merge pull request #34982 from Cai-Yao/master date_time_input_format = 'best_effort_us'	2022-03-03 09:22:57 +01:00
Maksim Kita	b1a956c5f1	clang-tidy check performance-move-const-arg fix	2022-03-02 18:15:27 +00:00
cwkyaoyao	72194bbaf3	Add date_time_input_format = best_effort_us	2022-03-02 16:00:06 +08:00
avogar	a7c6d11532	Fix schema inference for unquoted dates in CSV	2022-03-01 11:03:26 +00:00
Anton Popov	18940b8637	Merge remote-tracking branch 'upstream/master' into HEAD	2022-02-09 23:38:38 +03:00
Kruglov Pavel	15d85682e8	Fix style	2022-02-07 18:29:22 +03:00
avogar	a4c7ecde87	Make better	2022-02-07 17:51:26 +03:00
avogar	77b42bb9ff	Support UUID in MsgPack format	2022-02-07 17:11:44 +03:00
Anton Popov	836a348a9c	Merge remote-tracking branch 'upstream/master' into HEAD	2022-02-01 15:23:07 +03:00
Maksim Kita	5ef83deaa6	Update sort to pdqsort	2022-01-30 19:49:48 +00:00
Anton Popov	78b9f15abb	Merge remote-tracking branch 'upstream/master' into HEAD	2022-01-30 03:24:37 +03:00
Kruglov Pavel	7873b4475f	Merge branch 'master' into autodetect-format	2022-01-25 10:56:52 +03:00
avogar	a6740d2f9a	Detect format and schema for stdin in clickhouse-local	2022-01-25 10:25:37 +03:00

455 Commits