Kruglov Pavel
3046cd6d29
Merge branch 'master' into schema-inference-cache
2022-07-20 13:30:42 +02:00
avogar
784ee11594
Add settings to skip fields with unsupported types in Protobuf/CapnProto schema inference
2022-07-20 11:16:25 +00:00
Kruglov Pavel
a1b63b4a02
Fix style
2022-07-20 12:07:22 +02:00
Kruglov Pavel
7722b647b7
Merge pull request #39396 from Avogar/try-fix-write-buffer-terminate
...
Fix WriteBuffer finalize in destructor when canceling a query
2022-07-20 12:06:20 +02:00
avogar
5c16d6b553
Fix WriteBuffer finalize in destructor when canceling a query
2022-07-19 19:21:30 +00:00
avogar
4f020654be
Get rid of unneeded ifdefs
2022-07-19 12:12:40 +00:00
avogar
6eb234a1cc
Avoid abort() in capnproto on exception destruction
2022-07-18 19:53:24 +00:00
Robert Schulze
32637cb1b9
Fix build
2022-07-18 07:58:59 +00:00
Robert Schulze
13482af4ee
First try at reducing the use of StringRef
...
- to be replaced by std::string_view
- suggested in #39262
2022-07-17 17:26:02 +00:00
Robert Schulze
deda29b46b
Pass const StringRef by value, not by reference
...
See #39224
2022-07-15 11:34:56 +00:00
Kruglov Pavel
b38241b08a
Merge branch 'master' into schema-inference-cache
2022-07-14 12:29:54 +02:00
avogar
7cde9d3b40
Add new features in schema inference
2022-07-13 15:57:55 +00:00
vdimir
63aebd17b2
Remove TabSeparatedSorted
2022-07-12 20:22:35 +02:00
vdimir
46df417c2e
Fix empty line sorting in TabSeparatedSorted
2022-07-12 20:22:35 +02:00
vdimir
f51b25b262
clickhouse test ignore order via special format
2022-07-12 20:22:35 +02:00
Kruglov Pavel
4080f055b6
Merge pull request #38477 from Avogar/sql-insert-format
...
Add SQLInsert output format
2022-07-04 15:06:33 +02:00
avogar
5b0fd31c64
Put column names in quotes
2022-06-30 16:14:30 +00:00
Antonio Andelic
de264117fd
Merge pull request #38118 from bigo-sg/storagehive_struct_type
...
Add struct type support in `StorageHive`
2022-06-30 09:11:13 +02:00
mergify[bot]
9482c99ab8
Merge branch 'master' into sql-insert-format
2022-06-29 11:03:07 +00:00
Robert Schulze
f692ead6ad
Don't use std::unique_lock unless we have to
...
Replace where possible by std::lock_guard which is more light-weight.
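A minimal sketch of the distinction, assuming a hypothetical `Counter` class (not ClickHouse code): std::lock_guard is enough for plain scoped locking, while std::unique_lock is still needed where the lock has to be released and re-acquired, e.g. by std::condition_variable::wait.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical example class; not taken from the ClickHouse code base.
class Counter
{
public:
    void increment()
    {
        // Plain scoped locking: std::lock_guard is sufficient.
        std::lock_guard<std::mutex> lock(mutex);
        ++value;
        cv.notify_all();
    }

    int waitUntilAtLeast(int threshold)
    {
        // condition_variable::wait unlocks and relocks the mutex,
        // so std::unique_lock is required here.
        std::unique_lock<std::mutex> lock(mutex);
        cv.wait(lock, [&] { return value >= threshold; });
        return value;
    }

private:
    std::mutex mutex;
    std::condition_variable cv;
    int value = 0;
};
```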
2022-06-28 19:19:06 +00:00
avogar
9bb68bc6de
Add SQLInsert output format
2022-06-27 18:31:57 +00:00
avogar
5155262a16
Add some additional information to cache keys
2022-06-27 12:43:24 +00:00
lgbo-ustc
cd8e5c7c49
update headers
2022-06-23 17:43:54 +08:00
lgbo-ustc
96e6f9a2d0
fixed code style
2022-06-23 16:10:01 +08:00
lgbo-ustc
c1770c22b9
Merge remote-tracking branch 'ck/master' into storagehive_struct_type
2022-06-23 15:54:20 +08:00
Kseniia Sumarokova
e48ce50863
Update ArrowBufferedStreams.cpp
2022-06-20 19:12:51 +02:00
kssenii
5dd1bb2fd8
improvements for getFileSize
2022-06-20 15:22:56 +02:00
lgbo-ustc
8c629085e4
simplified code
2022-06-17 09:36:59 +08:00
lgbo-ustc
35d534c213
nested struct in struct
2022-06-16 16:45:05 +08:00
Alexey Milovidov
5e9e5a4eaf
Merge pull request #37525 from Avogar/avro-structs
...
Support Maps and Records, allow to insert null as default in Avro format
2022-06-15 00:04:29 +03:00
Kseniia Sumarokova
0ae2168fb6
Merge pull request #36328 from bigo-sg/async_hdfs_read_buffer
...
Apply read_method 'threadpool' for StorageHive
2022-06-10 15:04:21 +02:00
taiyang-li
9fd9ff66bd
remove some test code
2022-06-09 09:55:50 +08:00
taiyang-li
c65c56fd48
fix typo
2022-06-07 09:58:29 +08:00
mergify[bot]
ddf7210ecc
Merge branch 'master' into remove-useless-code-2
2022-06-03 13:58:45 +00:00
taiyang-li
f202c35311
Merge branch 'master' into async_hdfs_read_buffer
2022-06-03 17:52:09 +08:00
Paul Loyd
32d267ec6c
Stop removing UTF-8 BOM in RowBinary* formats
...
Fixes #37420
2022-06-01 13:12:55 +08:00
Maksim Kita
bacee7f19c
Merge pull request #37195 from kitaisreal/merging-sorted-algorithm-single-column-specialization
...
MergingSortedAlgorithm single column specialization
2022-05-31 16:46:18 +02:00
taiyang-li
047387bf1c
fix 2 bugs: 1. select count(1) from hive_table; 2. select _file, _path from hive_table
2022-05-31 17:39:02 +08:00
avogar
4c9812d4c1
Allow to skip some of the first rows in CSV/TSV formats
2022-05-25 15:00:11 +00:00
avogar
038a422aeb
Add setting to insert null as default
2022-05-25 12:56:59 +00:00
avogar
7817d6aea3
Support Maps and Records in Avro format
2022-05-25 11:20:28 +00:00
Maksim Kita
83554d1f2d
Fixed style
2022-05-25 13:05:39 +02:00
Maksim Kita
9a9df26eec
Fixed tests
2022-05-25 11:44:37 +02:00
Kruglov Pavel
6c9a524f6b
Merge pull request #37192 from Avogar/formats-with-names
...
Improve performance and memory usage for select of subset of columns for some formats
2022-05-24 13:28:14 +02:00
avogar
3651ef93fe
Fix performance test
2022-05-23 17:42:13 +00:00
avogar
034c7122be
Mark JSONColumns supports subset of columns
2022-05-23 15:26:01 +00:00
avogar
ce4adb447f
Fix named tuples output in ORC/Arrow/Parquet formats
2022-05-23 14:21:08 +00:00
Kruglov Pavel
f539fb835d
Merge branch 'master' into formats-with-names
2022-05-23 12:14:20 +02:00
Kruglov Pavel
ce48e8e102
Merge pull request #36975 from Avogar/json-columns-formats
...
Add columnar JSON formats
2022-05-23 12:11:28 +02:00
Kruglov Pavel
9bc74439c1
Merge pull request #37327 from Avogar/arrow-strings
...
Allow to use String type instead of Binary in Arrow/Parquet/ORC formats
2022-05-23 12:05:33 +02:00
mergify[bot]
747aa5575c
Merge branch 'master' into remove-useless-code-2
2022-05-22 17:41:57 +00:00
Kruglov Pavel
704c78063f
Fix special build
2022-05-20 19:54:02 +02:00
Anton Popov
cb0e6c2718
mark all operators bool() as explicit
2022-05-20 15:29:54 +00:00
avogar
566d1b15fd
Merge branch 'master' of github.com:ClickHouse/ClickHouse into formats-with-names
2022-05-20 13:54:52 +00:00
avogar
d2304f5d15
Make better
2022-05-20 12:07:29 +00:00
avogar
a6a430c5ee
Merge branch 'master' of github.com:ClickHouse/ClickHouse into json-columns-formats
2022-05-20 11:08:30 +00:00
mergify[bot]
1ac4199e78
Merge branch 'master' into arrow-strings
2022-05-20 10:43:33 +00:00
avogar
cd6a29897e
Apply input_format_max_rows_to_read_for_schema_inference for all files in globs in total
2022-05-18 17:56:36 +00:00
Kruglov Pavel
d81616ff65
Remove unnecessary include
2022-05-18 17:44:39 +02:00
avogar
a0369fb9a6
Allow to use String type instead of Binary in Arrow/Parquet/ORC formats
2022-05-18 14:51:21 +00:00
avogar
12010a81b7
Make better
2022-05-18 09:25:26 +00:00
Robert Schulze
0c55ac76d2
A few clangtidy updates
...
Enable:
- bugprone-lambda-function-name: "Checks for attempts to get the name of
a function from within a lambda expression. The name of a lambda is
always something like operator(), which is almost never what was
intended."
- bugprone-unhandled-self-assignment: "Finds user-defined copy
assignment operators which do not protect the code against
self-assignment either by checking self-assignment explicitly or using
the copy-and-swap or the copy-and-move method."
- hicpp-invalid-access-moved: "Warns if an object is used after it has
been moved."
- hicpp-use-noexcept: "This check replaces deprecated dynamic exception
specifications with the appropriate noexcept specification (introduced
in C++11)"
- hicpp-use-override: "Adds override (introduced in C++11) to overridden
virtual functions and removes virtual from those functions as it is
not required."
- performance-type-promotion-in-math-fn: "Finds calls to C math library
functions (from math.h or, in C++, cmath) with implicit float to
double promotions."
Split up:
- cppcoreguidelines-*. Some of them may be useful (haven't checked in
detail), therefore allow to toggle them individually.
Disable:
- linuxkernel-*. Obvious.
2022-05-17 20:56:57 +02:00
Kruglov Pavel
8572879c37
Remove redundant code
2022-05-16 17:58:20 +02:00
Robert Schulze
e3cfec5b09
Merge remote-tracking branch 'origin/master' into clangtidies
2022-05-16 10:12:50 +02:00
avogar
68bb07d166
Better naming
2022-05-13 18:39:19 +00:00
avogar
cef13c2c02
Allow to skip unknown columns in Native format
2022-05-13 14:27:15 +00:00
avogar
b17fec659a
Improve performance and memory usage for select of subset of columns for some formats
2022-05-13 13:51:28 +00:00
mergify[bot]
4a661b6e78
Merge branch 'master' into json-columns-formats
2022-05-13 11:32:03 +00:00
avogar
02679c7222
Fix tests
2022-05-10 16:27:59 +00:00
avogar
ea0362b3a3
Fix tests
2022-05-10 16:20:38 +00:00
avogar
9abdacdd2e
Remove logging
2022-05-09 13:30:41 +00:00
avogar
054318b555
Fix invalid output LowCardinality -> ArrowDictionary
2022-05-09 13:29:42 +00:00
avogar
1e8d7ae749
Fix
2022-05-09 11:29:40 +00:00
avogar
04fdd75c56
Make JSONColumns formats mono block by default
2022-05-09 11:13:44 +00:00
Robert Schulze
1b81bb49b4
Enable clang-tidy modernize-deprecated-headers & hicpp-deprecated-headers
...
Official docs:
Some headers from C library were deprecated in C++ and are no longer
welcome in C++ codebases. Some have no effect in C++. For more details
refer to the C++ 14 Standard [depr.c.headers] section. This check
replaces C standard library headers with their C++ alternatives and
removes redundant ones.
2022-05-09 08:23:33 +02:00
Robert Schulze
7d3913f350
Enable clang-tidy bugprone-assert-side-effect
...
Official docs:
Finds assert() with side effect. The condition of assert() is
evaluated only in debug builds so a condition with side effect can
cause different behavior in debug / release builds.
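A short sketch of the kind of code this check is about, using hypothetical helper functions (not code from this repository). In the first function the increment disappears when NDEBUG is defined; the second hoists the side effect out of the assertion.

```cpp
#include <cassert>

// Hypothetical helper: the increment lives inside assert(), so it is
// compiled out when NDEBUG is defined and the counter never advances.
inline int nextIdUnsafe(int & counter)
{
    assert(++counter > 0);
    return counter;
}

// Side effect moved out of the assertion: same behavior in all builds.
inline int nextIdSafe(int & counter)
{
    ++counter;
    assert(counter > 0);
    return counter;
}
```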
2022-05-08 19:15:55 +02:00
avogar
3a13c3e372
Fix comments
2022-05-06 16:50:34 +00:00
avogar
62a7ba3f26
Add columnar JSON formats
2022-05-06 16:48:48 +00:00
Anton Popov
515f68eead
Merge remote-tracking branch 'upstream/master' into dynamic-columns-14
2022-05-06 16:10:51 +00:00
Anton Popov
566c08086a
support Object type inside other types
2022-05-06 14:44:00 +00:00
Anton Popov
13e8db6299
Merge pull request #36762 from CurtizJ/dynamic-columns-12
...
Fix insertion to columns of type `Object` from multiple files
2022-05-06 14:14:32 +02:00
Kruglov Pavel
77e55c344c
Merge pull request #36667 from Avogar/mysqldump-format
...
Add MySQLDump input format
2022-05-04 19:49:48 +02:00
Kruglov Pavel
ffec3655fe
Fix special build
2022-05-04 17:14:15 +02:00
mergify[bot]
64084b5e32
Merge branch 'master' into shared_ptr_helper3
2022-05-03 20:46:16 +00:00
Dmitry Novik
5ba7a55c18
Merge pull request #36650 from bigo-sg/hive_text_parallel_parsing
...
Parallel parsing of hive text format
2022-05-03 15:56:28 +02:00
Kruglov Pavel
d613f7eab0
Merge branch 'master' into mysqldump-format
2022-05-02 13:31:57 +02:00
Antonio Andelic
a1a22b0007
Merge pull request #35149 from ContentSquare/nullables_with_proto3
...
Nullables with proto3 using Google wrappers
2022-05-02 09:49:37 +02:00
Robert Schulze
330212e0f4
Remove inherited create() method + disallow copying
...
The original motivation for this commit was that shared_ptr_helper used
std::shared_ptr<>() which does two heap allocations instead of
make_shared<>() which does a single allocation. Turned out that
1. the affected code (--> Storages/) is not on a hot path (rendering the
performance argument moot ...)
2. yet copying Storage objects is potentially dangerous and was
previously allowed.
Hence, this change
- removes shared_ptr_helper and as a result all inherited create() methods,
- instead, Storage objects are now created using make_shared<>() by the
caller (for that to work, many constructors had to be made public), and
- all Storage classes were marked as noncopyable using boost::noncopyable.
In sum, we are (likely) not making things faster but the code becomes
cleaner and harder to misuse.
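A minimal sketch of the resulting pattern, assuming a hypothetical `StorageExample` class. The commit uses boost::noncopyable; deleted copy operations are swapped in here only to keep the example self-contained.

```cpp
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a Storage class. Copying is disabled with
// deleted copy operations instead of boost::noncopyable.
class StorageExample
{
public:
    // Public constructor, so callers can use std::make_shared directly.
    explicit StorageExample(std::string name_) : name(std::move(name_)) {}

    StorageExample(const StorageExample &) = delete;
    StorageExample & operator=(const StorageExample &) = delete;

    const std::string & getName() const { return name; }

private:
    std::string name;
};

// The caller creates the object; no inherited create() method is involved.
// std::make_shared allocates the object and its control block together.
inline std::shared_ptr<StorageExample> makeStorageExample(const std::string & name)
{
    return std::make_shared<StorageExample>(name);
}

static_assert(!std::is_copy_constructible<StorageExample>::value,
              "Storage objects must not be copyable");
```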
2022-05-02 08:46:52 +02:00
Robert Schulze
89aa9ae00f
Fixed clang-tidy check "bugprone-branch-clone"
...
The check is currently *not* part of .clang-tidy. It complains about:
(1) "switch has multiple consecutive identical branches"
(2) "repeated branch in conditional chain"
About (1): Lots of findings in switches were about redundant
"[[fallthrough]]" in places where the compiler would not warn anyways. I
have cleaned these up.
About (2): In if-else_if-else chains, fixing the warning would usually
mean concatenating multiple if-conditions. As this would reduce
readability in most cases, I did not fix these places.
Because of (2), I also refrained from adding "bugprone-branch-clone" to
.clang-tidy.
2022-04-30 19:40:28 +02:00
mergify[bot]
cc08ccb420
Merge branch 'master' into remove-useless-code-2
2022-04-30 12:48:15 +00:00
Jakub Kuklis
a1f2dd6d34
Adding two settings in place of one, improvements to the test clarity
2022-04-29 10:01:51 +02:00
Jakub Kuklis
507ba1042c
Adding a setting to enable Google wrappers special treatment
2022-04-29 10:01:51 +02:00
Jakub Kuklis
6d5c1e2fc0
Adding a setting to enable special treatment of google wrappers
2022-04-29 10:01:50 +02:00
Amos Bird
4a5e4274f0
base should not depend on Common
2022-04-29 10:26:35 +08:00
Anton Popov
1fc51e09ff
fix insertion to column of type Object from multiple files via table function
2022-04-28 18:51:13 +00:00
avogar
d295de1689
Fix comments and test
2022-04-28 14:59:35 +00:00
Kruglov Pavel
4d08587559
Merge branch 'master' into mysqldump-format
2022-04-28 15:58:18 +02:00
Kseniia Sumarokova
4c371f710e
Merge pull request #36676 from kssenii/refactor-with-size-buffer
...
Better version of SeekableReadBufferWithSize
2022-04-28 13:44:25 +02:00
taiyang-li
99aa5fdc81
remove useless code
2022-04-27 11:15:04 +08:00
vdimir
81b86799e7
Fixup PrometheusTextOutputFormat
2022-04-26 14:57:37 +00:00
vdimir
d5d98ed951
PrometheusTextOutputFormat: support labels, histograms and summaries
2022-04-26 14:57:36 +00:00
vdimir
be0aa06958
Add output format Prometheus
2022-04-26 14:57:35 +00:00
kssenii
9d364cdce2
Refactor
2022-04-26 15:33:53 +02:00
Kruglov Pavel
a462d94157
Fix error codes
2022-04-26 13:25:07 +02:00
Kruglov Pavel
e3b222b519
Fix typo
2022-04-26 13:24:10 +02:00
avogar
33d845dade
Add MySQLDump input format
2022-04-26 10:42:56 +00:00
taiyang-li
b7cc344d62
remove useless codes
2022-04-26 14:42:43 +08:00
taiyang-li
99dee35b6e
parallel parsing of hive text format
2022-04-26 14:33:10 +08:00
Kruglov Pavel
34c342fdd3
Merge pull request #36205 from Avogar/improve-globs
...
Some refactoring around schema inference with globs
2022-04-25 13:14:46 +02:00
avogar
80eacc8533
Merge branch 'master' of github.com:ClickHouse/ClickHouse into improve-json-schema-inference
2022-04-22 17:18:44 +00:00
Kseniia Sumarokova
33bb48106f
Merge pull request #36314 from CurtizJ/print-bad-filenames
...
Show names of erroneous files in case of parsing errors while executing table functions
2022-04-22 13:24:55 +02:00
mergify[bot]
e38a3c3595
Merge branch 'master' into alias
2022-04-21 15:02:30 +00:00
Maksim Kita
57444fc7d3
Merge pull request #36444 from rschu1ze/clang-tidy-fixes
...
Clang tidy fixes
2022-04-21 16:11:27 +02:00
mergify[bot]
1ba1cad5cf
Merge branch 'master' into improve-globs
2022-04-21 11:52:13 +00:00
Kruglov Pavel
a6186f7ba4
Merge pull request #36333 from ClickHouse/bool-sync-after-error
...
Fix tech debt for Bool and Map data types
2022-04-21 13:32:14 +02:00
Kruglov Pavel
813e228fcc
Merge branch 'master' into improve-globs
2022-04-20 16:31:47 +02:00
Anton Popov
d4df38a0e6
fix tests
2022-04-20 14:13:04 +00:00
Alexander Tokmakov
1d30a97fd2
Merge branch 'master' into remove-useless-code-2
2022-04-20 11:45:56 +02:00
Robert Schulze
b24ca8de52
Fix various clang-tidy warnings
...
When I tried to add cool new clang-tidy 14 warnings, I noticed that the
current clang-tidy settings already produce a ton of warnings. This
commit addresses many of these. Almost all of them were non-critical,
e.g. C vs. C++ style casts.
2022-04-20 10:29:05 +02:00
Anton Popov
bee4ca9b62
add more tests for error diagnostics in files
2022-04-19 15:56:34 +00:00
Anton Popov
3e361c9759
Merge remote-tracking branch 'upstream/master' into HEAD
2022-04-19 14:18:04 +00:00
Alexey Milovidov
f6ab2bd523
Merge pull request #36312 from ClickHouse/remove-arcadia
...
Remove remaining parts of Arcadia
2022-04-18 07:02:54 +03:00
Alexey Milovidov
242919eddd
Remove abbreviation
2022-04-18 01:02:49 +02:00
mergify[bot]
4fed033dca
Merge branch 'master' into alias
2022-04-17 14:37:04 +00:00
fenglv
2392d4e2b5
fix
2022-04-16 16:08:28 +00:00
Alexey Milovidov
7206838c75
Fix tech debt for Bool and Map data types
2022-04-16 16:09:04 +02:00
fenglv
58111115c5
fix style
2022-04-16 06:21:09 +00:00
fenglv
74ef1b0198
Add aliases JSONLines and NDJSON for JSONEachRow
2022-04-16 06:01:07 +00:00
Anton Popov
2de6668b3f
show names of erroneous files
2022-04-16 00:10:47 +00:00
Alexey Milovidov
cbeeb7ec4f
Remove Arcadia
2022-04-16 00:20:47 +02:00
avogar
42726639f3
Check ORC/Parquet/Arrow format magic bytes before loading file in memory
2022-04-13 19:27:38 +00:00
avogar
f5f1db86d9
Remove commented code
2022-04-13 19:15:52 +00:00
avogar
8b60aeb7bc
Improve schema inference for json objects
2022-04-13 19:13:40 +00:00
avogar
1c065f8c7a
Some refactoring around schema inference with globs
2022-04-13 17:02:48 +00:00
Alexey Milovidov
a54c01cf72
Remove useless code in ReplicatedMergeTreeRestartingThread
2022-04-11 00:44:30 +02:00
avogar
1c783ed88a
Resolve conflicts
2022-04-07 12:17:48 +00:00
avogar
d2017a63b1
Merge branch 'master' of github.com:ClickHouse/ClickHouse into improve-schema-inference
2022-04-07 11:36:40 +00:00
Kruglov Pavel
f3f8f27db5
Merge pull request #35735 from Avogar/allow-read-bools-as-numbers
...
Allow to infer and parse bools as numbers in JSON input formats
2022-04-07 13:20:49 +02:00
taiyang-li
2ef316801c
Merge branch 'master' into use_minmax_index
2022-04-07 10:53:25 +08:00
Kruglov Pavel
ec2213493f
Merge branch 'master' into allow-read-bools-as-numbers
2022-04-06 14:53:02 +02:00
Kruglov Pavel
9141066de3
Merge branch 'master' into improve-schema-inference
2022-04-06 13:51:07 +02:00
taiyang-li
acb9f1632e
support skip splits in orc and parquet
2022-04-06 16:40:22 +08:00
mergify[bot]
1e43e26fa1
Merge branch 'master' into fix-order
2022-04-02 12:00:29 +00:00
avogar
ab2a963287
Merge branch 'master' of github.com:ClickHouse/ClickHouse into allow-read-bools-as-numbers
2022-03-31 14:09:43 +00:00
Kruglov Pavel
252d66e80d
Update src/Processors/Formats/ISchemaReader.cpp
...
Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com>
2022-03-31 16:08:37 +02:00
mergify[bot]
24ade25d61
Merge branch 'master' into improve-schema-inference
2022-03-31 13:42:47 +00:00
avogar
836e7dae67
Fix bug in indexes of not presented columns in -WithNames formats
2022-03-31 12:24:40 +00:00
avogar
d272356324
Minor code improvement
2022-03-31 10:55:09 +00:00
avogar
74275da7ee
Make better
2022-03-31 10:52:34 +00:00
avogar
000f3043e7
Make better
2022-03-29 17:40:07 +00:00
avogar
3fc36627b3
Allow to infer and parse bools as numbers in JSON input formats
2022-03-29 17:37:31 +00:00
avogar
ce97ccbfb9
Improve schema inference for JSONEachRow and TSKV formats
2022-03-29 14:47:51 +00:00
Antonio Andelic
9990abb76a
Use compile-time check for Exception messages, fix wrong messages
2022-03-29 13:16:11 +00:00
avogar
97f5033ea9
Fix tests
2022-03-29 13:07:37 +00:00
mergify[bot]
343588de2c
Merge branch 'master' into improve-schema-inference
2022-03-29 13:06:00 +00:00
Anton Popov
9610139477
Merge pull request #35629 from CurtizJ/dynamic-columns-5
...
Support schema inference for type `Object` in format `JSONEachRow`
2022-03-29 14:17:09 +02:00
Anton Popov
d677635cd8
Merge pull request #35592 from CurtizJ/dynamic-columns-4
...
Add parallel parsing and schema inference for format `JSONAsObject`
2022-03-28 19:29:55 +02:00
Anton Popov
67195bfdd5
support schema inference for type Object in format JSONEachRow
2022-03-25 21:51:53 +00:00
avogar
6fb3c3be04
Fix comments and build
2022-03-25 12:02:21 +00:00
Kruglov Pavel
d45143ffe0
Merge branch 'master' into improve-schema-inference
2022-03-25 12:05:40 +01:00
Vladimir C
ae92963b15
Fix build error in Formats/ISchemaReader.cpp
2022-03-25 11:30:25 +01:00
Kruglov Pavel
287e1a6efc
Update src/Processors/Formats/ISchemaReader.cpp
...
Co-authored-by: Vladimir C <vdimir@clickhouse.com>
2022-03-24 19:16:52 +01:00
Kruglov Pavel
6a9df9d471
Update src/Processors/Formats/ISchemaReader.cpp
...
Co-authored-by: Vladimir C <vdimir@clickhouse.com>
2022-03-24 19:16:47 +01:00
Kruglov Pavel
3b801a4093
Update src/Processors/Formats/ISchemaReader.cpp
...
Co-authored-by: Vladimir C <vdimir@clickhouse.com>
2022-03-24 19:16:41 +01:00
Anton Popov
78100abc5f
add parallel parsing and schema inference for type Object
2022-03-24 17:51:35 +00:00
avogar
557edbd172
Add some improvements and fixes in schema inference
2022-03-24 12:54:12 +00:00
mergify[bot]
bf90edc362
Merge branch 'master' into case-insensitive-column-matching
2022-03-24 08:00:42 +00:00
Kruglov Pavel
826b933b08
Merge pull request #35332 from Avogar/fix-tskv-schema-inference
...
Fix schema inference for TSKV format while using small max_read_buffer_size
2022-03-23 18:37:07 +01:00
Antonio Andelic
052057f2ef
Address PR comments
2022-03-23 15:42:46 +00:00
Antonio Andelic
6b6190554b
Fix conversion of arrow to CH column with hint header
2022-03-22 11:15:48 +00:00
Antonio Andelic
0c23cd7b94
Add support for case insensitive column matching in arrow
2022-03-22 10:55:10 +00:00
Antonio Andelic
ca7844e338
Fix tests
2022-03-22 09:27:20 +00:00
Antonio Andelic
6cebb6bc88
Merge branch 'master' into case-insensitive-column-matching
2022-03-22 07:36:35 +00:00
Antonio Andelic
cb3703b46e
Style fix
2022-03-21 12:54:56 +00:00
Antonio Andelic
0457a3998a
remove old test
2022-03-21 11:58:55 +00:00
Kruglov Pavel
1645b7083f
Update src/Processors/Formats/Impl/TSKVRowInputFormat.cpp
...
Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com>
2022-03-21 12:44:12 +01:00
Kruglov Pavel
0b381ebd26
Update src/Processors/Formats/Impl/TSKVRowInputFormat.cpp
...
Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com>
2022-03-21 12:44:06 +01:00
Kruglov Pavel
f67b8c0bad
Update src/Processors/Formats/Impl/TSKVRowInputFormat.cpp
...
Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com>
2022-03-21 12:44:00 +01:00
Antonio Andelic
0c74fa2c19
Remove unnecessary code
2022-03-21 08:38:15 +00:00
tavplubix
716c6f0ffa
Merge pull request #35406 from Avogar/fix-parquet
...
Fix working with unneeded columns in Arrow/Parquet/ORC formats
2022-03-21 11:36:54 +03:00
Antonio Andelic
29d2bf7d1a
Merge branch 'master' into case-insensitive-column-matching
2022-03-21 08:17:27 +00:00
Antonio Andelic
d73c906e68
Format code
2022-03-21 07:50:17 +00:00
Antonio Andelic
f75b054255
Allow case insensitive column matching
2022-03-21 07:47:37 +00:00
avogar
58f2aca120
Fix tests
2022-03-18 19:04:16 +00:00
avogar
cffa2096de
Fix working with unneeded columns in Arrow/Parquet/ORC formats
2022-03-18 13:07:54 +00:00
Kruglov Pavel
aa3c05e9d4
Merge pull request #35152 from rschu1ze/protobuf-batch-write
...
ProtobufList
2022-03-18 13:24:34 +01:00
Antonio Andelic
607f785e48
Revert "Merge pull request #35145 from bigo-sg/lower-column-name"
...
This reverts commit ebf72bf61d, reversing changes made to f1b812bdc1.
2022-03-17 12:31:43 +00:00
Anton Popov
2ced42ed41
add experimental settings for Object type
2022-03-16 16:51:23 +00:00
Anton Popov
0ba78c3c3a
Merge remote-tracking branch 'upstream/master' into HEAD
2022-03-16 15:28:09 +00:00
avogar
f7c5fe14e4
Fix schema inference for TSKV format while using small max_read_buffer_size
2022-03-16 13:53:50 +00:00
Robert Schulze
0d2ece6d91
Merge branch 'ClickHouse:master' into protobuf-batch-write
2022-03-16 09:43:33 +01:00
Robert Schulze
23122cb327
Fix review comments
...
ParquetBlockOutputFormat.cpp:
- undo unrelated formatting
ProtobufSerializer.cpp:
- undef debug tracing
- simplify logic in writeRow()
ProtobufSchemas.cpp:
- restore original search in cache by message type
2022-03-15 11:27:17 +01:00
Maksim Kita
2665724301
Fix clang-tidy warnings in Parsers, Processors, QueryPipeline folders
2022-03-14 18:17:35 +00:00
Anton Popov
36ec379aeb
Merge remote-tracking branch 'upstream/master' into HEAD
2022-03-14 16:28:35 +00:00
Antonio Andelic
ebf72bf61d
Merge pull request #35145 from bigo-sg/lower-column-name
...
add setting to lower column case when reading parquet/orc file
2022-03-14 11:25:03 +01:00
Robert Schulze
514d4d2187
Implement ProtobufList - fixes ClickHouse#16436
...
Introduce IO format "ProtobufList" with protobuf schema
    // schemafile.proto
    message Envelope {
      message MessageType {
        uint32 colA = 1;
        string colB = 2;
      }
      repeated MessageType mt = 1;
    }
where "Envelope" is a hard-coded/expected top-level message and
"MessageType" is a message with user-provided name containing the table
fields to export/import, e.g.
    SELECT * FROM db1.tab1 FORMAT ProtobufList SETTINGS format_schema =
    'schemafile:MessageType'
As a result, the new format wraps a list of messages (one per row) into
a single, containing message. Compare that to the schema of the existing
IO formats "Protobuf" and "ProtobufSingle":
    message MessageType {
      uint32 colA = 1;
      string colB = 2;
    }
The new format does not save space compared to the existing formats, but
it is conceptually a bit more beautiful and also more convenient.
Implementation details:
- Created new files ProtobufList(Input|Output)Format which use the
existing ProtobufSerializer mechanism. The goal was to reuse as much
code as possible and avoid copypasta.
- I was torn between inheriting from I(Input|Output)Format vs.
IRow(Input|Output)Format for ProtobufList(Input|Output)Format. The
former is chunk-based which can be better for performance. Since the
ProtobufSerializer mechanism is row-based but data is generally passed
around in chunks, I decided for the latter to leverage the existing
chunk <--> row mapping code in IRow(Input|Output)Format.
- A new ProtobufSerializer called ProtobufSerializerEnvelope was
introduced (--> ProtobufSerializer.cpp). It represents the top-level
message which encloses the list of inner nested messages, i.e. the
rows.
- With the new format, parsing the schema file and matching the fields in
the schema file to table columns works like for the old formats. The only
difference is that parsing starts one level below the "Envelope" (-->
ProtobufSchemas.cpp). This is more natural than forcing customers to
have table columns start with "Envelope".
- Creation of the ProtobufSerializer tree also works like before. What
is different is that we finally add a ProtobufSerializerEnvelope as
new root of the tree. Its only purpose is to write/read the top-level
message for the first/last row to write/read.
Caveats:
- The low-level serialization code in ProtobufWriter uses an internal
buffer which is flushed to the output file only in endMessage().
In the existing "Protobuf" format, this happens once per row, in the
new format this happens only at the end of the serialization
since row-level messages now call start/endNestedMessage(). As a
future TODO to, the buffer should be flushed also in
start/endNestedMessage() to reduce memory consumption.
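To make the "list of messages wrapped in a single containing message" concrete, here is a rough sketch of how the body of such an Envelope looks in the standard protobuf wire format. It is not the ProtobufSerializer code: the function names are hypothetical, both fields are always written (even default values), and any additional framing ClickHouse may add around the envelope is not covered.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Append an unsigned integer as a protobuf base-128 varint.
inline void appendVarint(std::string & out, std::uint64_t value)
{
    while (value >= 0x80)
    {
        out.push_back(static_cast<char>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<char>(value));
}

// Encode one row as MessageType { uint32 colA = 1; string colB = 2; }.
// Hypothetical helper; always emits both fields.
inline std::string encodeRow(std::uint32_t col_a, const std::string & col_b)
{
    std::string out;
    out.push_back('\x08'); // field 1, wire type 0 (varint)
    appendVarint(out, col_a);
    out.push_back('\x12'); // field 2, wire type 2 (length-delimited)
    appendVarint(out, col_b.size());
    out += col_b;
    return out;
}

// Encode the body of Envelope { repeated MessageType mt = 1; }:
// one length-delimited entry per row, concatenated into one message.
inline std::string encodeEnvelope(const std::vector<std::pair<std::uint32_t, std::string>> & rows)
{
    std::string out;
    for (const auto & row : rows)
    {
        const std::string payload = encodeRow(row.first, row.second);
        out.push_back('\x0A'); // field 1, wire type 2 (length-delimited)
        appendVarint(out, payload.size());
        out += payload;
    }
    return out;
}
```

Because every row is a nested entry of the envelope, the whole output forms one message, which matches the caveat above about the writer's buffer being flushed only at the end.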
2022-03-14 08:04:58 +01:00
Maksim Kita
ce0c8e5597
Update JSONRowOutputFormat.cpp
2022-03-14 00:58:36 +01:00
Robert Schulze
f0ba39b071
Clean up some header includes and make formatting more consistent
2022-03-13 20:24:12 +01:00
zhanghuajie
53a8987b3b
fix build fail with gcc --fix warnings without disabling some parameters
2022-03-11 21:59:19 +08:00
shuchaome
7a3623d216
fix bug
2022-03-11 17:26:13 +08:00