
1033 Commits

Author SHA1 Message Date
zhenjial
0f788d98f5 new implementation 2022-09-06 20:39:54 +08:00
zhenjial
18db90dcfc Record errors while reading text formats (CSV, TSV). 2022-09-06 17:19:15 +08:00
avogar
afc34dca41 Add new JSON formats, add improvements and refactoring 2022-09-01 19:00:24 +00:00
Kruglov Pavel
f53aa86a20
Merge pull request #40485 from arthurpassos/fix-parquet-chunked-array-deserialization
Add support for extended (chunked) arrays for Parquet format
2022-09-01 19:40:40 +02:00
Alexey Milovidov
6b2e227c8b Fix integration test 2022-08-27 22:28:38 +02:00
Kruglov Pavel
e6e7f5db93
Merge pull request #40491 from mini4/fix-settings-input_format_tsv_skip_first_lines
Fix bug in settings input_format_tsv_skip_first_lines of format TSV
2022-08-24 15:57:45 +02:00
Kruglov Pavel
0781e8b4f7
Merge pull request #40534 from Avogar/nested-in-avro
Support reading Array(Record) into flatten nested table in Avro
2022-08-24 13:33:12 +02:00
kgurjev
f62c2c3221 Fix bug in settings input_format_tsv_skip_first_lines of format TSV 2022-08-24 10:02:57 +03:00
avogar
29a887578b Fix 2022-08-23 11:42:57 +00:00
avogar
581e569d04 Support reading Array(Record) into flatten nested table in Avro 2022-08-23 11:05:02 +00:00
Arthur Passos
f8e2ab0a20 Use FileReader::GetRecordBatchReader instead of FileReader::ReadRowGroup to parse Parquet 2022-08-22 08:21:32 -03:00
avogar
612ffaffde Make schema inference cache better, respect format settings that can change the schema 2022-08-19 16:39:13 +00:00
Kruglov Pavel
b67cb9e378
Merge pull request #40173 from Avogar/arrow-dict
Improve and fix dictionaries in Arrow format
2022-08-18 20:54:55 +02:00
Kruglov Pavel
09a2ff8843
Merge pull request #40293 from joshuataylor/feature/arrow-large-binary-string
Add support for LARGE_BINARY/LARGE_STRING with Arrow
2022-08-18 14:01:58 +02:00
avogar
a6318cecd5 Fix hive test 2022-08-18 11:32:42 +00:00
Nikolai Kochetov
5a85531ef7
Merge pull request #38286 from Avogar/schema-inference-cache
Add schema inference cache for s3/hdfs/file/url
2022-08-18 13:07:50 +02:00
Yakov Olkhovskiy
40fd6e189a call readColumnWithStringData 2022-08-17 09:54:01 -04:00
Kruglov Pavel
19af748737 Fix typo 2022-08-17 14:29:09 +02:00
Kruglov Pavel
00d04456ff Try reduce code duplication 2022-08-17 14:28:15 +02:00
avogar
8dd54c043d Merge branch 'master' of github.com:ClickHouse/ClickHouse into schema-inference-cache 2022-08-17 11:47:40 +00:00
Josh Taylor
628d2bbff5 Add support for LARGE_BINARY/LARGE_STRING with Arrow 2022-08-17 10:25:06 +08:00
avogar
99d8727335 Fix tests 2022-08-16 12:56:51 +00:00
avogar
e1ff996ec3 Allow to specify structure hints in schema inference 2022-08-16 09:46:57 +00:00
Kruglov Pavel
2c5c0d6d47 Fix typo 2022-08-15 19:55:28 +02:00
avogar
ca0d883c0f Fix possible segfault in CapnProto input format 2022-08-15 15:36:18 +00:00
avogar
c160033837 Fix 2022-08-15 11:38:28 +00:00
avogar
78e197063c Better example 2022-08-12 19:08:36 +00:00
avogar
763f84b623 Remove bad comment 2022-08-12 19:05:57 +00:00
avogar
9addded80e Remove logging 2022-08-12 19:01:02 +00:00
avogar
000336622a Remove logging 2022-08-12 18:59:52 +00:00
avogar
398576e9c9 Improve and fix dictionaries in Arrow format 2022-08-12 18:56:21 +00:00
Kseniia Sumarokova
a6cfc7bc3b
Merge pull request #34651 from alexX512/master
New caching strategies
2022-08-12 17:23:37 +02:00
Anton Popov
3fdf428834
Merge pull request #39186 from Avogar/numbers-schema-inference
Add new features in schema inference
2022-08-11 00:53:54 +02:00
Arthur Passos
c4d8ad2222 Add docs 2022-08-09 15:58:46 -03:00
Arthur Passos
e724e7bef6 Update arrow dict to lc comment 2022-08-09 15:52:37 -03:00
Arthur Passos
6eb89fd780 Fix both arrow dict de-serialization and dict of nullable de-serialization 2022-08-09 15:06:22 -03:00
Arthur Passos
be1e32c3f1 Merge branch 'ClickHouse:master' into fix_arrow_column_dictionary_to_ch_lc 2022-08-09 15:04:06 -03:00
Kruglov Pavel
088e8cf9bd Merge branch 'master' into numbers-schema-inference 2022-08-09 14:00:36 +02:00
Kruglov Pavel
99b9e85a8f
Merge pull request #39646 from Avogar/more-formats
Add more Pretty formats
2022-08-09 13:59:47 +02:00
avogar
2f95726b06 Fix comments 2022-08-08 12:41:00 +00:00
alexX512
6bf29cb610 Change class LRUCache to class CachBase. Check running CacheBase with default pcahce policy SLRU 2022-08-07 19:59:30 +00:00
avogar
9b1a267203 Refactor, remove TTL, add size limit, add system table and system query 2022-08-05 16:20:15 +00:00
Arthur Passos
62d48053c0 Use insertDefault instead of insert(0) 2022-08-04 15:53:44 -03:00
Arthur Passos
c307e9a228 Fix ArrowColumn dictionary to CH low cardinality conversion 2022-08-04 15:34:44 -03:00
Kruglov Pavel
6b2186bfeb Merge branch 'master' into numbers-schema-inference 2022-08-02 19:34:53 +02:00
Kruglov Pavel
42136b7630
Merge pull request #39647 from Avogar/fix-arrow-strings
Fix strings in dictionary in Arrow format
2022-08-01 12:46:07 +02:00
Alexey Milovidov
4828be7fc4 Fix double escaping in the metadata of FORMAT JSON 2022-07-30 23:56:41 +02:00
avogar
01a309d4e3 Fix strings in dictionary in Arrow format 2022-07-27 12:02:27 +00:00
avogar
f925046dc4 Add more Pretty formats 2022-07-27 11:37:02 +00:00
Kruglov Pavel
381ea139c2 Merge branch 'master' into schema-inference-cache 2022-07-27 11:35:36 +02:00
Kruglov Pavel
53159db782 Merge branch 'master' into numbers-schema-inference 2022-07-26 12:32:49 +02:00
Kruglov Pavel
83c7da6e88 Merge branch 'master' into fix-protobuf-capnp-empty-message 2022-07-25 13:02:41 +02:00
Alexey Milovidov
388d06fda1
Merge pull request #39535 from ClickHouse/stringref
Less usage of StringRef
2022-07-25 04:06:11 +03:00
Robert Schulze
4333750985
Less usage of StringRef
... replaced by std::string_view, see #39262
2022-07-24 18:33:52 +00:00
Alexander Tokmakov
bed2206ae9
Merge pull request #39460 from ClickHouse/remove_some_dead_and_commented_code
Remove some dead and commented code
2022-07-22 13:24:34 +03:00
avogar
794aa691bc Merge branch 'master' of github.com:ClickHouse/ClickHouse into fix-protobuf-capnp-empty-message 2022-07-21 17:04:37 +00:00
Kruglov Pavel
9252f42b4c Merge branch 'master' into schema-inference-cache 2022-07-21 18:59:14 +02:00
avogar
fd534aa3fa wqMerge branch 'master' of github.com:ClickHouse/ClickHouse into numbers-schema-inference 2022-07-21 15:43:17 +00:00
Alexander Tokmakov
a8da5d96fc remove some dead and commented code 2022-07-21 15:05:48 +02:00
Nikolai Kochetov
e15967e9db
Merge pull request #38475 from ClickHouse/additional-filters
Additional filters for a table (from setting)
2022-07-21 07:52:04 +02:00
Alexey Milovidov
dcda9d3bd1
Merge pull request #39365 from Avogar/fix-capnproto-abort
Avoid possible abort() in CapnProto on exception descruction
2022-07-21 05:20:45 +03:00
Nikolai Kochetov
91043351aa Fixing build. 2022-07-20 20:30:16 +00:00
Kruglov Pavel
46da17ca8c Merge branch 'master' into numbers-schema-inference 2022-07-20 13:32:39 +02:00
Kruglov Pavel
3046cd6d29 Merge branch 'master' into schema-inference-cache 2022-07-20 13:30:42 +02:00
avogar
784ee11594 Add settings to skip fields with unsupported types in Protobuf/CapnProto schema inference 2022-07-20 11:16:25 +00:00
Kruglov Pavel
a1b63b4a02 Fix style 2022-07-20 12:07:22 +02:00
avogar
4f020654be Get rid of unneded ifdefs 2022-07-19 12:12:40 +00:00
avogar
6eb234a1cc Avoid abort() in capnproto on exception descruction 2022-07-18 19:53:24 +00:00
Robert Schulze
32637cb1b9 Fix build 2022-07-18 07:58:59 +00:00
Robert Schulze
13482af4ee
First try at reducing the use of StringRef
- to be replaced by std::string_view
- suggested in #39262
2022-07-17 17:26:02 +00:00
Robert Schulze
deda29b46b
Pass const StringRef by value, not by reference
See #39224
2022-07-15 11:34:56 +00:00
Kruglov Pavel
b38241b08a Merge branch 'master' into schema-inference-cache 2022-07-14 12:29:54 +02:00
avogar
7cde9d3b40 Add new features in schema inference 2022-07-13 15:57:55 +00:00
vdimir
63aebd17b2 Remove TabSeparatedSorted 2022-07-12 20:22:35 +02:00
vdimir
46df417c2e Fix empty line sorting in TabSeparatedSorted 2022-07-12 20:22:35 +02:00
vdimir
f51b25b262 clickhouse test ignore order via special format 2022-07-12 20:22:35 +02:00
Kruglov Pavel
4080f055b6
Merge pull request #38477 from Avogar/sql-insert-format
Add SQLInsert output format
2022-07-04 15:06:33 +02:00
avogar
5b0fd31c64 Put column names in quotes 2022-06-30 16:14:30 +00:00
Antonio Andelic
de264117fd
Merge pull request #38118 from bigo-sg/storagehive_struct_type
Add struct type support in `StorageHive`
2022-06-30 09:11:13 +02:00
mergify[bot]
9482c99ab8 Merge branch 'master' into sql-insert-format 2022-06-29 11:03:07 +00:00
Robert Schulze
f692ead6ad
Don't use std::unique_lock unless we have to
Replace where possible by std::lock_guard which is more light-weight.
2022-06-28 19:19:06 +00:00
avogar
9bb68bc6de Add SQLInsert output format 2022-06-27 18:31:57 +00:00
avogar
5155262a16 Add some additional information to cache keys 2022-06-27 12:43:24 +00:00
lgbo-ustc
cd8e5c7c49 update headers 2022-06-23 17:43:54 +08:00
lgbo-ustc
96e6f9a2d0 fixed code style 2022-06-23 16:10:01 +08:00
lgbo-ustc
c1770c22b9 Merge remote-tracking branch 'ck/master' into storagehive_struct_type 2022-06-23 15:54:20 +08:00
Kseniia Sumarokova
e48ce50863 Update ArrowBufferedStreams.cpp 2022-06-20 19:12:51 +02:00
kssenii
5dd1bb2fd8 improvements for getFileSize 2022-06-20 15:22:56 +02:00
lgbo-ustc
8c629085e4 simplified code 2022-06-17 09:36:59 +08:00
lgbo-ustc
35d534c213 nested struct in struct 2022-06-16 16:45:05 +08:00
Alexey Milovidov
5e9e5a4eaf
Merge pull request #37525 from Avogar/avro-structs
Support Maps and Records, allow to insert null as default in Avro format
2022-06-15 00:04:29 +03:00
Kseniia Sumarokova
0ae2168fb6
Merge pull request #36328 from bigo-sg/async_hdfs_read_buffer
Apply read_method 'threadpool' for StorageHive
2022-06-10 15:04:21 +02:00
taiyang-li
9fd9ff66bd remove some test code 2022-06-09 09:55:50 +08:00
taiyang-li
c65c56fd48 fix typo 2022-06-07 09:58:29 +08:00
mergify[bot]
ddf7210ecc Merge branch 'master' into remove-useless-code-2 2022-06-03 13:58:45 +00:00
taiyang-li
f202c35311 Merge branch 'master' into async_hdfs_read_buffer 2022-06-03 17:52:09 +08:00
Paul Loyd
32d267ec6c
Stop removing UTF-8 BOM in RowBinary* formats
Fixes #37420
2022-06-01 13:12:55 +08:00
Maksim Kita
bacee7f19c
Merge pull request #37195 from kitaisreal/merging-sorted-algorithm-single-column-specialization
MergingSortedAlgorithm single column specialization
2022-05-31 16:46:18 +02:00
taiyang-li
047387bf1c fix 2 bugs: 1. select count(1) from hive_table; 2. select _file, _path from hive_table 2022-05-31 17:39:02 +08:00
avogar
4c9812d4c1 Allow to skip some of the first rows in CSV/TSV formats 2022-05-25 15:00:11 +00:00