Commit Graph

272 Commits

Author SHA1 Message Date
Kruglov Pavel
71b6d6c6ae
Merge pull request #47114 from Avogar/parquet-compression
Improve working with compression methods in Parquet/ORC/Arrow formats
2023-03-09 13:02:18 +01:00
Kruglov Pavel
3de905bb7c
Merge pull request #46616 from Avogar/fix-ipv4-ipv6-formats
Fix IPv4/IPv6 serialization/deserialization in binary formats
2023-03-06 19:40:29 +01:00
avogar
a6cf2cdab8 Fix style, add docs 2023-03-02 10:36:07 +00:00
Kruglov Pavel
36e65f5f84
Use versions with dots 2023-02-27 19:00:40 +01:00
Kruglov Pavel
443dedddca
Merge branch 'master' into use-parquet-2 2023-02-27 14:31:43 +01:00
Kruglov Pavel
47f9ca2166
Merge branch 'master' into fix-ipv4-ipv6-formats 2023-02-23 20:32:43 +01:00
avogar
54622566df Add setting to change parquet version 2023-02-23 16:14:10 +00:00
Dan Roscigno
b6612d2c18
fix anchor link 2023-02-21 11:24:39 -05:00
avogar
e37f6b5457 Update docs 2023-02-20 19:50:25 +00:00
Kruglov Pavel
2a3cb8b4ee
Merge pull request #45340 from Avogar/parquet-fixed-binary
Support FixedSizeBinary type in Parquet/Arrow
2023-02-10 18:31:20 +01:00
Dan Roscigno
b33486d715
Update formats.md
closes
2023-02-09 12:52:01 -05:00
Kruglov Pavel
4e2918cee3
Merge branch 'master' into parquet-fixed-binary 2023-02-08 12:31:13 +01:00
Azat Khuzhin
1a8437f2c9 Add ability to ignore unknown keys in JSON object for named tuples
This can be useful in case your input JSON is complex, while you need
only few fields in it.

This behaviour is controlled by the
input_format_json_ignore_unknown_keys_in_named_tuple setting name, that
is turned OFF by default.

This will, almost, allow to parse gharchive dataset without jq. "almost"
because of two things:
- Tuple cannot be Nullable, so such keys with Tuple type in ClickHouse
  cannot be `null` in JSON
- You cannot use dot.dot notation to extract columns for file() engine,
  only tupleElement()

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-27 10:01:08 +01:00
Dan Roscigno
777ddf80ab
Update docs/en/interfaces/formats.md 2023-01-26 11:18:18 -05:00
Denys Golotiuk
0eadb7112d
Added markdown format docs 2023-01-26 13:33:14 +02:00
Dan Roscigno
1cc3708092
Merge branch 'master' into update-settings-url 2023-01-24 19:51:39 -05:00
DanRoscigno
77ae27f26c update for split of format settings 2023-01-24 19:37:55 -05:00
Kruglov Pavel
23c12ac8ee
Merge branch 'master' into parquet-fixed-binary 2023-01-24 16:51:05 +01:00
Kruglov Pavel
4bd3f0e5ef
Merge pull request #44953 from Avogar/tsv-csv-detect-header
Detect header in CSV/TSV/CustomSeparated files automatically
2023-01-24 15:13:52 +01:00
Rich Raposa
429e93965c
Update formats.md
Google has a new website for Protocol Buffers. The old link expires on Jan 31, 2023
2023-01-23 15:42:35 -07:00
avogar
5bf4704e7a Support FixedSizeBinary type in Parquet/Arrow 2023-01-16 21:01:31 +00:00
Kruglov Pavel
e9d6590926
Merge branch 'master' into tsv-csv-detect-header 2023-01-16 17:50:24 +01:00
avogar
1c0941d72a Add docs and examples 2023-01-16 16:46:41 +00:00
avogar
87b934c472 Insert default values in case of missing tuple elements in JSONEachRow 2023-01-12 16:36:44 +00:00
DanRoscigno
7168c217b0 switch text to response for query blocks 2023-01-11 10:08:11 -05:00
serxa
8d099a4417 make more SQL queries copyable from docs in one click 2023-01-11 13:43:51 +00:00
Ivan Blinkov
61c2f23713 Remove leftover empty lines at the end of markdown files 2023-01-09 15:15:18 +01:00
Ivan Blinkov
b7e082d033 Remove "Original article links" 2023-01-09 15:13:36 +01:00
DanRoscigno
925ce4b96c edits 2022-12-30 09:21:12 -05:00
DanRoscigno
0902db3fe0 edits 2022-12-29 22:34:25 -05:00
avogar
ae715b9d00 Finish docs 2022-12-29 20:42:03 +00:00
avogar
46b7ec4209 Add detailed documentation about schema inference 2022-12-29 13:42:56 +00:00
Yakov Olkhovskiy
9ce4e6b7e2
fix style 2022-12-16 17:30:40 -05:00
Yakov Olkhovskiy
bb5d7ff28b
Append requirement for FORMAT RowBinary with strict delimiter 2022-12-16 15:43:52 -05:00
avogar
d0f9bb2ec2 Allow to parse JSON objects into Strings 2022-12-08 18:58:18 +00:00
Kruglov Pavel
c35b2a6495
Add a limit for string size in RowBinary format (#43842) 2022-12-02 13:57:11 +01:00
Kruglov Pavel
98d6b96c82
Merge pull request #42033 from mark-polokhov/BSONEachRow
Add BSONEachRow input/output format
2022-11-22 14:45:21 +01:00
avogar
2af60f34eb Restrict document size in parallel parsing, allow to read ObjectId/JS code into String column 2022-11-15 13:35:17 +00:00
avogar
4d993e653a Fix build and style 2022-11-15 13:06:24 +00:00
avogar
842d25c358 Minor improvements, better docs 2022-11-14 20:05:01 +00:00
avogar
564d83bbc7 Better handle uint64 2022-11-11 13:24:12 +00:00
avogar
94c6dc42eb Use better types 2022-11-11 13:17:48 +00:00
avogar
9e89af28c6 Refactor BSONEachRow format, fix bugs, support more data types, support parallel parsing and schema inference 2022-11-10 20:15:14 +00:00
DanRoscigno
34f90ff6ef update pages that refer to dictionaries 2022-11-07 09:26:50 -05:00
Kruglov Pavel
9c1e654584
Fix typo 2022-09-28 16:38:04 +02:00
avogar
03ee7efcb9 Better example in docs 2022-09-28 12:48:31 +00:00
Kruglov Pavel
bfddb91c9a
Update docs/en/interfaces/formats.md
Co-authored-by: Sergei Trifonov <svtrifonov@gmail.com>
2022-09-28 14:15:53 +02:00
avogar
4f32ef9bb7 Add docs 2022-09-22 17:04:42 +00:00
Kruglov Pavel
22e11aef2d
Merge pull request #40910 from Avogar/new-json-formats
Add new JSON formats, add improvements and refactoring
2022-09-21 14:19:08 +02:00
avogar
868ce8bc16 Fix comments, make better naming, add docs, add setting output_format_json_quote_64bit_floats 2022-09-20 13:49:17 +00:00