Add brief docs about Apache Arrow

2024-11-24 08:32:02 +00:00 · 2020-05-21 18:20:41 +08:00 · 2020-05-21 18:20:41 +08:00 · 95e2f0fa07
commit 95e2f0fa07
parent ab37791134
1 changed file with 16 additions and 4 deletions
--- a/docs/en/interfaces/formats.md
+++ b/docs/en/interfaces/formats.md
@@ -37,6 +37,8 @@ The supported formats are:
 | [Avro](#data-format-avro)                                       | ✔     | ✔      |
 | [AvroConfluent](#data-format-avro-confluent)                    | ✔     | ✗      |
 | [Parquet](#data-format-parquet)                                 | ✔     | ✔      |
+| [Arrow](#data-format-arrow)                                     | ✔     | ✔      |
+| [ArrowStream](#data-format-arrow-stream)                        | ✔     | ✔      |
 | [ORC](#data-format-orc)                                         | ✔     | ✗      |
 | [RowBinary](#rowbinary)                                         | ✔     | ✔      |
 | [RowBinaryWithNamesAndTypes](#rowbinarywithnamesandtypes)       | ✔     | ✔      |
@@ -985,9 +987,9 @@ See also [how to read/write length-delimited protobuf messages in popular langua

 ## Avro {#data-format-avro}

-[Apache Avro](http://avro.apache.org/) is a row-oriented data serialization framework developed within Apache’s Hadoop project.
+[Apache Avro](https://avro.apache.org/) is a row-oriented data serialization framework developed within Apache’s Hadoop project.

-ClickHouse Avro format supports reading and writing [Avro data files](http://avro.apache.org/docs/current/spec.html#Object+Container+Files).
+ClickHouse Avro format supports reading and writing [Avro data files](https://avro.apache.org/docs/current/spec.html#Object+Container+Files).

 ### Data Types Matching {#data_types-matching}

@@ -1009,7 +1011,7 @@ The table below shows supported data types and how they match ClickHouse [data t
 | `long (timestamp-millis)` \*                | [DateTime64(3)](../sql-reference/data-types/datetime.md)                                                              | `long (timestamp-millis)` \* |
 | `long (timestamp-micros)` \*                | [DateTime64(6)](../sql-reference/data-types/datetime.md)                                                              | `long (timestamp-micros)` \* |

-\* [Avro logical types](http://avro.apache.org/docs/current/spec.html#Logical+Types)
+\* [Avro logical types](https://avro.apache.org/docs/current/spec.html#Logical+Types)

 Unsupported Avro data types: `record` (non-root), `map`

@@ -1095,7 +1097,7 @@ SELECT * FROM topic1_stream;

 ## Parquet {#data-format-parquet}

-[Apache Parquet](http://parquet.apache.org/) is a columnar storage format widespread in the Hadoop ecosystem. ClickHouse supports read and write operations for this format.
+[Apache Parquet](https://parquet.apache.org/) is a columnar storage format widespread in the Hadoop ecosystem. ClickHouse supports read and write operations for this format.

 ### Data Types Matching {#data_types-matching-2}

@@ -1141,6 +1143,16 @@ $ clickhouse-client --query="SELECT * FROM {some_table} FORMAT Parquet" > {some_

 To exchange data with Hadoop, you can use [HDFS table engine](../engines/table-engines/integrations/hdfs.md).

+## Arrow {#data-format-arrow}
+
+[Apache Arrow](https://arrow.apache.org/) defines two binary formats for serializing its columnar data. ClickHouse supports read and write operations for both of them.
+
+`Arrow` is Apache Arrow's "file mode" format. It is designed for in-memory random access.
+
+## ArrowStream {#data-format-arrow-stream}
+
+`ArrowStream` is Apache Arrow's "stream mode" format. It is designed for in-memory stream processing: the data is read sequentially from start to end, without random access.
+
 ## ORC {#data-format-orc}

 [Apache ORC](https://orc.apache.org/) is a columnar storage format widespread in the Hadoop ecosystem. You can only insert data in this format to ClickHouse.
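The `Arrow` and `ArrowStream` formats added in this commit also differ at the byte level. Per the Arrow IPC specification, a "file mode" payload begins and ends with the magic string `ARROW1`, while a stream is a plain sequence of messages that (since Arrow 0.15) each start with a `0xFFFFFFFF` continuation marker. The Python sketch below is a hypothetical helper, not part of ClickHouse or any Arrow library. It relies only on those magic bytes to guess which ClickHouse format name fits a given blob, and it does not validate the payload.

```python
from typing import Optional

# Assumption (Arrow IPC spec): the random-access "file mode" format opens and
# closes with this magic string; the stream format carries no such magic.
ARROW_FILE_MAGIC = b"ARROW1"
# Assumption (Arrow >= 0.15): every IPC message, and therefore every stream,
# starts with this 0xFFFFFFFF continuation marker.
CONTINUATION_MARKER = b"\xff\xff\xff\xff"


def guess_clickhouse_arrow_format(data: bytes) -> Optional[str]:
    """Hypothetical helper: guess the ClickHouse format name for Arrow bytes.

    Heuristic only: inspects magic bytes and never parses the payload.
    """
    # Check the file format first: its body also contains continuation
    # markers, but only the file format is wrapped in ARROW1 magic.
    if data.startswith(ARROW_FILE_MAGIC) and data.endswith(ARROW_FILE_MAGIC):
        return "Arrow"  # file mode: footer allows random access
    if data.startswith(CONTINUATION_MARKER):
        return "ArrowStream"  # stream mode: must be read start to end
    # Unrecognized: not Arrow, or a pre-0.15 stream without the marker.
    return None
```

A non-`None` result could then be used as the format name in a query such as `INSERT INTO some_table FORMAT Arrow`, following the `clickhouse-client` pattern of the Parquet section (`some_table` is a placeholder). A real loader should prefer an explicit format choice over this kind of sniffing.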