---
toc_priority: 6
toc_title: HDFS
---

# HDFS {#table_engines-hdfs}

This engine provides integration with the [Apache Hadoop](https://en.wikipedia.org/wiki/Apache_Hadoop) ecosystem by allowing you to manage data on [HDFS](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html) via ClickHouse. This engine is similar to the [File](../../../engines/table-engines/special/file.md#table_engines-file) and [URL](../../../engines/table-engines/special/url.md#table_engines-url) engines, but provides Hadoop-specific features.

## Usage {#usage}

``` sql
ENGINE = HDFS(URI, format)
```

The `URI` parameter is the whole file URI in HDFS.
The `format` parameter specifies one of the available file formats. To perform
`SELECT` queries, the format must be supported for input, and to perform
`INSERT` queries, it must be supported for output. The available formats are listed in the
[Formats](../../../interfaces/formats.md#formats) section.
The path part of `URI` may contain globs. In this case the table would be read-only.

**Example:**

**1.** Set up the `hdfs_engine_table` table:

``` sql
CREATE TABLE hdfs_engine_table (name String, value UInt32) ENGINE=HDFS('hdfs://hdfs1:9000/other_storage', 'TSV')
```

**2.** Fill the file:

``` sql
INSERT INTO hdfs_engine_table VALUES ('one', 1), ('two', 2), ('three', 3)
```

**3.** Query the data:

``` sql
SELECT * FROM hdfs_engine_table LIMIT 2
```

``` text
┌─name─┬─value─┐
│ one  │     1 │
│ two  │     2 │
└──────┴───────┘
```

## Implementation Details {#implementation-details}

- Reads and writes can be parallel.
- Zero-copy replication is supported, which means that if the data is stored remotely on several machines and needs to be synchronized, then only the metadata is replicated (paths to the data parts), but not the data itself.
- Not supported:
    - `ALTER` and `SELECT...SAMPLE` operations.
    - Indexes.

**Globs in path**

Multiple path components can have globs. To be processed, a file must exist and match the whole path pattern. The listing of files is determined during `SELECT` (not at `CREATE` moment).

- `*` — Substitutes any number of any characters except `/` including empty string.
- `?` — Substitutes any single character.
- `{some_string,another_string,yet_another_one}` — Substitutes any of strings `'some_string', 'another_string', 'yet_another_one'`.
- `{N..M}` — Substitutes any number in range from N to M including both borders.

Constructions with `{}` are similar to the [remote](../../../sql-reference/table-functions/remote.md) table function.

**Example**

1. Suppose we have several files in TSV format with the following URIs on HDFS:

    - `hdfs://hdfs1:9000/some_dir/some_file_1`
    - `hdfs://hdfs1:9000/some_dir/some_file_2`
    - `hdfs://hdfs1:9000/some_dir/some_file_3`
    - `hdfs://hdfs1:9000/another_dir/some_file_1`
    - `hdfs://hdfs1:9000/another_dir/some_file_2`
    - `hdfs://hdfs1:9000/another_dir/some_file_3`

1. There are several ways to make a table consisting of all six files:

<!-- -->

``` sql
CREATE TABLE table_with_range (name String, value UInt32) ENGINE = HDFS('hdfs://hdfs1:9000/{some,another}_dir/some_file_{1..3}', 'TSV')
```

Another way:

``` sql
CREATE TABLE table_with_question_mark (name String, value UInt32) ENGINE = HDFS('hdfs://hdfs1:9000/{some,another}_dir/some_file_?', 'TSV')
```

The table consists of all the files in both directories (all files must match the format and schema described in the query):

``` sql
CREATE TABLE table_with_asterisk (name String, value UInt32) ENGINE = HDFS('hdfs://hdfs1:9000/{some,another}_dir/*', 'TSV')
```

!!! warning "Warning"
    If the listing of files contains number ranges with leading zeros, use the construction with braces for each digit separately or use `?`.

**Example**

Create a table with files named `file000`, `file001`, … , `file999`:

``` sql
CREATE TABLE big_table (name String, value UInt32) ENGINE = HDFS('hdfs://hdfs1:9000/big_dir/file{0..9}{0..9}{0..9}', 'CSV')
```

## Configuration {#configuration}

Similar to GraphiteMergeTree, the HDFS engine supports extended configuration using the ClickHouse config file. There are two configuration keys that you can use: global (`hdfs`) and user-level (`hdfs_*`). The global configuration is applied first, and then the user-level configuration is applied (if it exists).

``` xml
<!-- Global configuration options for HDFS engine type -->
<hdfs>
    <hadoop_kerberos_keytab>/tmp/keytab/clickhouse.keytab</hadoop_kerberos_keytab>
    <hadoop_kerberos_principal>clickuser@TEST.CLICKHOUSE.TECH</hadoop_kerberos_principal>
    <hadoop_security_authentication>kerberos</hadoop_security_authentication>
</hdfs>

<!-- Configuration specific for user "root" -->
<hdfs_root>
    <hadoop_kerberos_principal>root@TEST.CLICKHOUSE.TECH</hadoop_kerberos_principal>
</hdfs_root>
```

### Configuration Options {#configuration-options}

#### Supported by libhdfs3 {#supported-by-libhdfs3}

| **parameter**                                         | **default value**       |
|-------------------------------------------------------|-------------------------|
| rpc\_client\_connect\_tcpnodelay | true |
| dfs\_client\_read\_shortcircuit | true |
| output\_replace-datanode-on-failure | true |
| input\_notretry-another-node | false |
| input\_localread\_mappedfile | true |
| dfs\_client\_use\_legacy\_blockreader\_local | false |
| rpc\_client\_ping\_interval | 10 * 1000 |
| rpc\_client\_connect\_timeout | 600 * 1000 |
| rpc\_client\_read\_timeout | 3600 * 1000 |
| rpc\_client\_write\_timeout | 3600 * 1000 |
| rpc\_client\_socekt\_linger\_timeout | -1 |
| rpc\_client\_connect\_retry | 10 |
| rpc\_client\_timeout | 3600 * 1000 |
| dfs\_default\_replica | 3 |
| input\_connect\_timeout | 600 * 1000 |
| input\_read\_timeout | 3600 * 1000 |
| input\_write\_timeout | 3600 * 1000 |
| input\_localread\_default\_buffersize | 1 * 1024 * 1024 |
| dfs\_prefetchsize | 10 |
| input\_read\_getblockinfo\_retry | 3 |
| input\_localread\_blockinfo\_cachesize | 1000 |
| input\_read\_max\_retry | 60 |
| output\_default\_chunksize | 512 |
| output\_default\_packetsize | 64 * 1024 |
| output\_default\_write\_retry | 10 |
| output\_connect\_timeout | 600 * 1000 |
| output\_read\_timeout | 3600 * 1000 |
| output\_write\_timeout | 3600 * 1000 |
| output\_close\_timeout | 3600 * 1000 |
| output\_packetpool\_size | 1024 |
| output\_heeartbeat\_interval | 10 * 1000 |
| dfs\_client\_failover\_max\_attempts | 15 |
| dfs\_client\_read\_shortcircuit\_streams\_cache\_size | 256 |
| dfs\_client\_socketcache\_expiryMsec | 3000 |
| dfs\_client\_socketcache\_capacity | 16 |
| dfs\_default\_blocksize | 64 * 1024 * 1024 |
| dfs\_default\_uri | "hdfs://localhost:9000" |
| hadoop\_security\_authentication | "simple" |
| hadoop\_security\_kerberos\_ticket\_cache\_path | "" |
| dfs\_client\_log\_severity | "INFO" |
| dfs\_domain\_socket\_path | "" |

[HDFS Configuration Reference](https://hawq.apache.org/docs/userguide/2.3.0.0-incubating/reference/HDFSConfigurationParameterReference.html) might explain some parameters.
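
These options can presumably be placed in the same `<hdfs>` (or user-level `<hdfs_USER>`) sections shown above, using the underscore spelling from the table. A minimal sketch with purely illustrative values, not recommendations:

``` xml
<hdfs>
    <!-- Illustrative values only; see the reference above for each parameter's meaning and units -->
    <rpc_client_connect_timeout>300000</rpc_client_connect_timeout>
    <dfs_default_replica>2</dfs_default_replica>
</hdfs>
```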

#### ClickHouse extras {#clickhouse-extras}

| **parameter**                                         | **default value**       |
|-------------------------------------------------------|-------------------------|
|hadoop\_kerberos\_keytab | "" |
|hadoop\_kerberos\_principal | "" |
|hadoop\_kerberos\_kinit\_command | kinit |

#### Limitations {#limitations}

* `hadoop_security_kerberos_ticket_cache_path` can be global only, not user specific

## Kerberos support {#kerberos-support}

If the `hadoop_security_authentication` parameter has the value `kerberos`, ClickHouse authenticates via Kerberos.
Parameters from the [ClickHouse extras](#clickhouse-extras) section and `hadoop_security_kerberos_ticket_cache_path` may be of help.
Note that due to libhdfs3 limitations only the old-fashioned approach is supported;
datanode communications are not secured by SASL (`HADOOP_SECURE_DN_USER` is a reliable indicator of such a
security approach). Use `tests/integration/test_storage_kerberized_hdfs/hdfs_configs/bootstrap.sh` for reference.

If `hadoop_kerberos_keytab`, `hadoop_kerberos_principal` or `hadoop_kerberos_kinit_command` is specified, `kinit` will be invoked. `hadoop_kerberos_keytab` and `hadoop_kerberos_principal` are mandatory in this case. The `kinit` tool and krb5 configuration files are required.
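
For instance, a keytab-based setup might combine these parameters in the global `hdfs` section; the keytab path and principal below are placeholders rather than values from this documentation:

``` xml
<hdfs>
    <hadoop_security_authentication>kerberos</hadoop_security_authentication>
    <!-- Placeholder keytab path and principal; substitute your own -->
    <hadoop_kerberos_keytab>/path/to/clickhouse.keytab</hadoop_kerberos_keytab>
    <hadoop_kerberos_principal>clickuser@EXAMPLE.COM</hadoop_kerberos_principal>
</hdfs>
```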

## Virtual Columns {#virtual-columns}

- `_path` — Path to the file.
- `_file` — Name of the file.
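
For example, a query against the `table_with_asterisk` table defined above could use these columns to show which HDFS file each row was read from (a sketch; the output depends on your data):

``` sql
SELECT _path, _file, name, value FROM table_with_asterisk LIMIT 2
```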

**See Also**

- [Virtual columns](../../../engines/table-engines/index.md#table_engines-virtual_columns)

[Original article](https://clickhouse.tech/docs/en/engines/table-engines/integrations/hdfs/) <!--hide-->