mirror of https://github.com/ClickHouse/ClickHouse.git synced 2024-11-17 21:24:28 +00:00

machine_translated	machine_translated_rev	toc_priority	toc_title
true	`72537a2d52`	36	HDFS

HDFS

This engine provides integration with the Apache Hadoop ecosystem by allowing you to manage data on HDFS via ClickHouse. It is similar to the File and URL engines, but provides Hadoop-specific features.

Usage

ENGINE = HDFS(URI, format)

The URI parameter is the whole file URI in HDFS. The format parameter specifies one of the available file formats. To perform SELECT queries, the format must be supported for input; to perform INSERT queries, it must be supported for output. The available formats are listed in the Formats section. The path part of the URI may contain globs. In this case the table is read-only.

Example:

1. Set up the hdfs_engine_table table:

CREATE TABLE hdfs_engine_table (name String, value UInt32) ENGINE=HDFS('hdfs://hdfs1:9000/other_storage', 'TSV')

2. Fill the file:

INSERT INTO hdfs_engine_table VALUES ('one', 1), ('two', 2), ('three', 3)

3. Query the data:

SELECT * FROM hdfs_engine_table LIMIT 2

┌─name─┬─value─┐
│ one  │     1 │
│ two  │     2 │
└──────┴───────┘

Implementation details

Reads and writes can be parallel.
Not supported:
- ALTER and SELECT...SAMPLE operations.
- Indexes.
- Replication.

Globs in path

Multiple path components can have globs. To be processed, a file must exist and match the whole path pattern. The list of files is determined during SELECT (not at CREATE time).

*: Substitutes any number of any characters except /, including the empty string.
?: Substitutes any single character.
{some_string,another_string,yet_another_one}: Substitutes any of the strings 'some_string', 'another_string', 'yet_another_one'.
{N..M}: Substitutes any number in the range from N to M, including both borders.

Constructions with {} are similar to the remote table function.

Example

Suppose we have several files in TSV format with the following URIs on HDFS:

‘hdfs://hdfs1:9000/some_dir/some_file_1’
‘hdfs://hdfs1:9000/some_dir/some_file_2’
‘hdfs://hdfs1:9000/some_dir/some_file_3’
‘hdfs://hdfs1:9000/another_dir/some_file_1’
‘hdfs://hdfs1:9000/another_dir/some_file_2’
‘hdfs://hdfs1:9000/another_dir/some_file_3’

There are several ways to make a table consisting of all six files:

CREATE TABLE table_with_range (name String, value UInt32) ENGINE = HDFS('hdfs://hdfs1:9000/{some,another}_dir/some_file_{1..3}', 'TSV')

Another way:

CREATE TABLE table_with_question_mark (name String, value UInt32) ENGINE = HDFS('hdfs://hdfs1:9000/{some,another}_dir/some_file_?', 'TSV')

The table consists of all the files in both directories (all files must satisfy the format and schema described in the query):

CREATE TABLE table_with_asterisk (name String, value UInt32) ENGINE = HDFS('hdfs://hdfs1:9000/{some,another}_dir/*', 'TSV')

!!! warning "Warning" If the list of files contains number ranges with leading zeros, use the construction with braces for each digit separately or use ?.

Example

Create a table with files named file000, file001, … , file999:

CREATE TABLE big_table (name String, value UInt32) ENGINE = HDFS('hdfs://hdfs1:9000/big_dir/file{0..9}{0..9}{0..9}', 'CSV')
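
The ? alternative mentioned in the warning could be sketched as follows. The table name big_table_question_mark is hypothetical and used only for illustration. Because ? substitutes any single character, not only a digit, this pattern would also pick up a file such as fileabc if one existed in the directory, so the brace form is the stricter choice.

```sql
-- Sketch: one ? per digit position, intended to cover file000 ... file999.
-- big_table_question_mark is a hypothetical table name.
CREATE TABLE big_table_question_mark (name String, value UInt32) ENGINE = HDFS('hdfs://hdfs1:9000/big_dir/file???', 'CSV')
```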

Virtual Columns

_path — Path to the file.
_file — Name of the file.
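
A minimal sketch of how these columns might be used, assuming the table_with_range table from the glob example exists. Virtual columns are not returned by SELECT *, so they have to be named explicitly.

```sql
-- Show which HDFS file each row was read from.
-- Assumes table_with_range was created as in the glob example.
SELECT _path, _file, name, value
FROM table_with_range
LIMIT 3
```

This can help when a glob matches more files than expected, since each row carries the path of the file it came from.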

See Also

Virtual columns

Original article
