diff --git a/docs/en/sql-reference/table-functions/file.md b/docs/en/sql-reference/table-functions/file.md index beab691ad0e..d1eb81e52c6 100644 --- a/docs/en/sql-reference/table-functions/file.md +++ b/docs/en/sql-reference/table-functions/file.md @@ -5,7 +5,11 @@ toc_title: file # file {#file} -Creates a table from a file. This table function is similar to [url](../../sql-reference/table-functions/url.md) and [hdfs](../../sql-reference/table-functions/hdfs.md) ones. +Creates a table from a file. This table function is similar to [url](../../sql-reference/table-functions/url.md) and [hdfs](../../sql-reference/table-functions/hdfs.md) ones. + +`file` function can be used in `SELECT` and `INSERT` queries on data in [File](../../engines/table-engines/special/file.md) tables. + +**Syntax** ``` sql file(path, format, structure) @@ -13,15 +17,15 @@ file(path, format, structure) **Input parameters** -- `path` — The relative path to the file from [user_files_path](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-user_files_path). Path to file support following globs in readonly mode: `*`, `?`, `{abc,def}` and `{N..M}` where `N`, `M` — numbers, \``'abc', 'def'` — strings. +- `path` — The relative path to the file from [user_files_path](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-user_files_path). Path to file support following globs in readonly mode: `*`, `?`, `{abc,def}` and `{N..M}` where `N`, `M` — numbers, `'abc', 'def'` — strings. - `format` — The [format](../../interfaces/formats.md#formats) of the file. -- `structure` — Structure of the table. Format `'column1_name column1_type, column2_name column2_type, ...'`. +- `structure` — Structure of the table. Format: `'column1_name column1_type, column2_name column2_type, ...'`. **Returned value** A table with the specified structure for reading or writing data in the specified file. -**Example** +**Examples** Setting `user_files_path` and the contents of the file `test.csv`: @@ -35,12 +39,29 @@ $ cat /var/lib/clickhouse/user_files/test.csv 78,43,45 ``` -Table from`test.csv` and selection of the first two rows from it: +Getting data from a table in `test.csv` and selecting first two rows from it: ``` sql -SELECT * -FROM file('test.csv', 'CSV', 'column1 UInt32, column2 UInt32, column3 UInt32') -LIMIT 2 +SELECT * FROM file('test.csv', 'CSV', 'column1 UInt32, column2 UInt32, column3 UInt32') LIMIT 2; +``` + +``` text +┌─column1─┬─column2─┬─column3─┐ +│ 1 │ 2 │ 3 │ +│ 3 │ 2 │ 1 │ +└─────────┴─────────┴─────────┘ +``` +Getting the first 10 lines of a table that contains 3 columns of UInt32 type from a CSV file: + +``` sql +SELECT * FROM file('test.csv', 'CSV', 'column1 UInt32, column2 UInt32, column3 UInt32') LIMIT 10; +``` + +Inserting data from a file into a table: + +``` sql +INSERT INTO FUNCTION file('test.csv', 'CSV', 'column1 UInt32, column2 UInt32, column3 UInt32') VALUES (1, 2, 3), (3, 2, 1); +SELECT * FROM file('test.csv', 'CSV', 'column1 UInt32, column2 UInt32, column3 UInt32'); ``` ``` text @@ -50,12 +71,8 @@ LIMIT 2 └─────────┴─────────┴─────────┘ ``` -``` sql --- getting the first 10 lines of a table that contains 3 columns of UInt32 type from a CSV file -SELECT * FROM file('test.csv', 'CSV', 'column1 UInt32, column2 UInt32, column3 UInt32') LIMIT 10 -``` -**Globs in path** +## Globs in Path {#globs-in-path} Multiple path components can have globs. For being processed file should exists and matches to the whole path pattern (not only suffix or prefix). @@ -68,31 +85,25 @@ Constructions with `{}` are similar to the [remote table function](../../sql-ref **Example** -1. Suppose we have several files with the following relative paths: +Suppose we have several files with the following relative paths: -- ‘some_dir/some_file_1’ -- ‘some_dir/some_file_2’ -- ‘some_dir/some_file_3’ -- ‘another_dir/some_file_1’ -- ‘another_dir/some_file_2’ -- ‘another_dir/some_file_3’ +- 'some_dir/some_file_1' +- 'some_dir/some_file_2' +- 'some_dir/some_file_3' +- 'another_dir/some_file_1' +- 'another_dir/some_file_2' +- 'another_dir/some_file_3' -1. Query the amount of rows in these files: - - +Query the amount of rows in these files: ``` sql -SELECT count(*) -FROM file('{some,another}_dir/some_file_{1..3}', 'TSV', 'name String, value UInt32') +SELECT count(*) FROM file('{some,another}_dir/some_file_{1..3}', 'TSV', 'name String, value UInt32'); ``` -1. Query the amount of rows in all files of these two directories: - - +Query the amount of rows in all files of these two directories: ``` sql -SELECT count(*) -FROM file('{some,another}_dir/*', 'TSV', 'name String, value UInt32') +SELECT count(*) FROM file('{some,another}_dir/*', 'TSV', 'name String, value UInt32'); ``` !!! warning "Warning" @@ -103,8 +114,7 @@ FROM file('{some,another}_dir/*', 'TSV', 'name String, value UInt32') Query the data from files named `file000`, `file001`, … , `file999`: ``` sql -SELECT count(*) -FROM file('big_dir/file{0..9}{0..9}{0..9}', 'CSV', 'name String, value UInt32') +SELECT count(*) FROM file('big_dir/file{0..9}{0..9}{0..9}', 'CSV', 'name String, value UInt32'); ``` ## Virtual Columns {#virtual-columns} @@ -116,4 +126,4 @@ FROM file('big_dir/file{0..9}{0..9}{0..9}', 'CSV', 'name String, value UInt32') - [Virtual columns](https://clickhouse.tech/docs/en/operations/table_engines/#table_engines-virtual_columns) -[Original article](https://clickhouse.tech/docs/en/query_language/table_functions/file/) +[Original article](https://clickhouse.tech/docs/en/sql-reference/table-functions/file/) diff --git a/docs/en/sql-reference/table-functions/remote.md b/docs/en/sql-reference/table-functions/remote.md index 71b1006fc5d..8af5b588412 100644 --- a/docs/en/sql-reference/table-functions/remote.md +++ b/docs/en/sql-reference/table-functions/remote.md @@ -5,9 +5,11 @@ toc_title: remote # remote, remoteSecure {#remote-remotesecure} -Allows you to access remote servers without creating a `Distributed` table. +Allows to access remote servers without creating a [Distributed](../../engines/table-engines/special/distributed.md) table. `remoteSecure` - same as `remote` but with secured connection. -Signatures: +Both functions can be used in `SELECT` and `INSERT` queries. + +**Syntax** ``` sql remote('addresses_expr', db, table[, 'user'[, 'password'], sharding_key]) @@ -16,13 +18,40 @@ remoteSecure('addresses_expr', db, table[, 'user'[, 'password'], sharding_key]) remoteSecure('addresses_expr', db.table[, 'user'[, 'password'], sharding_key]) ``` -`addresses_expr` – An expression that generates addresses of remote servers. This may be just one server address. The server address is `host:port`, or just `host`. The host can be specified as the server name, or as the IPv4 or IPv6 address. An IPv6 address is specified in square brackets. The port is the TCP port on the remote server. If the port is omitted, it uses `tcp_port` from the server’s config file (by default, 9000). -`sharding_key` - We can specify sharding key to support distributing data across nodes. For example: `insert into remote('127.0.0.1:9000,127.0.0.2', db, table, 'default', rand())`. +**Input parameters** + +- `addresses_expr` – An expression that generates addresses of remote servers. This may be just one server address. The server address is `host:port`, or just `host`. + + The host can be specified as the server name, or as the IPv4 or IPv6 address. An IPv6 address is specified in square brackets. + + The port is the TCP port on the remote server. If the port is omitted, it uses [tcp_port](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-tcp_port) from the server’s config file in `remote` (by default, 9000) and [tcp_port_secure](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-tcp_port_secure) in `remoteSecure` (by default, 9440). -!!! important "Important" The port is required for an IPv6 address. -Examples: + Type: [String](../../sql-reference/data-types/string.md). + +- `db` - Database name. Type: [String](../../sql-reference/data-types/string.md). +- `table` - Table name. Type: [String](../../sql-reference/data-types/string.md). +- `user` - User name. If the user is not specified, `default` is used. Type: [String](../../sql-reference/data-types/string.md). +- `password` - User password. If the password is not specified, an empty password is used. Type: [String](../../sql-reference/data-types/string.md). +- `sharding_key` - Sharding key to support distributing data across nodes. For example: `insert into remote('127.0.0.1:9000,127.0.0.2', db, table, 'default', rand())`. Type: [UInt32](../../sql-reference/data-types/int-uint.md). + +**Returned value** + +Dataset from remote servers. + +**Usage** + +Using the `remote` table function is less optimal than creating a `Distributed` table, because in this case the server connection is re-established for every request. In addition, if host names are set, the names are resolved, and errors are not counted when working with various replicas. When processing a large number of queries, always create the `Distributed` table ahead of time, and don’t use the `remote` table function. + +The `remote` table function can be useful in the following cases: + +- Accessing a specific server for data comparison, debugging, and testing. +- Queries between various ClickHouse clusters for research purposes. +- Infrequent distributed requests that are made manually. +- Distributed requests where the set of servers is re-defined each time. + +**Adresses** ``` text example01-01-1 @@ -33,9 +62,7 @@ localhost [2a02:6b8:0:1111::11]:9000 ``` -Multiple addresses can be comma-separated. In this case, ClickHouse will use distributed processing, so it will send the query to all specified addresses (like to shards with different data). - -Example: +Multiple addresses can be comma-separated. In this case, ClickHouse will use distributed processing, so it will send the query to all specified addresses (like to shards with different data). Example: ``` text example01-01-1,example01-02-1 @@ -55,30 +82,28 @@ example01-{01..02}-1 If you have multiple pairs of curly brackets, it generates the direct product of the corresponding sets. -Addresses and parts of addresses in curly brackets can be separated by the pipe symbol (\|). In this case, the corresponding sets of addresses are interpreted as replicas, and the query will be sent to the first healthy replica. However, the replicas are iterated in the order currently set in the [load_balancing](../../operations/settings/settings.md) setting. - -Example: +Addresses and parts of addresses in curly brackets can be separated by the pipe symbol (\|). In this case, the corresponding sets of addresses are interpreted as replicas, and the query will be sent to the first healthy replica. However, the replicas are iterated in the order currently set in the [load_balancing](../../operations/settings/settings.md) setting. This example specifies two shards that each have two replicas: ``` text example01-{01..02}-{1|2} ``` -This example specifies two shards that each have two replicas. - The number of addresses generated is limited by a constant. Right now this is 1000 addresses. -Using the `remote` table function is less optimal than creating a `Distributed` table, because in this case, the server connection is re-established for every request. In addition, if host names are set, the names are resolved, and errors are not counted when working with various replicas. When processing a large number of queries, always create the `Distributed` table ahead of time, and don’t use the `remote` table function. +**Examples** -The `remote` table function can be useful in the following cases: +Selecting data from a remote server: -- Accessing a specific server for data comparison, debugging, and testing. -- Queries between various ClickHouse clusters for research purposes. -- Infrequent distributed requests that are made manually. -- Distributed requests where the set of servers is re-defined each time. +``` sql +SELECT * FROM remote('127.0.0.1', db.remote_engine_table) LIMIT 3; +``` -If the user is not specified, `default` is used. -If the password is not specified, an empty password is used. +Inserting data from a remote server into a table: -`remoteSecure` - same as `remote` but with secured connection. Default port — [tcp_port_secure](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-tcp_port_secure) from config or 9440. +``` sql +CREATE TABLE remote_table (name String, value UInt32) ENGINE=Memory; +INSERT INTO FUNCTION remote('127.0.0.1', currentDatabase(), 'remote_table') VALUES ('test', 42); +SELECT * FROM remote_table; +``` -[Original article](https://clickhouse.tech/docs/en/query_language/table_functions/remote/) +[Original article](https://clickhouse.tech/docs/en/sql-reference/table-functions/remote/) diff --git a/docs/en/sql-reference/table-functions/url.md b/docs/en/sql-reference/table-functions/url.md index ea649e01994..d70774b7588 100644 --- a/docs/en/sql-reference/table-functions/url.md +++ b/docs/en/sql-reference/table-functions/url.md @@ -5,20 +5,40 @@ toc_title: url # url {#url} -`url(URL, format, structure)` - returns a table created from the `URL` with given -`format` and `structure`. +`url` function creates a table from the `URL` with given `format` and `structure`. -URL - HTTP or HTTPS server address, which can accept `GET` and/or `POST` requests. +`url` function may be used in `SELECT` and `INSERT` queries on data in [URL](../../engines/table-engines/special/url.md) tables. -format - [format](../../interfaces/formats.md#formats) of the data. - -structure - table structure in `'UserID UInt64, Name String'` format. Determines column names and types. - -**Example** +**Syntax** ``` sql --- getting the first 3 lines of a table that contains columns of String and UInt32 type from HTTP-server which answers in CSV format. -SELECT * FROM url('http://127.0.0.1:12345/', CSV, 'column1 String, column2 UInt32') LIMIT 3 +url(URL, format, structure) ``` -[Original article](https://clickhouse.tech/docs/en/query_language/table_functions/url/) +**Input parameters** + +- `URL` - HTTP or HTTPS server address, which can accept `GET` (for `SELECT`) or `POST` (for `INSERT`) requests. Type: [String](../../sql-reference/data-types/string.md). +- `format` - [Format](../../interfaces/formats.md#formats) of the data. Type: [String](../../sql-reference/data-types/string.md). +- `structure` - Table structure in `'UserID UInt64, Name String'` format. Determines column names and types. Type: [String](../../sql-reference/data-types/string.md). + +**Returned value** + +A table with the specified format and structure and with data from the defined URL. + +**Examples** + +Getting the first 3 lines of a table that contains columns of `String` and `UInt32` type from HTTP-server which answers in `CSV` format. + +``` sql +SELECT * FROM url('http://127.0.0.1:12345/', CSV, 'column1 String, column2 UInt32') LIMIT 3; +``` + +Inserting data from a URL into a table: + +``` sql +CREATE TABLE test_table (column1 String, column2 UInt32) ENGINE=Memory; +INSERT INTO FUNCTION url('http://127.0.0.1:8123/?query=INSERT+INTO+test_table+FORMAT+CSV', 'CSV', 'column1 String, column2 UInt32') VALUES ('http interface', 42); +SELECT * FROM test_table; +``` + +[Original article](https://clickhouse.tech/docs/en/sql-reference/table-functions/url/)