---
slug: /en/sql-reference/table-functions/url
sidebar_position: 200
sidebar_label: url
---
# url
The `url` function creates a table from the `URL` with the given `format` and `structure`.

The `url` function can be used in `SELECT` and `INSERT` queries on data in [URL](../../engines/table-engines/special/url.md) tables.

**Syntax**
``` sql
url(URL [,format] [,structure] [,headers])
```

**Parameters**
- `URL` — HTTP or HTTPS server address, which can accept `GET` or `POST` requests (for `SELECT` or `INSERT` queries, respectively). Type: [String](../../sql-reference/data-types/string.md).
- `format` — [Format](../../interfaces/formats.md#formats) of the data. Type: [String](../../sql-reference/data-types/string.md).
- `structure` — Table structure in `'UserID UInt64, Name String'` format. Determines column names and types. Type: [String](../../sql-reference/data-types/string.md).
- `headers` - Headers in `headers('key1'='value1', 'key2'='value2')` format. Sets the HTTP headers sent with the request.

**Returned value**

A table with the specified format and structure, containing data from the defined `URL`.

**Examples**

Getting the first 3 rows of a table that contains columns of `String` and [UInt32](../../sql-reference/data-types/int-uint.md) type from an HTTP server that responds in [CSV](../../interfaces/formats.md#csv) format:

``` sql
SELECT * FROM url('http://127.0.0.1:12345/', CSV, 'column1 String, column2 UInt32', headers('Accept'='text/csv; charset=utf-8')) LIMIT 3;
```

Inserting data into a table through a `URL`. In this example, the `URL` points to the ClickHouse HTTP interface, which executes the `INSERT` query passed in its `query` parameter:
``` sql
CREATE TABLE test_table (column1 String, column2 UInt32) ENGINE=Memory;
INSERT INTO FUNCTION url('http://127.0.0.1:8123/?query=INSERT+INTO+test_table+FORMAT+CSV', 'CSV', 'column1 String, column2 UInt32') VALUES ('http interface', 42);
SELECT * FROM test_table;
```
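The `format` and `structure` arguments are optional. The sketch below assumes a hypothetical server at `http://127.0.0.1:12345/data.csv` that returns CSV with a header row. Because the `structure` argument is omitted, ClickHouse infers the column names and types from the data. `DESCRIBE TABLE` shows the inferred structure.

``` sql
-- Hypothetical endpoint returning CSV with a header row.
-- No structure argument: column names and types are inferred from the response.
DESCRIBE TABLE url('http://127.0.0.1:12345/data.csv', 'CSVWithNames');

-- The same resource queried with the inferred structure.
SELECT * FROM url('http://127.0.0.1:12345/data.csv', 'CSVWithNames') LIMIT 3;
```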
## Globs in URL

Patterns in curly brackets `{ }` are used to generate a set of shards or to specify failover addresses. For supported pattern types and examples, see the description of the [remote](remote.md#globs-in-addresses) function.

The `|` character inside patterns is used to specify failover addresses. They are tried in the same order as listed in the pattern. The number of generated addresses is limited by the [glob_expansion_max_elements](../../operations/settings/settings.md#glob_expansion_max_elements) setting.
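The sketch below shows both pattern kinds. It assumes hypothetical resources `data_1.csv` to `data_3.csv` on the server from the examples above, plus hypothetical hosts `replica1` and `replica2`.

``` sql
-- Range pattern: expands to data_1.csv, data_2.csv and data_3.csv on the same
-- (hypothetical) server; rows from all generated URLs are read.
SELECT count() FROM url('http://127.0.0.1:12345/data_{1..3}.csv', CSV, 'column1 String, column2 UInt32');

-- Failover pattern: replica1 is tried first, replica2 only if replica1 fails.
SELECT * FROM url('http://{replica1|replica2}:12345/data.csv', CSV, 'column1 String, column2 UInt32') LIMIT 3;
```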
## Virtual Columns

- `_path` — Path to the `URL`. Type: `LowCardinality(String)`.
- `_file` — Resource name of the `URL`. Type: `LowCardinality(String)`.
- `_size` — Size of the resource in bytes. Type: `Nullable(UInt64)`. If the size is unknown, the value is `NULL`.
- `_time` — Last modified time of the resource. Type: `Nullable(DateTime)`. If the time is unknown, the value is `NULL`.
- `_headers` - HTTP response headers. Type: `Map(LowCardinality(String), LowCardinality(String))`.
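Virtual columns are not included in `SELECT *`, so they must be listed explicitly. The sketch below assumes a hypothetical resource at `http://127.0.0.1:12345/data.csv` with the same two columns as the examples above.

``` sql
-- Regular column plus the virtual columns describing the source resource.
SELECT
    column1,
    _path,
    _file,
    _size,
    _time,
    _headers
FROM url('http://127.0.0.1:12345/data.csv', CSV, 'column1 String, column2 UInt32')
LIMIT 3;
```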
## Hive-style partitioning {#hive-style-partitioning}

When the setting `use_hive_partitioning` is set to 1, ClickHouse detects Hive-style partitioning in the path (`/name=value/`) and allows partition columns to be used as virtual columns in the query. These virtual columns have the same names as in the partitioned path, but prefixed with `_`.

**Example**

Using virtual columns created with Hive-style partitioning:

``` sql
SET use_hive_partitioning = 1;
SELECT * FROM url('http://data/path/date=*/country=*/code=*/*.parquet') WHERE _date > '2020-01-01' AND _country = 'Netherlands' AND _code = 42;
```
## Storage Settings {#storage-settings}

- [engine_url_skip_empty_files](/docs/en/operations/settings/settings.md#engine_url_skip_empty_files) - allows skipping empty files while reading. Disabled by default.
- [enable_url_encoding](/docs/en/operations/settings/settings.md#enable_url_encoding) - enables or disables decoding/encoding of the path in the URI. Enabled by default.
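Both settings can also be applied to a single query through a `SETTINGS` clause. The sketch below again assumes a hypothetical endpoint.

``` sql
SELECT *
FROM url('http://127.0.0.1:12345/data.csv', 'CSV', 'column1 String, column2 UInt32')
-- engine_url_skip_empty_files = 1: an empty resource yields an empty result instead of a possible format error.
-- enable_url_encoding = 0: the path in the URI is not decoded/encoded.
SETTINGS engine_url_skip_empty_files = 1, enable_url_encoding = 0;
```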
## See Also
- [Virtual columns](/docs/en/engines/table-engines/index.md#table_engines-virtual_columns)