[WIP] Add documentation for dictionaries DDL (#7720)

This commit is contained in:
alesapin 2019-11-13 18:50:09 +03:00 committed by Ivan Blinkov
parent 30acb1b653
commit b26ac93731
10 changed files with 404 additions and 35 deletions

View File

@ -271,3 +271,27 @@ Views look the same as normal tables. For example, they are listed in the result
There isn't a separate query for deleting views. To delete a view, use `DROP TABLE`.
[Original article](https://clickhouse.yandex/docs/en/query_language/create/) <!--hide-->
## CREATE DICTIONARY {#create-dictionary-query}
```sql
CREATE DICTIONARY [IF NOT EXISTS] [db.]dictionary_name
(
key1 type1 [DEFAULT|EXPRESSION expr1] [HIERARCHICAL|INJECTIVE|IS_OBJECT_ID],
key2 type2 [DEFAULT|EXPRESSION expr2] [HIERARCHICAL|INJECTIVE|IS_OBJECT_ID],
attr1 type2 [DEFAULT|EXPRESSION expr3],
attr2 type2 [DEFAULT|EXPRESSION expr4]
)
PRIMARY KEY key1, key2
SOURCE(SOURCE_NAME([param1 value1 ... paramN valueN]))
LAYOUT(LAYOUT_NAME([param_name param_value]))
LIFETIME([MIN val1] MAX val2)
```
Creates [external dictionary](dicts/external_dicts.md) with given [structure](dicts/external_dicts_dict_structure.md), [source](dicts/external_dicts_dict_sources.md), [layout](dicts/external_dicts_dict_layout.md) and [lifetime](dicts/external_dicts_dict_lifetime.md).
External dictionary structure consists of attributes. Dictionary attributes are specified similarly to table columns. The only required attribute property is its type, all other properties may have default values.
Depending on dictionary [layout](dicts/external_dicts_dict_layout.md) one or more attributes can be specified as dictionary keys.
For more information, see [External Dictionaries](dicts/external_dicts.md) section.

View File

@ -4,10 +4,11 @@ You can add your own dictionaries from various data sources. The data source for
ClickHouse:
> - Fully or partially stores dictionaries in RAM.
- Fully or partially stores dictionaries in RAM.
- Periodically updates dictionaries and dynamically loads missing values. In other words, dictionaries can be loaded dynamically.
- Allows to create external dictionaries with xml-files or [DDL queries](../create.md#create-dictionary-query).
The configuration of external dictionaries is located in one or more files. The path to the configuration is specified in the [dictionaries_config](../../operations/server_settings/settings.md#server_settings-dictionaries_config) parameter.
The configuration of external dictionaries can be located in one or more xml-files. The path to the configuration is specified in the [dictionaries_config](../../operations/server_settings/settings.md#server_settings-dictionaries_config) parameter.
Dictionaries can be loaded at server startup or at first use, depending on the [dictionaries_lazy_load](../../operations/server_settings/settings.md#server_settings-dictionaries_lazy_load) setting.
@ -31,6 +32,8 @@ The dictionary configuration file has the following format:
You can [configure](external_dicts_dict.md) any number of dictionaries in the same file.
[DDL queries for dictionaries](../create.md#create-dictionary-query) doesn't require any additional records in server configuration. They allow to work with dictionaries as first-class entities, like tables or views.
!!! attention
You can convert values for a small dictionary by describing it in a `SELECT` query (see the [transform](../functions/other_functions.md) function). This functionality is not related to external dictionaries.

View File

@ -1,11 +1,15 @@
# Configuring an External Dictionary {#dicts-external_dicts_dict}
The dictionary configuration has the following structure:
If dictionary is configured using xml-file, than dictionary configuration has the following structure:
```xml
<dictionary>
<name>dict_name</name>
<structure>
<!-- Complex key configuration -->
</structure>
<source>
<!-- Source configuration -->
</source>
@ -14,16 +18,25 @@ The dictionary configuration has the following structure:
<!-- Memory layout configuration -->
</layout>
<structure>
<!-- Complex key configuration -->
</structure>
<lifetime>
<!-- Lifetime of dictionary in memory -->
</lifetime>
</dictionary>
```
Corresponding [DDL-query](../create.md#create-dictionary-query) has the following structure:
```sql
CREATE DICTIONARY dict_name
(
... -- attributes
)
PRIMARY KEY ... -- complex or single key configuration
SOURCE(...) -- Source configuration
LAYOUT(...) -- Memory layout configuration
LIFETIME(...) -- Lifetime of dictionary in memory
```
- name The identifier that can be used to access the dictionary. Use the characters `[a-zA-Z0-9_\-]`.
- [source](external_dicts_dict_sources.md) — Source of the dictionary.
- [layout](external_dicts_dict_layout.md) — Dictionary layout in memory.

View File

@ -34,6 +34,15 @@ The configuration looks like this:
</yandex>
```
in case of [DDL-query](../create.md#create-dictionary-query), equal configuration will looks like
```sql
CREATE DICTIONARY (...)
...
LAYOUT(LAYOUT_TYPE(param value)) -- layout settings
...
```
## Ways to Store Dictionaries in Memory
@ -64,6 +73,12 @@ Configuration example:
</layout>
```
or
```sql
LAYOUT(FLAT())
```
### hashed {#dicts-external_dicts_dict_layout-hashed}
The dictionary is completely stored in memory in the form of a hash table. The dictionary can contain any number of elements with any identifiers In practice, the number of keys can reach tens of millions of items.
@ -78,6 +93,12 @@ Configuration example:
</layout>
```
or
```sql
LAYOUT(HASHED())
```
### sparse_hashed {#dicts-external_dicts_dict_layout-sparse_hashed}
Similar to `hashed`, but uses less memory in favor more CPU usage.
@ -90,6 +111,9 @@ Configuration example:
</layout>
```
```sql
LAYOUT(SPARSE_HASHED())
```
### complex_key_hashed
@ -103,6 +127,9 @@ Configuration example:
</layout>
```
```sql
LAYOUT(COMPLEX_KEY_HASHED())
```
### range_hashed
@ -113,15 +140,15 @@ This storage method works the same way as hashed and allows using date/time (arb
Example: The table contains discounts for each advertiser in the format:
```text
+---------------+---------------------+-------------------+--------+
+---------------|---------------------|-------------------|--------+
| advertiser id | discount start date | discount end date | amount |
+===============+=====================+===================+========+
| 123 | 2015-01-01 | 2015-01-15 | 0.15 |
+---------------+---------------------+-------------------+--------+
+---------------|---------------------|-------------------|--------+
| 123 | 2015-01-16 | 2015-01-31 | 0.25 |
+---------------+---------------------+-------------------+--------+
+---------------|---------------------|-------------------|--------+
| 456 | 2015-01-01 | 2015-01-15 | 0.05 |
+---------------+---------------------+-------------------+--------+
+---------------|---------------------|-------------------|--------+
```
To use a sample for date ranges, define the `range_min` and `range_max` elements in the [structure](external_dicts_dict_structure.md). These elements must contain elements `name` and` type` (if `type` is not specified, the default type will be used - Date). `type` can be any numeric type (Date / DateTime / UInt64 / Int32 / others).
@ -144,6 +171,19 @@ Example:
...
```
or
```sql
CREATE DICTIONARY somedict (
id UInt64,
first Date,
last Date
)
PRIMARY KEY id
LAYOUT(RANGE_HASHED())
RANGE(MIN first MAX last)
```
To work with these dictionaries, you need to pass an additional argument to the `dictGetT` function, for which a range is selected:
```sql
@ -193,6 +233,18 @@ Configuration example:
</yandex>
```
or
```sql
CREATE DICTIONARY somedict(
Abcdef UInt64,
StartTimeStamp UInt64,
EndTimeStamp UInt64,
XXXType String DEFAULT ''
)
PRIMARY KEY Abcdef
RANGE(MIN StartTimeStamp MAX EndTimeStamp)
```
### cache
@ -218,6 +270,12 @@ Example of settings:
</layout>
```
or
```sql
LAYOUT(CACHE(SIZE_IN_CELLS 1000000000))
```
Set a large enough cache size. You need to experiment to select the number of cells:
1. Set some value.
@ -241,17 +299,17 @@ This type of storage is for mapping network prefixes (IP addresses) to metadata
Example: The table contains network prefixes and their corresponding AS number and country code:
```text
+-----------------+-------+--------+
+-----------------|-------|--------+
| prefix | asn | cca2 |
+=================+=======+========+
| 202.79.32.0/20 | 17501 | NP |
+-----------------+-------+--------+
+-----------------|-------|--------+
| 2620:0:870::/48 | 3856 | US |
+-----------------+-------+--------+
+-----------------|-------|--------+
| 2a02:6b8:1::/48 | 13238 | RU |
+-----------------+-------+--------+
+-----------------|-------|--------+
| 2001:db8::/32 | 65536 | ZZ |
+-----------------+-------+--------+
+-----------------|-------|--------+
```
When using this type of layout, the structure must have a composite key.
@ -279,6 +337,17 @@ Example:
...
```
or
```sql
CREATE DICTIONARY somedict (
prefix String,
asn UInt32,
cca2 String DEFAULT '??'
)
PRIMARY KEY prefix
```
The key must have only one String type attribute that contains an allowed IP prefix. Other types are not supported yet.
For queries, you must use the same functions (`dictGetT` with a tuple) as for dictionaries with composite keys:

View File

@ -15,7 +15,14 @@ Example of settings:
</dictionary>
```
Setting ` <lifetime> 0</lifetime> ` prevents updating dictionaries.
```sql
CREATE DICTIONARY (...)
...
LIFETIME(300)
...
```
Setting `<lifetime>0</lifetime>` (`LIFETIME(0)`) prevents dictionaries from updating.
You can set a time interval for upgrades, and ClickHouse will choose a uniformly random time within this range. This is necessary in order to distribute the load on the dictionary source when upgrading on a large number of servers.
@ -32,6 +39,12 @@ Example of settings:
</dictionary>
```
or
```sql
LIFETIME(MIN 300 MAX 360)
```
When upgrading the dictionaries, the ClickHouse server applies different logic depending on the type of [ source](external_dicts_dict_sources.md):
- For a text file, it checks the time of modification. If the time differs from the previously recorded time, the dictionary is updated.
@ -56,5 +69,13 @@ Example of settings:
</dictionary>
```
or
```sql
...
SOURCE(ODBC(... invalidate_query 'SELECT update_time FROM dictionary_source where id = 1'))
...
```
[Original article](https://clickhouse.yandex/docs/en/query_language/dicts/external_dicts_dict_lifetime/) <!--hide-->

View File

@ -2,7 +2,7 @@
An external dictionary can be connected from many different sources.
The configuration looks like this:
If dictionary is configured using xml-file, the configuration looks like this:
```xml
<yandex>
@ -19,6 +19,15 @@ The configuration looks like this:
</yandex>
```
In case of [DDL-query](../create.md#create-dictionary-query), equal configuration will looks like:
```sql
CREATE DICTIONARY dict_name (...)
...
SOURCE(SOURCE_TYPE(param1 val1 ... paramN valN)) -- Source configuration
...
```
The source is configured in the `source` section.
Types of sources (`source_type`):
@ -47,6 +56,12 @@ Example of settings:
</source>
```
or
```sql
SOURCE(FILE(path '/opt/dictionaries/os.tsv' format 'TabSeparated'))
```
Setting fields:
- `path` The absolute path to the file.
@ -68,6 +83,12 @@ Example of settings:
</source>
```
or
```sql
SOURCE(EXECUTABLE(command 'cat /opt/dictionaries/os.tsv' format 'TabSeparated'))
```
Setting fields:
- `command` The absolute path to the executable file, or the file name (if the program directory is written to `PATH`).
@ -99,6 +120,17 @@ Example of settings:
</source>
```
or
```sql
SOURCE(HTTP(
url 'http://[::1]/os.tsv'
format 'TabSeparated'
credentials(user 'user' password 'password')
headers(header(name 'API-KEY' value 'key'))
))
```
In order for ClickHouse to access an HTTPS resource, you must [configure openSSL](../../operations/server_settings/settings.md#server_settings-openssl) in the server configuration.
Setting fields:
@ -121,12 +153,25 @@ You can use this method to connect any database that has an ODBC driver.
Example of settings:
```xml
<odbc>
<db>DatabaseName</db>
<table>ShemaName.TableName</table>
<connection_string>DSN=some_parameters</connection_string>
<invalidate_query>SQL_QUERY</invalidate_query>
</odbc>
<source>
<odbc>
<db>DatabaseName</db>
<table>ShemaName.TableName</table>
<connection_string>DSN=some_parameters</connection_string>
<invalidate_query>SQL_QUERY</invalidate_query>
</odbc>
</source>
```
or
```sql
SOURCE(ODBC(
db 'DatabaseName'
table 'SchemaName.TableName'
connection_string 'DSN=some_parameters'
invalidate_query 'SQL_QUERY'
))
```
Setting fields:
@ -233,6 +278,19 @@ The dictionary configuration in ClickHouse:
</yandex>
```
or
```sql
CREATE DICTIONARY table_name (
id UInt64,
some_column UInt64 DEFAULT 0
)
PRIMARY KEY id
SOURCE(ODBC(connection_string 'DSN=myconnection' table 'postgresql_table'))
LAYOUT(HASHED())
LIFETIME(MIN 300 MAX 360)
```
You may need to edit `odbc.ini` to specify the full path to the library with the driver `DRIVER=/usr/local/lib/psqlodbcw.so`.
### Example of Connecting MS SQL Server
@ -316,6 +374,19 @@ Configuring the dictionary in ClickHouse:
</yandex>
```
or
```sql
CREATE DICTIONARY test (
k UInt64,
s String DEFAULT ''
)
PRIMARY KEY k
SOURCE(ODBC(table 'dict' connection_string 'DSN=MSSQL;UID=test;PWD=test'))
LAYOUT(FLAT())
LIFETIME(MIN 300 MAX 360)
```
## DBMS
@ -345,6 +416,22 @@ Example of settings:
</source>
```
or
```sql
SOURCE(MYSQL(
port 3306
user 'clickhouse'
password 'qwerty'
replica(host 'example01-1' priority 1)
replica(host 'example01-2' priority 1)
db 'db_name'
table 'table_name'
where 'id=10'
invalidate_query 'SQL_QUERY'
))
```
Setting fields:
- `port` The port on the MySQL server. You can specify it for all replicas, or for each one individually (inside `<replica>`).
@ -385,6 +472,21 @@ Example of settings:
</source>
```
or
```sql
SOURCE(MYSQL(
host 'localhost'
socket '/path/to/socket/file.sock'
user 'clickhouse'
password 'qwerty'
db 'db_name'
table 'table_name'
where 'id=10'
invalidate_query 'SQL_QUERY'
))
```
### ClickHouse {#dicts-external_dicts_dict_sources-clickhouse}
@ -404,6 +506,20 @@ Example of settings:
</source>
```
or
```sql
SOURCE(CLICKHOUSE(
host 'example01-01-1'
port 9000
user 'default'
password ''
db 'default'
table 'ids'
where 'id=10'
))
```
Setting fields:
- `host` The ClickHouse host. If it is a local host, the query is processed without any network activity. To improve fault tolerance, you can create a [Distributed](../../operations/table_engines/distributed.md) table and enter it in subsequent configurations.
@ -433,6 +549,19 @@ Example of settings:
</source>
```
or
```sql
SOURCE(MONGO(
host 'localhost'
port 27017
user ''
password ''
db 'test'
collection 'dictionary_source'
))
```
Setting fields:
- `host` The MongoDB host.
@ -458,6 +587,17 @@ Example of settings:
</source>
```
or
```sql
SOURCE(REDIS(
host 'localhost'
port 6379
storage_type 'simple'
db_index 0
))
```
Setting fields:
- `host` The Redis host.

View File

@ -22,23 +22,35 @@ Overall structure:
</dictionary>
```
Columns are described in the structure:
or
```sql
CREATE DICTIONARY (
Id UInt64,
-- attributes
)
PRIMARY KEY Id
...
```
In xml-file attributes are described in the structure section:
- `<id>` — [Key column](external_dicts_dict_structure.md#ext_dict_structure-key).
- `<attribute>` — [Data column](external_dicts_dict_structure.md#ext_dict_structure-attributes). There can be a large number of columns.
- `<attribute>` — [Data column](external_dicts_dict_structure.md#ext_dict_structure-attributes). There can be a large number of attributes.
In DDL-query attributes are described the body of `CREATE` query:
- `PRIMARY KEY` — [Key column](external_dicts_dict_structure.md#ext_dict_structure-key)
- `AttrName AttrType` — [Data column](external_dicts_dict_structure.md#ext_dict_structure-attributes)
## Key {#ext_dict_structure-key}
ClickHouse supports the following types of keys:
- Numeric key. UInt64. Defined in the tag `<id>` .
- Composite key. Set of values of different types. Defined in the tag `<key>` .
- Numeric key. UInt64. Defined in the `<id>` tag or using `PRIMARY KEY` keyword.
- Composite key. Set of values of different types. Defined in the tag `<key>` or `PRIMARY KEY` keyword.
A structure can contain either `<id>` or `<key>` .
!!! warning
The key doesn't need to be defined separately in attributes.
A xml-structure can contain either `<id>` or `<key>`. DDL-query must contain single `PRIMARY KEY`.
### Numeric Key
@ -56,6 +68,20 @@ Configuration fields:
- `name` The name of the column with keys.
For DDL-query:
```sql
CREATE DICTIONARY (
Id UInt64,
...
)
PRIMARY KEY Id
...
```
- `PRIMARY KEY` The name of the column with keys.
### Composite Key
The key can be a `tuple` from any types of fields. The [layout](external_dicts_dict_layout.md) in this case must be `complex_key_hashed` or `complex_key_cache`.
@ -81,6 +107,18 @@ The key structure is set in the element `<key>`. Key fields are specified in the
...
```
or
```sql
CREATE DICTIONARY (
field1 String,
field2 String
...
)
PRIMARY KEY field1, field2
...
```
For a query to the `dictGet*` function, a tuple is passed as the key. Example: `dictGetString('dict_name', 'attr_name', tuple('string for field1', num_for_field2))`.
@ -103,6 +141,14 @@ Configuration example:
</structure>
```
or
```sql
CREATE DICTIONARY somename (
Name ClickHouseDataType DEFAULT '' EXPRESSION rand64() HIERARCHICAL INJECTIVE IS_OBJECT_ID
)
```
Configuration fields:
Tag | Description | Required

View File

@ -101,10 +101,17 @@ DROP [TEMPORARY] TABLE [IF EXISTS] [db.]name [ON CLUSTER cluster]
Deletes the table.
If `IF EXISTS` is specified, it doesn't return an error if the table doesn't exist or the database doesn't exist.
```
DROP DICTIONARY [IF EXISTS] [db.]name
```
Delets the dictionary.
If `IF EXISTS` is specified, it doesn't return an error if the table doesn't exist or the database doesn't exist.
## EXISTS
```sql
EXISTS [TEMPORARY] TABLE [db.]name [INTO OUTFILE filename] [FORMAT format]
EXISTS [TEMPORARY] [TABLE|DICTIONARY] [db.]name [INTO OUTFILE filename] [FORMAT format]
```
Returns a single `UInt8`-type column, which contains the single value `0` if the table or database doesn't exist, or `1` if the table exists in the specified database.

View File

@ -3,10 +3,10 @@
## SHOW CREATE TABLE
```sql
SHOW CREATE [TEMPORARY] TABLE [db.]table [INTO OUTFILE filename] [FORMAT format]
SHOW CREATE [TEMPORARY] [TABLE|DICTIONARY] [db.]table [INTO OUTFILE filename] [FORMAT format]
```
Returns a single `String`-type 'statement' column, which contains a single value the `CREATE` query used for creating the specified table.
Returns a single `String`-type 'statement' column, which contains a single value the `CREATE` query used for creating the specified object.
## SHOW DATABASES {#show-databases}
@ -62,3 +62,33 @@ SHOW TABLES FROM system LIKE '%co%' LIMIT 2
│ collations │
└────────────────────────────────┘
```
## SHOW DICTIONARIES
Displays a list of [external dictionaries](dicts/external_dicts.md).
```sql
SHOW DICTIONARIES [FROM <db>] [LIKE '<pattern>'] [LIMIT <N>] [INTO OUTFILE <filename>] [FORMAT <format>]
```
If the `FROM` clause is not specified, the query returns the list of dictionaries from the current database.
You can get the same results as the `SHOW DICTIONARIES` query in the following way:
```sql
SELECT name FROM system.dictionaries WHERE database = <db> [AND name LIKE <pattern>] [LIMIT <N>] [INTO OUTFILE <filename>] [FORMAT <format>]
```
**Example**
The following query selects the first two rows from the list of tables in the `system` database, whose names contain `co`.
```sql
SHOW DICTIONARIES FROM db LIKE '%reg%' LIMIT 2
```
```text
┌─name─────────┐
│ regions │
│ region_names │
└──────────────┘
```

View File

@ -248,3 +248,19 @@ SELECT a, b, c FROM (SELECT ...)
没有单独的删除视图的语法。如果要删除视图,请使用`DROP TABLE`。
[来源文章](https://clickhouse.yandex/docs/en/query_language/create/) <!--hide-->
## CREATE DICTIONARY {#create-dictionary-query}
```sql
CREATE DICTIONARY [IF NOT EXISTS] [db.]dictionary_name
(
key1 type1 [DEFAULT|EXPRESSION expr1] [HIERARCHICAL|INJECTIVE|IS_OBJECT_ID],
key2 type2 [DEFAULT|EXPRESSION expr2] [HIERARCHICAL|INJECTIVE|IS_OBJECT_ID],
attr1 type2 [DEFAULT|EXPRESSION expr3],
attr2 type2 [DEFAULT|EXPRESSION expr4]
)
PRIMARY KEY key1, key2
SOURCE(SOURCE_NAME([param1 value1 ... paramN valueN]))
LAYOUT(LAYOUT_NAME([param_name param_value]))
LIFETIME([MIN val1] MAX val2)
```