2021-01-25 14:13:29 +00:00
---
2022-08-28 14:53:34 +00:00
slug: /en/engines/table-engines/integrations/embedded-rocksdb
2023-06-23 13:16:22 +00:00
sidebar_position: 50
2022-04-09 13:29:05 +00:00
sidebar_label: EmbeddedRocksDB
2020-10-01 10:59:51 +00:00
---
2024-11-05 17:41:01 +00:00
import CloudNotSupportedBadge from '@theme/badges/CloudNotSupportedBadge';
2022-06-02 10:55:18 +00:00
# EmbeddedRocksDB Engine
2020-10-01 10:59:51 +00:00
2024-11-05 17:41:01 +00:00
< CloudNotSupportedBadge / >
This engine allows integrating ClickHouse with [RocksDB ](http://rocksdb.org/ ).
2020-10-01 10:59:51 +00:00
2023-09-18 18:29:13 +00:00
## Creating a Table {#creating-a-table}
2020-10-01 10:59:51 +00:00
``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
...
2022-08-23 12:58:57 +00:00
) ENGINE = EmbeddedRocksDB([ttl, rocksdb_dir, read_only]) PRIMARY KEY(primary_key_name)
2024-05-03 10:29:36 +00:00
[ SETTINGS name=value, ... ]
2020-10-01 10:59:51 +00:00
```
2022-08-09 13:48:49 +00:00
Engine parameters:
2020-10-01 10:59:51 +00:00
2022-08-10 15:10:52 +00:00
- `ttl` - time to live for values. TTL is accepted in seconds. If TTL is 0, regular RocksDB instance is used (without TTL).
2022-08-26 12:00:17 +00:00
- `rocksdb_dir` - path to the directory of an existed RocksDB or the destination path of the created RocksDB. Open the table with the specified `rocksdb_dir` .
2022-08-23 12:58:57 +00:00
- `read_only` - when `read_only` is set to true, read-only mode is used. For storage with TTL, compaction will not be triggered (neither manual nor automatic), so no expired entries are removed.
2021-07-26 18:17:39 +00:00
- `primary_key_name` – any column name in the column list.
2021-01-22 16:40:58 +00:00
- `primary key` must be specified, it supports only one column in the primary key. The primary key will be serialized in binary as a `rocksdb key` .
- columns other than the primary key will be serialized in binary as `rocksdb` value in corresponding order.
- queries with key `equals` or `in` filtering will be optimized to multi keys lookup from `rocksdb` .
2020-10-01 10:59:51 +00:00
2024-05-03 10:29:36 +00:00
Engine settings:
- `optimize_for_bulk_insert` – Table is optimized for bulk insertions (insert pipeline will create SST files and import to rocksdb database instead of writing to memtables); default value: `1` .
- `bulk_insert_block_size` - Minimum size of SST files (in term of rows) created by bulk insertion; default value: `1048449` .
2020-10-01 10:59:51 +00:00
Example:
``` sql
CREATE TABLE test
(
`key` String,
`v1` UInt32,
`v2` String,
2021-12-01 20:12:10 +00:00
`v3` Float32
2020-10-01 10:59:51 +00:00
)
2020-11-08 15:41:16 +00:00
ENGINE = EmbeddedRocksDB
2020-10-01 10:59:51 +00:00
PRIMARY KEY key
2021-01-26 19:24:06 +00:00
```
2021-07-26 18:17:39 +00:00
## Metrics
There is also `system.rocksdb` table, that expose rocksdb statistics:
```sql
SELECT
name,
value
FROM system.rocksdb
┌─name──────────────────────┬─value─┐
│ no.file.opens │ 1 │
│ number.block.decompressed │ 1 │
└───────────────────────────┴───────┘
```
2021-07-26 18:17:39 +00:00
## Configuration
You can also change any [rocksdb options ](https://github.com/facebook/rocksdb/wiki/Option-String-and-Option-Map ) using config:
```xml
< rocksdb >
< options >
< max_background_jobs > 8< / max_background_jobs >
< / options >
< column_family_options >
< num_levels > 2< / num_levels >
< / column_family_options >
< tables >
< table >
< name > TABLE< / name >
< options >
< max_background_jobs > 8< / max_background_jobs >
< / options >
< column_family_options >
< num_levels > 2< / num_levels >
< / column_family_options >
< / table >
< / tables >
< / rocksdb >
```
2023-02-15 14:29:11 +00:00
2023-11-20 16:55:42 +00:00
By default trivial approximate count optimization is turned off, which might affect the performance `count()` queries. To enable this
optimization set up `optimize_trivial_approximate_count_query = 1` . Also, this setting affects `system.tables` for EmbeddedRocksDB engine,
turn on the settings to see approximate values for `total_rows` and `total_bytes` .
2023-09-18 18:29:13 +00:00
## Supported operations {#supported-operations}
2023-02-15 14:29:11 +00:00
### Inserts
2023-02-15 18:29:26 +00:00
When new rows are inserted into `EmbeddedRocksDB` , if the key already exists, the value will be updated, otherwise a new key is created.
2023-02-15 14:29:11 +00:00
Example:
```sql
INSERT INTO test VALUES ('some key', 1, 'value', 3.2);
```
### Deletes
2023-06-23 13:16:22 +00:00
Rows can be deleted using `DELETE` query or `TRUNCATE` .
2023-02-15 14:29:11 +00:00
```sql
DELETE FROM test WHERE key LIKE 'some%' AND v1 > 1;
```
```sql
ALTER TABLE test DELETE WHERE key LIKE 'some%' AND v1 > 1;
```
```sql
TRUNCATE TABLE test;
```
### Updates
2023-02-15 18:29:33 +00:00
Values can be updated using the `ALTER TABLE` query. The primary key cannot be updated.
2023-02-15 14:29:11 +00:00
```sql
ALTER TABLE test UPDATE v1 = v1 * 10 + 2 WHERE key LIKE 'some%' AND v3 > 3.1;
```
2023-06-02 14:08:48 +00:00
### Joins
A special `direct` join with EmbeddedRocksDB tables is supported.
This direct join avoids forming a hash table in memory and accesses
the data directly from the EmbeddedRocksDB.
2023-06-02 20:30:18 +00:00
With large joins you may see much lower memory usage with direct joins
because the hash table is not created.
2023-06-02 14:08:48 +00:00
To enable direct joins:
```sql
2023-06-02 20:30:18 +00:00
SET join_algorithm = 'direct, hash'
2023-06-02 14:08:48 +00:00
```
:::tip
2023-06-02 20:30:18 +00:00
When the `join_algorithm` is set to `direct, hash` , direct joins will be used
when possible, and hash otherwise.
2023-06-02 14:08:48 +00:00
:::
#### Example
##### Create and populate an EmbeddedRocksDB table:
```sql
CREATE TABLE rdb
(
`key` UInt32,
`value` Array(UInt32),
`value2` String
)
ENGINE = EmbeddedRocksDB
PRIMARY KEY key
```
```sql
INSERT INTO rdb
SELECT
toUInt32(sipHash64(number) % 10) as key,
[key, key+1] as value,
('val2' || toString(key)) as value2
FROM numbers_mt(10);
```
##### Create and populate a table to join with table `rdb`:
```sql
CREATE TABLE t2
(
`k` UInt16
)
ENGINE = TinyLog
```
```sql
INSERT INTO t2 SELECT number AS k
FROM numbers_mt(10)
```
##### Set the join algorithm to `direct`:
```sql
SET join_algorithm = 'direct'
```
##### An INNER JOIN:
```sql
SELECT *
FROM
(
SELECT k AS key
FROM t2
) AS t2
INNER JOIN rdb ON rdb.key = t2.key
ORDER BY key ASC
```
```response
┌─key─┬─rdb.key─┬─value──┬─value2─┐
│ 0 │ 0 │ [0,1] │ val20 │
│ 2 │ 2 │ [2,3] │ val22 │
│ 3 │ 3 │ [3,4] │ val23 │
│ 6 │ 6 │ [6,7] │ val26 │
│ 7 │ 7 │ [7,8] │ val27 │
│ 8 │ 8 │ [8,9] │ val28 │
│ 9 │ 9 │ [9,10] │ val29 │
└─────┴─────────┴────────┴────────┘
```
2023-06-02 20:30:18 +00:00
### More information on Joins
2023-12-20 18:26:36 +00:00
- [`join_algorithm` setting ](/docs/en/operations/settings/settings.md#join_algorithm )
2023-06-02 20:30:18 +00:00
- [JOIN clause ](/docs/en/sql-reference/statements/select/join.md )