Mirror of https://github.com/ClickHouse/ClickHouse.git (synced 2024-12-19 12:52:37 +00:00)
enhance log engine docs
parent da41c02749
commit fc91af1292
@@ -16,6 +16,10 @@ Engines of the family:

`Log` family table engines can store data to [HDFS](/docs/en/engines/table-engines/mergetree-family/mergetree.md/#table_engine-mergetree-hdfs) or [S3](/docs/en/engines/table-engines/mergetree-family/mergetree.md/#table_engine-mergetree-s3) distributed file systems.

:::important
Despite the name, `*Log` table engines are not meant for the storage of log data. They should only be used for small volumes of data that need to be written quickly.
:::

## Common Properties {#common-properties}

Engines:

@@ -11,3 +11,88 @@ The engine belongs to the family of `Log` engines. See the common properties of
`Log` differs from [TinyLog](../../../engines/table-engines/log-family/tinylog.md) in that a small file of "marks" resides with the column files. These marks are written on every data block and contain offsets that indicate where to start reading the file in order to skip the specified number of rows. This makes it possible to read table data in multiple threads.
For concurrent data access, the read operations can be performed simultaneously, while write operations block reads and each other.
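
The concurrency rule above (shared reads, exclusive writes) can be illustrated with a small readers-writer lock. This is a minimal Python sketch of the semantics only, not ClickHouse's implementation; the `ReadWriteLock` class and its method names are hypothetical.

```python
import threading


class ReadWriteLock:
    """Hypothetical lock: many concurrent readers, or exactly one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0       # number of active readers
        self._writing = False   # True while a writer holds the lock

    def acquire_read(self):
        with self._cond:
            while self._writing:            # a write blocks reads
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()     # wake a waiting writer

    def acquire_write(self):
        with self._cond:
            while self._writing or self._readers:   # writes wait for reads and writes
                self._cond.wait()
            self._writing = True

    def release_write(self):
        with self._cond:
            self._writing = False
            self._cond.notify_all()


lock = ReadWriteLock()
lock.acquire_read()
lock.acquire_read()    # a second reader is admitted immediately
lock.release_read()
lock.release_read()
lock.acquire_write()   # admitted because no reader or writer is active
lock.release_write()
```

The sketch does not address fairness or writer starvation; it only shows that any number of readers may hold the lock together while a writer needs it exclusively.
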
The `Log` engine does not support indexes. If a write to a table fails, the table is broken, and reading from it returns an error. The `Log` engine is appropriate for temporary data, write-once tables, and for testing or demonstration purposes.

## Creating a Table {#table_engines-log-creating-a-table}

``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    column1_name [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
    column2_name [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
    ...
) ENGINE = Log
```

See the detailed description of the [CREATE TABLE](../../../sql-reference/statements/create/table.md#create-table-query) query.

## Writing the Data {#table_engines-log-writing-the-data}

The `Log` engine efficiently stores data by writing each column to its own file. For every table, the `Log` engine writes the following files to the specified storage path:

- `<column>.bin`: A data file for each column, containing the serialized and compressed data.
- `__marks.mrk`: A marks file, storing offsets and row counts for each data block inserted. Marks are used to facilitate efficient query execution by allowing the engine to skip irrelevant data blocks during reads.

### Writing Process

When data is written to a `Log` table:

1. Data is serialized and compressed into blocks.
2. For each column, the compressed data is appended to its respective `<column>.bin` file.
3. Corresponding entries are added to the `__marks.mrk` file to record the offset and row count of the newly inserted data.

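
The steps above can be sketched as a toy in-memory model. It is an illustration under stated assumptions, not the engine's actual on-disk format: `ToyLogTable`, the newline-joined serialization, and the use of `zlib` are all hypothetical stand-ins.

```python
import zlib


class ToyLogTable:
    """Toy model of a Log-style table: one buffer per column plus a list of marks."""

    def __init__(self, columns):
        self.columns = list(columns)
        # Each bytearray stands in for a <column>.bin file.
        self.files = {name: bytearray() for name in self.columns}
        # Stands in for __marks.mrk: one entry per inserted block.
        self.marks = []

    def insert(self, rows):
        """Append one data block; `rows` is a list of tuples in column order."""
        mark = {"rows": len(rows), "offsets": {}}
        for index, name in enumerate(self.columns):
            # Assumed serialization: values joined by newlines (values must not contain "\n").
            payload = "\n".join(str(row[index]) for row in rows).encode("utf-8")
            mark["offsets"][name] = len(self.files[name])  # where this block starts
            self.files[name] += zlib.compress(payload)     # append, never rewrite
        self.marks.append(mark)

    def read_block(self, block_index, name):
        """Jump straight to one block of one column using the recorded offsets."""
        start = self.marks[block_index]["offsets"][name]
        if block_index + 1 < len(self.marks):
            end = self.marks[block_index + 1]["offsets"][name]
        else:
            end = len(self.files[name])
        raw = bytes(self.files[name][start:end])
        return zlib.decompress(raw).decode("utf-8").split("\n")


table = ToyLogTable(["message_type", "message"])
table.insert([("REGULAR", "The first regular message")])
table.insert([("REGULAR", "The second regular message"),
              ("WARNING", "The first warning message")])
print([mark["rows"] for mark in table.marks])   # [1, 2]
print(table.read_block(1, "message_type"))      # ['REGULAR', 'WARNING']
```

Because every block's starting offset is recorded, a reader can open a column at the second block without decompressing the first, which is the property that makes skipping rows and reading in parallel possible.
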
## Reading the Data {#table_engines-log-reading-the-data}

The file with marks allows ClickHouse to parallelize the reading of data. This means that a `SELECT` query returns rows in an unpredictable order. Use the `ORDER BY` clause to sort rows.

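
To see why parallel reads lead to an unpredictable row order, the following Python sketch reads two blocks in a thread pool and collects them in completion order. The rows are illustrative sample data; `read_block` and `select_all` are hypothetical helpers, not ClickHouse functions.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical blocks, one per INSERT; each row is (timestamp, message_type, message).
blocks = [
    [("2019-01-18 14:23:43", "REGULAR", "The first regular message")],
    [
        ("2019-01-18 14:27:32", "REGULAR", "The second regular message"),
        ("2019-01-18 14:34:53", "WARNING", "The first warning message"),
    ],
]


def read_block(block_index):
    """Stand-in for a reader thread that jumps to a block by its mark."""
    return list(blocks[block_index])


def select_all(max_threads=2):
    rows = []
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        futures = [pool.submit(read_block, i) for i in range(len(blocks))]
        # Results arrive in completion order, which may differ from insertion order.
        for future in as_completed(futures):
            rows.extend(future.result())
    return rows


unordered = select_all()
# Analogous to adding ORDER BY timestamp: the order is now deterministic.
ordered = sorted(unordered, key=lambda row: row[0])
```

Rows within one block stay together, but which block arrives first depends on thread scheduling, so only the sorted result is stable across runs.
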
## Example of Use {#table_engines-log-example-of-use}

Creating a table:

``` sql
CREATE TABLE log_table
(
    timestamp DateTime,
    message_type String,
    message String
)
ENGINE = Log
```

Inserting data:

``` sql
INSERT INTO log_table VALUES (now(),'REGULAR','The first regular message');
INSERT INTO log_table VALUES (now(),'REGULAR','The second regular message'),(now(),'WARNING','The first warning message');
```

We used two `INSERT` queries to create two data blocks inside the `<column>.bin` files.

ClickHouse uses multiple threads when selecting data. Each thread reads a separate data block and returns resulting rows independently as it finishes. As a result, the order of blocks of rows in the output may not match the order of the same blocks in the input. For example:

``` sql
SELECT * FROM log_table
```

``` text
┌───────────timestamp─┬─message_type─┬─message────────────────────┐
│ 2019-01-18 14:27:32 │ REGULAR      │ The second regular message │
│ 2019-01-18 14:34:53 │ WARNING      │ The first warning message  │
└─────────────────────┴──────────────┴────────────────────────────┘
┌───────────timestamp─┬─message_type─┬─message───────────────────┐
│ 2019-01-18 14:23:43 │ REGULAR      │ The first regular message │
└─────────────────────┴──────────────┴───────────────────────────┘
```

Sorting the results (ascending order by default):

``` sql
SELECT * FROM log_table ORDER BY timestamp
```

``` text
┌───────────timestamp─┬─message_type─┬─message────────────────────┐
│ 2019-01-18 14:23:43 │ REGULAR      │ The first regular message  │
│ 2019-01-18 14:27:32 │ REGULAR      │ The second regular message │
│ 2019-01-18 14:34:53 │ WARNING      │ The first warning message  │
└─────────────────────┴──────────────┴────────────────────────────┘
```

@@ -4,11 +4,11 @@ toc_priority: 32
toc_title: StripeLog
---

# Stripelog
# StripeLog

This engine belongs to the family of log engines. See the common properties of log engines and their differences in the [Log Engine Family](../../../engines/table-engines/log-family/index.md) article.

Use this engine in scenarios when you need to write many tables with a small amount of data (less than 1 million rows).
Use this engine in scenarios when you need to write many tables with a small amount of data (less than 1 million rows). For example, this table can be used to store incoming data batches for transformation when each batch must be processed atomically. 100k instances of this table type are viable for a ClickHouse server. This table engine should be preferred over [Log](./log.md) when a large number of tables is required. This comes at the expense of read efficiency.

## Creating a Table {#table_engines-stripelog-creating-a-table}

@@ -11,3 +11,71 @@ The engine belongs to the log engine family. See [Log Engine Family](../../../en
This table engine is typically used with the write-once method: write data one time, then read it as many times as necessary. For example, you can use `TinyLog`-type tables for intermediary data that is processed in small batches. Note that storing data in a large number of small tables is inefficient.

Queries are executed in a single stream. In other words, this engine is intended for relatively small tables (up to about 1,000,000 rows). It makes sense to use this table engine if you have many small tables, since it’s simpler than the [Log](../../../engines/table-engines/log-family/log.md) engine (fewer files need to be opened).

## Characteristics

- **Simpler Structure**: Unlike the Log engine, TinyLog does not use mark files. This reduces complexity but also limits performance optimizations for large datasets.
- **Single Stream Queries**: Queries on TinyLog tables are executed in a single stream, making it suitable for relatively small tables, typically up to 1,000,000 rows.
- **Efficient for Small Tables**: The simplicity of the TinyLog engine makes it advantageous when managing many small tables, as it requires fewer file operations compared to the Log engine.

## Creating a Table {#table_engines-tinylog-creating-a-table}

``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    column1_name [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
    column2_name [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
    ...
) ENGINE = TinyLog
```

See the detailed description of the [CREATE TABLE](../../../sql-reference/statements/create/table.md#create-table-query) query.

## Writing the Data {#table_engines-tinylog-writing-the-data}

The `TinyLog` engine stores each column in its own file. For each `INSERT` query, ClickHouse appends the data block to the end of the column files, writing columns one by one.

For each table ClickHouse writes the files:

- `<column>.bin`: A data file for each column, containing the serialized and compressed data.

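
As a rough illustration of this layout, the toy Python model below keeps one append-only sequence per column and no marks, so a read can only walk the data from the start in a single stream. `ToyTinyLogTable` is a hypothetical name, and compression is left out for brevity.

```python
class ToyTinyLogTable:
    """Toy model of a TinyLog-style table: per-column append-only data, no marks."""

    def __init__(self, columns):
        self.columns = list(columns)
        # Each list stands in for a <column>.bin file.
        self.files = {name: [] for name in self.columns}

    def insert(self, rows):
        """Append one block, writing the columns one by one."""
        for index, name in enumerate(self.columns):
            self.files[name].extend(row[index] for row in rows)

    def select_all(self):
        """Single stream: read every column from the beginning, in stored order."""
        return list(zip(*(self.files[name] for name in self.columns)))


tiny = ToyTinyLogTable(["message_type", "message"])
tiny.insert([("REGULAR", "The first regular message")])
tiny.insert([("REGULAR", "The second regular message"),
             ("WARNING", "The first warning message")])
rows = tiny.select_all()   # rows come back in insertion order
```

Without marks there is no recorded position to start from in the middle of a column, which is why this model offers no way to split the read across threads.
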
The `TinyLog` engine does not support the `ALTER UPDATE` and `ALTER DELETE` operations.

## Example of Use {#table_engines-tinylog-example-of-use}

Creating a table:

``` sql
CREATE TABLE tiny_log_table
(
    timestamp DateTime,
    message_type String,
    message String
)
ENGINE = TinyLog
```

Inserting data:

``` sql
INSERT INTO tiny_log_table VALUES (now(),'REGULAR','The first regular message');
INSERT INTO tiny_log_table VALUES (now(),'REGULAR','The second regular message'),(now(),'WARNING','The first warning message');
```

We used two `INSERT` queries to create two data blocks inside the `<column>.bin` files.

ClickHouse uses a single stream when selecting data. As a result, the order of blocks of rows in the output matches the order of the same blocks in the input. For example:

``` sql
SELECT * FROM tiny_log_table
```

``` text
┌───────────timestamp─┬─message_type─┬─message────────────────────┐
│ 2024-12-10 13:11:58 │ REGULAR      │ The first regular message  │
│ 2024-12-10 13:12:12 │ REGULAR      │ The second regular message │
│ 2024-12-10 13:12:12 │ WARNING      │ The first warning message  │
└─────────────────────┴──────────────┴────────────────────────────┘
```