mirror of
https://github.com/ClickHouse/ClickHouse.git
synced 2024-11-15 12:14:18 +00:00
Remove documentation from MaterializedMySQL
parent
04d84d5cff
commit
d05a77ecab
@ -7,315 +7,5 @@ sidebar_position: 70
|
||||
# [experimental] MaterializedMySQL
|
||||
|
||||
:::note
|
||||
This database engine is experimental. To use it, set `allow_experimental_database_materialized_mysql` to 1 in your configuration files or by using the `SET` command:
|
||||
```sql
|
||||
SET allow_experimental_database_materialized_mysql=1
|
||||
```
|
||||
This database engine is obsolete and cannot be used.
|
||||
:::
|
||||
|
||||
Creates a ClickHouse database with all the tables existing in MySQL, and all the data in those tables. The ClickHouse server works as a MySQL replica. It reads the MySQL `binlog` and performs the DDL and DML queries it contains.
|
||||
|
||||
## Creating a Database {#creating-a-database}
|
||||
|
||||
``` sql
|
||||
CREATE DATABASE [IF NOT EXISTS] db_name [ON CLUSTER cluster]
|
||||
ENGINE = MaterializedMySQL('host:port', ['database' | database], 'user', 'password') [SETTINGS ...]
|
||||
[TABLE OVERRIDE table1 (...), TABLE OVERRIDE table2 (...)]
|
||||
```
|
||||
|
||||
**Engine Parameters**
|
||||
|
||||
- `host:port` — MySQL server endpoint.
|
||||
- `database` — MySQL database name.
|
||||
- `user` — MySQL user.
|
||||
- `password` — User password.
|
||||
|
||||
## Engine Settings
|
||||
|
||||
### max_rows_in_buffer
|
||||
|
||||
`max_rows_in_buffer` — Maximum number of rows that can be cached in memory for a single table (the cached data cannot be queried). When this number is exceeded, the data is materialized. Default: `65 505`.
|
||||
|
||||
### max_bytes_in_buffer
|
||||
|
||||
`max_bytes_in_buffer` — Maximum number of bytes that can be cached in memory for a single table (the cached data cannot be queried). When this number is exceeded, the data is materialized. Default: `1 048 576`.
|
||||
|
||||
### max_flush_data_time
|
||||
|
||||
`max_flush_data_time` — Maximum time in milliseconds that data can be cached in memory for the whole database (the cached data cannot be queried). When this time is exceeded, the data is materialized. Default: `1000`.
|
||||
|
||||
### max_wait_time_when_mysql_unavailable
|
||||
|
||||
`max_wait_time_when_mysql_unavailable` — Retry interval when MySQL is not available (milliseconds). Negative value disables retry. Default: `1000`.
|
||||
|
||||
### allows_query_when_mysql_lost
|
||||
`allows_query_when_mysql_lost` — Allows querying a materialized table when the connection to MySQL is lost. Default: `0` (`false`).
|
||||
|
||||
### allow_startup_database_without_connection_to_mysql
|
||||
`allow_startup_database_without_connection_to_mysql` — Allows creating and attaching the database when no connection to MySQL is available. Default: `0` (`false`).
|
||||
|
||||
### materialized_mysql_tables_list
|
||||
|
||||
`materialized_mysql_tables_list` — A comma-separated list of MySQL database tables to be replicated by the MaterializedMySQL database engine. Default: empty list, meaning all tables are replicated.
|
||||
|
||||
```sql
|
||||
CREATE DATABASE mysql ENGINE = MaterializedMySQL('localhost:3306', 'db', 'user', '***')
|
||||
SETTINGS
|
||||
allows_query_when_mysql_lost=true,
|
||||
max_wait_time_when_mysql_unavailable=10000;
|
||||
```
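
As a further sketch, replication can be limited to selected tables with `materialized_mysql_tables_list`. The table names `orders` and `customers` and the database name `mysql_subset` are hypothetical, and the connection parameters are the same placeholders as in the example above.

```sql
-- Replicate only two tables from the MySQL database `db`.
-- `orders` and `customers` are placeholder table names (assumptions).
CREATE DATABASE mysql_subset ENGINE = MaterializedMySQL('localhost:3306', 'db', 'user', '***')
SETTINGS
    materialized_mysql_tables_list = 'orders,customers';
```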
|
||||
|
||||
## Settings on MySQL-server Side
|
||||
|
||||
For `MaterializedMySQL` to work correctly, a few mandatory `MySQL`-side configuration settings must be set:
|
||||
|
||||
### default_authentication_plugin
|
||||
|
||||
`default_authentication_plugin = mysql_native_password`, since `MaterializedMySQL` can only authenticate with this method.
|
||||
|
||||
### gtid_mode
|
||||
|
||||
`gtid_mode = on`, since GTID-based logging is mandatory for correct `MaterializedMySQL` replication.
|
||||
|
||||
:::note
|
||||
While turning on `gtid_mode` you should also specify `enforce_gtid_consistency = on`.
|
||||
:::
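
One way to check these settings is to query the server variables on the MySQL side. This is only a sketch: it assumes a MySQL version in which all three variables exist.

```sql
-- Run on the MySQL server, not in ClickHouse.
-- Expected values for replication to work: ON, ON, mysql_native_password.
SHOW VARIABLES
WHERE Variable_name IN ('gtid_mode', 'enforce_gtid_consistency', 'default_authentication_plugin');
```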
|
||||
|
||||
## Virtual Columns {#virtual-columns}
|
||||
|
||||
When working with the `MaterializedMySQL` database engine, [ReplacingMergeTree](/docs/en/engines/table-engines/mergetree-family/replacingmergetree.md) tables are used with virtual `_sign` and `_version` columns.
|
||||
|
||||
### \_version
|
||||
|
||||
`_version` — Transaction counter. Type [UInt64](/docs/en/sql-reference/data-types/int-uint.md).
|
||||
|
||||
### \_sign
|
||||
|
||||
`_sign` — Deletion mark. Type [Int8](/docs/en/sql-reference/data-types/int-uint.md). Possible values:
|
||||
- `1` — Row is not deleted,
|
||||
- `-1` — Row is deleted.
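
A minimal sketch of inspecting the virtual columns directly, assuming a replicated table `mysql.test` with columns `a` and `b` (as in the usage example at the end of this page). Because both `_sign` and `_version` are named explicitly, the default `FINAL` modifier and `_sign=1` filter described under "Selecting from MaterializedMySQL Tables" should not be applied, so deletion marks and older row versions can appear in the result.

```sql
-- Show raw replicated rows, including deletion marks and older versions.
SELECT a, b, _sign, _version
FROM mysql.test
ORDER BY a, _version;
```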
|
||||
|
||||
## Data Types Support {#data_types-support}
|
||||
|
||||
| MySQL | ClickHouse |
|
||||
|-------------------------|--------------------------------------------------------------|
|
||||
| TINY | [Int8](/docs/en/sql-reference/data-types/int-uint.md) |
|
||||
| SHORT | [Int16](/docs/en/sql-reference/data-types/int-uint.md) |
|
||||
| INT24 | [Int32](/docs/en/sql-reference/data-types/int-uint.md) |
|
||||
| LONG | [UInt32](/docs/en/sql-reference/data-types/int-uint.md) |
|
||||
| LONGLONG | [UInt64](/docs/en/sql-reference/data-types/int-uint.md) |
|
||||
| FLOAT | [Float32](/docs/en/sql-reference/data-types/float.md) |
|
||||
| DOUBLE | [Float64](/docs/en/sql-reference/data-types/float.md) |
|
||||
| DECIMAL, NEWDECIMAL | [Decimal](/docs/en/sql-reference/data-types/decimal.md) |
|
||||
| DATE, NEWDATE | [Date](/docs/en/sql-reference/data-types/date.md) |
|
||||
| DATETIME, TIMESTAMP | [DateTime](/docs/en/sql-reference/data-types/datetime.md) |
|
||||
| DATETIME2, TIMESTAMP2 | [DateTime64](/docs/en/sql-reference/data-types/datetime64.md) |
|
||||
| YEAR | [UInt16](/docs/en/sql-reference/data-types/int-uint.md) |
|
||||
| TIME | [Int64](/docs/en/sql-reference/data-types/int-uint.md) |
|
||||
| ENUM | [Enum](/docs/en/sql-reference/data-types/enum.md) |
|
||||
| STRING | [String](/docs/en/sql-reference/data-types/string.md) |
|
||||
| VARCHAR, VAR_STRING | [String](/docs/en/sql-reference/data-types/string.md) |
|
||||
| BLOB | [String](/docs/en/sql-reference/data-types/string.md) |
|
||||
| GEOMETRY | [String](/docs/en/sql-reference/data-types/string.md) |
|
||||
| BINARY | [FixedString](/docs/en/sql-reference/data-types/fixedstring.md) |
|
||||
| BIT | [UInt64](/docs/en/sql-reference/data-types/int-uint.md) |
|
||||
| SET | [UInt64](/docs/en/sql-reference/data-types/int-uint.md) |
|
||||
|
||||
[Nullable](/docs/en/sql-reference/data-types/nullable.md) is supported.
|
||||
|
||||
The data of TIME type in MySQL is converted to microseconds in ClickHouse.
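
As a worked example of this conversion, a MySQL `TIME` value of `01:30:00` is 5400 seconds, which would be stored as 5 400 000 000 microseconds. The query below only reproduces the arithmetic and does not read from a replicated table.

```sql
-- 01:30:00 -> (1 h * 3600 s + 30 min * 60 s + 0 s) * 1 000 000 microseconds per second
SELECT (1 * 3600 + 30 * 60 + 0) * 1000000 AS microseconds;  -- 5400000000
```

Dividing a replicated `Int64` value by `1000000` gives the duration back in seconds.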
|
||||
|
||||
Other types are not supported. If a MySQL table contains a column of an unsupported type, ClickHouse throws an exception and stops replication.
|
||||
|
||||
## Specifics and Recommendations {#specifics-and-recommendations}
|
||||
|
||||
### Compatibility Restrictions {#compatibility-restrictions}
|
||||
|
||||
Apart from the data type limitations, there are a few restrictions compared to `MySQL` databases that must be resolved before replication is possible:
|
||||
|
||||
- Each table in `MySQL` should contain `PRIMARY KEY`.
|
||||
|
||||
- Replication does not work for tables containing rows with `ENUM` field values that are out of range (not listed in the `ENUM` definition).
|
||||
|
||||
### DDL Queries {#ddl-queries}
|
||||
|
||||
MySQL DDL queries are converted into the corresponding ClickHouse DDL queries ([ALTER](/docs/en/sql-reference/statements/alter/index.md), [CREATE](/docs/en/sql-reference/statements/create/index.md), [DROP](/docs/en/sql-reference/statements/drop.md), [RENAME](/docs/en/sql-reference/statements/rename.md)). If ClickHouse cannot parse some DDL query, the query is ignored.
|
||||
|
||||
### Data Replication {#data-replication}
|
||||
|
||||
`MaterializedMySQL` does not support direct `INSERT`, `DELETE` and `UPDATE` queries. However, they are supported in terms of data replication:
|
||||
|
||||
- MySQL `INSERT` query is converted into `INSERT` with `_sign=1`.
|
||||
|
||||
- MySQL `DELETE` query is converted into `INSERT` with `_sign=-1`.
|
||||
|
||||
- MySQL `UPDATE` query is converted into `INSERT` with `_sign=-1` and `INSERT` with `_sign=1` if the primary key has been changed, or `INSERT` with `_sign=1` if not.
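
The conversion rules above can be sketched on an ordinary `ReplacingMergeTree` table. This is an illustration under stated assumptions, not the engine's actual implementation: the table name `test_sketch` is hypothetical, and the plain columns `row_sign` and `row_version` stand in for the virtual `_sign` and `_version` columns.

```sql
-- Hypothetical stand-in for a replicated table.
CREATE TABLE test_sketch
(
    a Int32,
    b Int32,
    row_sign Int8,        -- plays the role of _sign
    row_version UInt64    -- plays the role of _version
)
ENGINE = ReplacingMergeTree(row_version)
ORDER BY a;

-- MySQL: INSERT INTO db.test VALUES (1, 11), (2, 22);
INSERT INTO test_sketch VALUES (1, 11, 1, 1), (2, 22, 1, 1);

-- MySQL: DELETE FROM db.test WHERE a=1;  (becomes an INSERT with sign -1)
INSERT INTO test_sketch VALUES (1, 11, -1, 2);

-- MySQL: UPDATE db.test SET b=222;  (primary key unchanged, so one INSERT with sign 1)
INSERT INTO test_sketch VALUES (2, 222, 1, 3);

-- Latest version per key, with deleted rows filtered out.
SELECT a, b FROM test_sketch FINAL WHERE row_sign = 1;
```

With these inserts, the final query should return only the row `(2, 222)`: the newest version of key `1` is a deletion mark, and the newest version of key `2` carries the updated value.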
|
||||
|
||||
### Selecting from MaterializedMySQL Tables {#select}
|
||||
|
||||
`SELECT` query from `MaterializedMySQL` tables has some specifics:
|
||||
|
||||
- If `_version` is not specified in the `SELECT` query, the
|
||||
[FINAL](/docs/en/sql-reference/statements/select/from.md/#select-from-final) modifier is used, so only rows with
|
||||
`MAX(_version)` are returned for each primary key value.
|
||||
|
||||
- If `_sign` is not specified in the `SELECT` query, `WHERE _sign=1` is used by default. So the deleted rows are not
|
||||
included into the result set.
|
||||
|
||||
- The result includes column comments if they exist in the MySQL database tables.
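
Taken together, the first two rules mean that a plain query behaves roughly like the explicit form below. This is a sketch assuming the replicated table `mysql.test` from the usage example; the exact rewrite performed by the engine is not shown in this document.

```sql
-- What is written:
SELECT a, b FROM mysql.test;

-- Roughly what is evaluated, according to the rules above:
SELECT a, b FROM mysql.test FINAL WHERE _sign = 1;
```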
|
||||
|
||||
### Index Conversion {#index-conversion}
|
||||
|
||||
MySQL `PRIMARY KEY` and `INDEX` clauses are converted into `ORDER BY` tuples in ClickHouse tables.
|
||||
|
||||
ClickHouse has only one physical order, which is determined by `ORDER BY` clause. To create a new physical order, use
|
||||
[materialized views](/docs/en/sql-reference/statements/create/view.md/#materialized).
|
||||
|
||||
**Notes**
|
||||
|
||||
- Rows with `_sign=-1` are not deleted physically from the tables.
|
||||
- Cascade `UPDATE/DELETE` queries are not supported by the `MaterializedMySQL` engine, as they are not visible in the
|
||||
MySQL binlog.
|
||||
- Replication can be easily broken.
|
||||
- Manual operations on database and tables are forbidden.
|
||||
- `MaterializedMySQL` is affected by the [optimize_on_insert](/docs/en/operations/settings/settings.md/#optimize-on-insert)
|
||||
setting. Data is merged in the corresponding table in the `MaterializedMySQL` database when a table in the MySQL
|
||||
server changes.
|
||||
|
||||
### Table Overrides {#table-overrides}
|
||||
|
||||
Table overrides can be used to customize the ClickHouse DDL queries, allowing you to make schema optimizations for your
|
||||
application. This is especially useful for controlling partitioning, which is important for the overall performance of
|
||||
MaterializedMySQL.
|
||||
|
||||
These are the schema conversion manipulations you can do with table overrides for MaterializedMySQL:
|
||||
|
||||
* Modify column type. Must be compatible with the original type, or replication will fail. For example,
|
||||
you can modify a UInt32 column to UInt64, but you can not modify a String column to Array(String).
|
||||
* Modify [column TTL](/docs/en/engines/table-engines/mergetree-family/mergetree.md/#mergetree-column-ttl).
|
||||
* Modify [column compression codec](/docs/en/sql-reference/statements/create/table.md/#codecs).
|
||||
* Add [ALIAS columns](/docs/en/sql-reference/statements/create/table.md/#alias).
|
||||
* Add [skipping indexes](/docs/en/engines/table-engines/mergetree-family/mergetree.md/#table_engine-mergetree-data_skipping-indexes). Note that you need to enable the `use_skip_indexes_if_final` setting to make them work (MaterializedMySQL uses `SELECT ... FINAL` by default).
|
||||
* Add [projections](/docs/en/engines/table-engines/mergetree-family/mergetree.md/#projections). Note that projection optimizations are disabled when using `SELECT ... FINAL` (which MaterializedMySQL does by default), so their utility is limited here. `INDEX ... TYPE hypothesis`, as [described in the v21.12 blog post](https://clickhouse.com/blog/en/2021/clickhouse-v21.12-released/), may be more useful in this case.
|
||||
* Modify [PARTITION BY](/docs/en/engines/table-engines/mergetree-family/custom-partitioning-key/)
|
||||
* Modify [ORDER BY](/docs/en/engines/table-engines/mergetree-family/mergetree.md/#mergetree-query-clauses)
|
||||
* Modify [PRIMARY KEY](/docs/en/engines/table-engines/mergetree-family/mergetree.md/#mergetree-query-clauses)
|
||||
* Add [SAMPLE BY](/docs/en/engines/table-engines/mergetree-family/mergetree.md/#mergetree-query-clauses)
|
||||
* Add [table TTL](/docs/en/engines/table-engines/mergetree-family/mergetree.md/#mergetree-query-clauses)
|
||||
|
||||
```sql
|
||||
CREATE DATABASE db_name ENGINE = MaterializedMySQL(...)
|
||||
[SETTINGS ...]
|
||||
[TABLE OVERRIDE table_name (
|
||||
[COLUMNS (
|
||||
[col_name [datatype] [ALIAS expr] [CODEC(...)] [TTL expr], ...]
|
||||
[INDEX index_name expr TYPE indextype[(...)] GRANULARITY val, ...]
|
||||
[PROJECTION projection_name (SELECT <COLUMN LIST EXPR> [GROUP BY] [ORDER BY]), ...]
|
||||
)]
|
||||
[ORDER BY expr]
|
||||
[PRIMARY KEY expr]
|
||||
[PARTITION BY expr]
|
||||
[SAMPLE BY expr]
|
||||
[TTL expr]
|
||||
), ...]
|
||||
```
|
||||
|
||||
Example:
|
||||
|
||||
```sql
|
||||
CREATE DATABASE db_name ENGINE = MaterializedMySQL(...)
|
||||
TABLE OVERRIDE table1 (
|
||||
COLUMNS (
|
||||
userid UUID,
|
||||
category LowCardinality(String),
|
||||
timestamp DateTime CODEC(Delta, Default)
|
||||
)
|
||||
PARTITION BY toYear(timestamp)
|
||||
),
|
||||
TABLE OVERRIDE table2 (
|
||||
COLUMNS (
|
||||
client_ip String TTL created + INTERVAL 72 HOUR
|
||||
)
|
||||
SAMPLE BY ip_hash
|
||||
)
|
||||
```
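
A further sketch adds a skipping index through a table override. The index name `category_idx`, the `set(100)` index type and the granularity are illustrative choices, and the connection parameters are placeholders.

```sql
CREATE DATABASE db_name ENGINE = MaterializedMySQL('localhost:3306', 'db', 'user', '***')
TABLE OVERRIDE table1 (
    COLUMNS (
        -- Hypothetical index on the `category` column of table1.
        INDEX category_idx category TYPE set(100) GRANULARITY 4
    )
);

-- Skipping indexes are only used with SELECT ... FINAL when this setting is enabled.
SET use_skip_indexes_if_final = 1;
```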
|
||||
|
||||
The `COLUMNS` list is sparse; existing columns are modified as specified, extra ALIAS columns are added. It is not
|
||||
possible to add ordinary or MATERIALIZED columns. Modified columns with a different type must be assignable from the
|
||||
original type. There is currently no validation of this or similar issues when the `CREATE DATABASE` query executes, so
|
||||
extra care needs to be taken.
|
||||
|
||||
You may specify overrides for tables that do not exist yet.
|
||||
|
||||
:::important
|
||||
It is easy to break replication with table overrides if not used with care. For example:
|
||||
|
||||
* If an ALIAS column is added with a table override, and a column with the same name is later added to the source
|
||||
MySQL table, the converted ALTER TABLE query in ClickHouse will fail and replication stops.
|
||||
* It is currently possible to add overrides that reference nullable columns where non-nullable columns are required, such as in `ORDER BY` or `PARTITION BY`. The resulting `CREATE TABLE` queries fail, which also causes replication to stop.
|
||||
:::
|
||||
|
||||
## Examples of Use {#examples-of-use}
|
||||
|
||||
Queries in MySQL:
|
||||
|
||||
``` sql
|
||||
mysql> CREATE DATABASE db;
|
||||
mysql> CREATE TABLE db.test (a INT PRIMARY KEY, b INT);
|
||||
mysql> INSERT INTO db.test VALUES (1, 11), (2, 22);
|
||||
mysql> DELETE FROM db.test WHERE a=1;
|
||||
mysql> ALTER TABLE db.test ADD COLUMN c VARCHAR(16);
|
||||
mysql> UPDATE db.test SET c='Wow!', b=222;
|
||||
mysql> SELECT * FROM db.test;
|
||||
```
|
||||
|
||||
```text
|
||||
┌─a─┬───b─┬─c────┐
|
||||
│ 2 │ 222 │ Wow! │
|
||||
└───┴─────┴──────┘
|
||||
```
|
||||
|
||||
Database in ClickHouse, exchanging data with the MySQL server:
|
||||
|
||||
The database and the table created:
|
||||
|
||||
``` sql
|
||||
CREATE DATABASE mysql ENGINE = MaterializedMySQL('localhost:3306', 'db', 'user', '***');
|
||||
SHOW TABLES FROM mysql;
|
||||
```
|
||||
|
||||
``` text
|
||||
┌─name─┐
|
||||
│ test │
|
||||
└──────┘
|
||||
```
|
||||
|
||||
After inserting data:
|
||||
|
||||
``` sql
|
||||
SELECT * FROM mysql.test;
|
||||
```
|
||||
|
||||
``` text
|
||||
┌─a─┬──b─┐
|
||||
│ 1 │ 11 │
|
||||
│ 2 │ 22 │
|
||||
└───┴────┘
|
||||
```
|
||||
|
||||
After deleting data, adding the column and updating:
|
||||
|
||||
``` sql
|
||||
SELECT * FROM mysql.test;
|
||||
```
|
||||
|
||||
``` text
|
||||
┌─a─┬───b─┬─c────┐
|
||||
│ 2 │ 222 │ Wow! │
|
||||
└───┴─────┴──────┘
|
||||
```
|
||||
|
@ -1,176 +0,0 @@
|
||||
# The MySQL Binlog Client
|
||||
|
||||
The MySQL Binlog Client provides a mechanism in ClickHouse to share the binlog from a MySQL instance among multiple [MaterializedMySQL](../../engines/database-engines/materialized-mysql.md) databases. This avoids consuming unnecessary bandwidth and CPU when replicating more than one schema/database.
|
||||
|
||||
The implementation is resilient against crashes and disk issues. The executed GTID sets of the binlog itself and of the consuming databases are persisted only after the data they describe has been safely persisted as well. The implementation also tolerates re-doing aborted operations (at-least-once delivery).
|
||||
|
||||
# Settings
|
||||
|
||||
## use_binlog_client
|
||||
|
||||
Forces the database to reuse an existing MySQL binlog connection, or to create a new one if none exists. The connection is identified by `user:pass@host:port`.
|
||||
|
||||
Default value: 0
|
||||
|
||||
**Example**
|
||||
|
||||
```sql
|
||||
-- create MaterializedMySQL databases that read the events from the binlog client
|
||||
CREATE DATABASE db1 ENGINE = MaterializedMySQL('host:port', 'db1', 'user', 'password') SETTINGS use_binlog_client=1
|
||||
CREATE DATABASE db2 ENGINE = MaterializedMySQL('host:port', 'db2', 'user', 'password') SETTINGS use_binlog_client=1
|
||||
CREATE DATABASE db3 ENGINE = MaterializedMySQL('host:port', 'db3', 'user2', 'password2') SETTINGS use_binlog_client=1
|
||||
```
|
||||
|
||||
Databases `db1` and `db2` will use the same binlog connection, since they use the same `user:pass@host:port`. Database `db3` will use a separate binlog connection.
|
||||
|
||||
## max_bytes_in_binlog_queue
|
||||
|
||||
Defines the limit of bytes in the binlog events queue. If the number of bytes in the queue exceeds this limit, reading new events from MySQL stops until space for new events is freed. This bounds memory usage. A very high value could consume all available memory. A very low value could make the databases wait for new events.
|
||||
|
||||
Default value: 67108864
|
||||
|
||||
**Example**
|
||||
|
||||
```sql
|
||||
CREATE DATABASE db1 ENGINE = MaterializedMySQL('host:port', 'db1', 'user', 'password') SETTINGS use_binlog_client=1, max_bytes_in_binlog_queue=33554432
|
||||
CREATE DATABASE db2 ENGINE = MaterializedMySQL('host:port', 'db2', 'user', 'password') SETTINGS use_binlog_client=1
|
||||
```
|
||||
|
||||
If database `db1` is unable to consume binlog events fast enough and the size of the events queue exceeds `33554432` bytes, reading of new events from MySQL is postponed until `db1` consumes the events and releases some space.
|
||||
|
||||
NOTE: This also affects `db2`, which will be waiting for new events too, since both databases share the same connection.
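
For orientation, the byte values used in this section can be converted to readable sizes with ClickHouse's `formatReadableSize` function; the column aliases are illustrative.

```sql
SELECT
    formatReadableSize(67108864) AS default_limit,  -- 64.00 MiB
    formatReadableSize(33554432) AS db1_limit;      -- 32.00 MiB
```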
|
||||
|
||||
## max_milliseconds_to_wait_in_binlog_queue
|
||||
|
||||
Defines the maximum number of milliseconds to wait when `max_bytes_in_binlog_queue` is exceeded. After that, the database is detached from the current binlog connection and tries to establish a new one, so that other databases do not have to wait for it.
|
||||
|
||||
Default value: 10000
|
||||
|
||||
**Example**
|
||||
|
||||
```sql
|
||||
CREATE DATABASE db1 ENGINE = MaterializedMySQL('host:port', 'db1', 'user', 'password') SETTINGS use_binlog_client=1, max_bytes_in_binlog_queue=33554432, max_milliseconds_to_wait_in_binlog_queue=1000
|
||||
CREATE DATABASE db2 ENGINE = MaterializedMySQL('host:port', 'db2', 'user', 'password') SETTINGS use_binlog_client=1
|
||||
```
|
||||
|
||||
If the event queue of database `db1` is full, the binlog connection waits for `1000` ms. If the database is still unable to consume the events, it is detached from the connection and creates another one.
|
||||
|
||||
NOTE: If the database `db1` has been detached from the shared connection and has created a new one, the binlog connections for `db1` and `db2` are merged into one once they reach the same position. After that, `db1` and `db2` use the same connection again.
|
||||
|
||||
## max_bytes_in_binlog_dispatcher_buffer
|
||||
|
||||
Defines the maximum number of bytes in the binlog dispatcher's buffer before it is flushed to the attached databases. Events from the MySQL binlog connection are buffered before being sent to the attached databases. This increases event throughput from the binlog to the databases.
|
||||
|
||||
Default value: 1048576
|
||||
|
||||
## max_flush_milliseconds_in_binlog_dispatcher
|
||||
|
||||
Defines the maximum number of milliseconds the binlog dispatcher's buffer waits before it is flushed to the attached databases. If no events are received from the MySQL binlog connection for a while, the buffered events are sent to the attached databases after this time.
|
||||
|
||||
Default value: 1000
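
A sketch of setting both dispatcher options when creating a database. The values are illustrative only (twice the default buffer size and half the default flush interval), and the connection parameters are placeholders.

```sql
CREATE DATABASE db1 ENGINE = MaterializedMySQL('host:port', 'db1', 'user', 'password')
SETTINGS
    use_binlog_client = 1,
    max_bytes_in_binlog_dispatcher_buffer = 2097152,       -- 2 MiB instead of the default 1 MiB
    max_flush_milliseconds_in_binlog_dispatcher = 500;     -- flush at least every 500 ms
```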
|
||||
|
||||
# Design
|
||||
|
||||
## The Binlog Events Dispatcher
|
||||
|
||||
Currently each MaterializedMySQL database opens its own connection to MySQL to subscribe to binlog events. There is a need to have only one connection and _dispatch_ the binlog events to all databases that replicate from the same MySQL instance.
|
||||
|
||||
## Each MaterializedMySQL Database Has Its Own Event Queue
|
||||
|
||||
To prevent slowing down other instances there should be an _event queue_ per MaterializedMySQL database to handle the events independently of the speed of other instances. The dispatcher reads an event from the binlog, and sends it to every MaterializedMySQL database that needs it. Each database handles its events in separate threads.
|
||||
|
||||
## Catching up
|
||||
|
||||
If several databases have the same binlog position, they can use the same dispatcher. If a newly created database (or one that has been detached for some time) requests events that have been already processed, we need to create another communication _channel_ to the binlog. We do this by creating another temporary dispatcher for such databases. When the new dispatcher _catches up with_ the old one, the new/temporary dispatcher is not needed anymore and all databases getting events from this dispatcher can be moved to the old one.
|
||||
|
||||
## Memory Limit
|
||||
|
||||
There is a _memory limit_ to control event queue memory consumption per MySQL Client. If a database is not able to handle events fast enough, and the event queue is getting full, we have the following options:
|
||||
|
||||
1. The dispatcher is blocked until the slowest database frees up space for new events. All other databases are waiting for the slowest one. (Preferred)
|
||||
2. The dispatcher is _never_ blocked, but suspends incremental sync for the slow database and continues dispatching events to the remaining databases.
|
||||
|
||||
## Performance
|
||||
|
||||
A lot of CPU can be saved by not processing every event in every database. The binlog contains events for all databases, so it is wasteful to distribute row events to a database that will not process them, especially if there are a lot of databases. This requires some sort of per-database binlog filtering and buffering.
|
||||
|
||||
Currently all events are sent to all MaterializedMySQL databases, but parsing an event, which is what consumes CPU, is left to each database.
|
||||
|
||||
# Detailed Design
|
||||
|
||||
1. If a client (e.g. a database) wants to read a stream of events from the MySQL binlog, it creates a connection to the remote binlog using the host, user, password and _executed GTID set_ parameters.
2. If another client wants to read events from the binlog but for a different _executed GTID set_, it is **not** possible to reuse the existing connection to MySQL, so another connection to the same remote binlog must be created. (_This is how it is implemented today_).
3. When these two connections reach the same binlog position, they read the same events. It is logical to drop the duplicate connection and move all its users to the remaining one, so that a single connection dispatches binlog events to several clients. Only connections to the same binlog should be merged.
|
||||
|
||||
## Classes
|
||||
|
||||
1. One connection can send (or dispatch) events to several clients and might be called `BinlogEventsDispatcher`.
2. Several dispatchers are grouped by _user:password@host:port_ in a `BinlogClient`, since they point to the same binlog.
3. The clients should communicate only through the public API of `BinlogClient`. The result of using `BinlogClient` is an object that implements `IBinlog` to read events from. This implementation of `IBinlog` must be compatible with the old implementation, `MySQLFlavor`: replacing the old implementation with the new one must not change the behavior.
|
||||
|
||||
## SQL
|
||||
|
||||
```sql
|
||||
-- create MaterializedMySQL databases that read the events from the binlog client
|
||||
CREATE DATABASE db1_client1 ENGINE = MaterializedMySQL('host:port', 'db', 'user', 'password') SETTINGS use_binlog_client=1, max_bytes_in_binlog_queue=1024;
|
||||
CREATE DATABASE db2_client1 ENGINE = MaterializedMySQL('host:port', 'db', 'user', 'password') SETTINGS use_binlog_client=1;
|
||||
CREATE DATABASE db3_client1 ENGINE = MaterializedMySQL('host:port', 'db2', 'user', 'password') SETTINGS use_binlog_client=1;
|
||||
CREATE DATABASE db4_client2 ENGINE = MaterializedMySQL('host2:port', 'db', 'user', 'password') SETTINGS use_binlog_client=1;
|
||||
CREATE DATABASE db5_client3 ENGINE = MaterializedMySQL('host:port', 'db', 'user1', 'password') SETTINGS use_binlog_client=1;
|
||||
CREATE DATABASE db6_old ENGINE = MaterializedMySQL('host:port', 'db', 'user1', 'password') SETTINGS use_binlog_client=0;
|
||||
```
|
||||
|
||||
Databases `db1_client1`, `db2_client1` and `db3_client1` share one instance of `BinlogClient` since they have the same parameters. `BinlogClient` creates 3 connections to the MySQL server, and thus 3 instances of `BinlogEventsDispatcher`, but when these connections reach the same binlog position they should be merged into one connection. This means all clients are moved to one dispatcher and the other dispatchers are closed. Databases `db4_client2` and `db5_client3` use 2 different, independent `BinlogClient` instances. Database `db6_old` uses the old implementation.

NOTE: By default `use_binlog_client` is disabled. The `max_bytes_in_binlog_queue` setting defines the maximum allowed number of bytes in the binlog queue. By default, it is `67108864` bytes. If the number of bytes exceeds this limit, dispatching stops until space is freed for new events.
|
||||
|
||||
## Binlog Table Structure
|
||||
|
||||
To see the status of all `BinlogClient` instances, use the `system.mysql_binlogs` system table. It lists all created and _alive_ `IBinlog` instances with information about their `BinlogEventsDispatcher` and `BinlogClient`.
|
||||
|
||||
Example:
|
||||
|
||||
```
|
||||
SELECT * FROM system.mysql_binlogs FORMAT Vertical
|
||||
Row 1:
|
||||
──────
|
||||
binlog_client_name: root@127.0.0.1:3306
|
||||
name: test_Clickhouse1
|
||||
mysql_binlog_name: binlog.001154
|
||||
mysql_binlog_pos: 7142294
|
||||
mysql_binlog_timestamp: 1660082447
|
||||
mysql_binlog_executed_gtid_set: a9d88f83-c14e-11ec-bb36-244bfedf7766:1-30523304
|
||||
dispatcher_name: Applier
|
||||
dispatcher_mysql_binlog_name: binlog.001154
|
||||
dispatcher_mysql_binlog_pos: 7142294
|
||||
dispatcher_mysql_binlog_timestamp: 1660082447
|
||||
dispatcher_mysql_binlog_executed_gtid_set: a9d88f83-c14e-11ec-bb36-244bfedf7766:1-30523304
|
||||
size: 0
|
||||
bytes: 0
|
||||
max_bytes: 0
|
||||
```
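
Building on the columns shown in the output above, a sketch of checking which databases currently share a dispatcher. It assumes only the column names visible in that example; the aliases are illustrative.

```sql
-- One row per (binlog client, dispatcher) pair with the databases attached to it.
SELECT
    binlog_client_name,
    dispatcher_name,
    count() AS attached_databases,
    groupArray(name) AS database_names
FROM system.mysql_binlogs
GROUP BY binlog_client_name, dispatcher_name;
```

Databases that have been merged onto one connection would be expected to appear in the same row.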
|
||||
|
||||
### Tests
|
||||
|
||||
Unit tests:
|
||||
|
||||
```
|
||||
$ ./unit_tests_dbms --gtest_filter=MySQLBinlog.*
|
||||
```
|
||||
|
||||
Integration tests:
|
||||
|
||||
```
|
||||
$ pytest -s -vv test_materialized_mysql_database/test.py::test_binlog_client
|
||||
```
|
||||
|
||||
Dump events from a binlog file:
|
||||
|
||||
```
|
||||
$ ./utils/check-mysql-binlog/check-mysql-binlog --binlog binlog.001392
|
||||
```
|
||||
|
||||
Dump events from the server:
|
||||
|
||||
```
|
||||
$ ./utils/check-mysql-binlog/check-mysql-binlog --host 127.0.0.1 --port 3306 --user root --password pass --gtid a9d88f83-c14e-11ec-bb36-244bfedf7766:1-30462856
|
||||
```
|
@ -7,190 +7,3 @@ sidebar_label: "[experimental] MaterializedMySQL"
|
||||
# [experimental] MaterializedMySQL {#materialized-mysql}
|
||||
|
||||
**This is an experimental engine that should not be used in production.**
|
||||
|
||||
Creates a ClickHouse database with all the tables existing in MySQL, and all the data in those tables.
|
||||
|
||||
The ClickHouse server works as a MySQL replica. It reads the binlog file and performs DDL and DML queries.
|
||||
|
||||
## Creating a Database {#creating-a-database}
|
||||
|
||||
``` sql
|
||||
CREATE DATABASE [IF NOT EXISTS] db_name [ON CLUSTER cluster]
|
||||
ENGINE = MaterializedMySQL('host:port', ['database' | database], 'user', 'password') [SETTINGS ...]
|
||||
```
|
||||
|
||||
**Engine Parameters**
|
||||
|
||||
- `host:port` — MySQL server endpoint.
- `database` — name of the database on the remote server.
- `user` — MySQL user.
- `password` — user password.
|
||||
|
||||
**Engine Settings**
|
||||
|
||||
- `max_rows_in_buffer` — maximum number of rows that can be cached in memory for a single table (the cached data cannot be queried). When this number is exceeded, the data is materialized. Default: `65 505`.
- `max_bytes_in_buffer` — maximum number of bytes that can be cached in memory for a single table (the cached data cannot be queried). When this number is exceeded, the data is materialized. Default: `1 048 576`.
- `max_rows_in_buffers` — maximum number of rows that can be cached in memory for the whole database (the cached data cannot be queried). When this number is exceeded, the data is materialized. Default: `65 505`.
- `max_bytes_in_buffers` — maximum number of bytes that can be cached in memory for the whole database (the cached data cannot be queried). When this number is exceeded, the data is materialized. Default: `1 048 576`.
- `max_flush_data_time` — maximum time in milliseconds that data can be cached in memory for the whole database (the cached data cannot be queried). When this time is exceeded, the data is materialized. Default: `1000`.
- `max_wait_time_when_mysql_unavailable` — retry interval when MySQL is unavailable, in milliseconds. A negative value disables retries. Default: `1000`.
- `allows_query_when_mysql_lost` — whether querying a materialized table is allowed when the connection to MySQL is lost. Default: `0` (`false`).
|
||||
|
||||
```sql
|
||||
CREATE DATABASE mysql ENGINE = MaterializedMySQL('localhost:3306', 'db', 'user', '***')
|
||||
SETTINGS
|
||||
allows_query_when_mysql_lost=true,
|
||||
max_wait_time_when_mysql_unavailable=10000;
|
||||
```
|
||||
|
||||
**Settings on the MySQL Server Side**
|
||||
|
||||
For `MaterializedMySQL` to work correctly, the following configuration settings must be set on the MySQL server:

- `default_authentication_plugin = mysql_native_password` — `MaterializedMySQL` can only authenticate with this method.
- `gtid_mode = on` — GTID-based logging is mandatory for correct replication.
|
||||
|
||||
:::note Attention
When turning on `gtid_mode`, you should also specify `enforce_gtid_consistency = on`.
|
||||
:::
|
||||
## Virtual Columns {#virtual-columns}
|
||||
|
||||
When working with the `MaterializedMySQL` database engine, tables of the [ReplacingMergeTree](../../engines/table-engines/mergetree-family/replacingmergetree.md) family are used with virtual `_sign` and `_version` columns.
|
||||
|
||||
- `_version` — transaction counter. Type [UInt64](../../sql-reference/data-types/int-uint.md).
- `_sign` — deletion mark. Type [Int8](../../sql-reference/data-types/int-uint.md). Possible values:
    - `1` — the row is not deleted,
    - `-1` — the row is deleted.
|
||||
|
## Data Types Support {#data_types-support}

| MySQL                   | ClickHouse                                                   |
|-------------------------|--------------------------------------------------------------|
| TINY                    | [Int8](../../sql-reference/data-types/int-uint.md)           |
| SHORT                   | [Int16](../../sql-reference/data-types/int-uint.md)          |
| INT24                   | [Int32](../../sql-reference/data-types/int-uint.md)          |
| LONG                    | [UInt32](../../sql-reference/data-types/int-uint.md)         |
| LONGLONG                | [UInt64](../../sql-reference/data-types/int-uint.md)         |
| FLOAT                   | [Float32](../../sql-reference/data-types/float.md)           |
| DOUBLE                  | [Float64](../../sql-reference/data-types/float.md)           |
| DECIMAL, NEWDECIMAL     | [Decimal](../../sql-reference/data-types/decimal.md)         |
| DATE, NEWDATE           | [Date](../../sql-reference/data-types/date.md)               |
| DATETIME, TIMESTAMP     | [DateTime](../../sql-reference/data-types/datetime.md)       |
| DATETIME2, TIMESTAMP2   | [DateTime64](../../sql-reference/data-types/datetime64.md)   |
| ENUM                    | [Enum](../../sql-reference/data-types/enum.md)               |
| STRING                  | [String](../../sql-reference/data-types/string.md)           |
| VARCHAR, VAR_STRING     | [String](../../sql-reference/data-types/string.md)           |
| BLOB                    | [String](../../sql-reference/data-types/string.md)           |
| BINARY                  | [FixedString](../../sql-reference/data-types/fixedstring.md) |

[Nullable](../../sql-reference/data-types/nullable.md) is supported.

Other types are not supported. If a MySQL table contains a column of any other type, ClickHouse throws an "Unhandled data type" exception and stops replication.

## Specifics and Recommendations {#specifics-and-recommendations}

### Compatibility Restrictions {#compatibility-restrictions}

Apart from the data type limitations, there are a few restrictions compared to MySQL databases that must be resolved before replication is possible:

- Each table in MySQL must contain a `PRIMARY KEY`.
- Replication does not work for tables that contain rows with `ENUM` field values outside the range defined by the `ENUM`.

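The `PRIMARY KEY` restriction can be checked in advance on the MySQL side. The query below is a sketch that uses the standard `information_schema` views; the schema name `'db'` is taken from the usage example in this document and should be replaced with the actual database name.

```sql
-- Run on the MySQL server. Lists base tables in schema 'db'
-- that have no PRIMARY KEY constraint.
SELECT t.table_name
FROM information_schema.tables AS t
LEFT JOIN information_schema.table_constraints AS c
       ON c.table_schema = t.table_schema
      AND c.table_name = t.table_name
      AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_schema = 'db'
  AND t.table_type = 'BASE TABLE'
  AND c.constraint_name IS NULL;
```

An empty result suggests that every table in the schema satisfies the first restriction.
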
### DDL Queries {#ddl-queries}

MySQL DDL queries are converted into the corresponding ClickHouse DDL queries ([ALTER](../../sql-reference/statements/alter/index.md), [CREATE](../../sql-reference/statements/create/index.md), [DROP](../../sql-reference/statements/drop.md), [RENAME](../../sql-reference/statements/rename.md)). If ClickHouse cannot convert a DDL query, it ignores the query.

### Data Replication {#data-replication}

Data is immutable from the ClickHouse user's side, but it is updated automatically by replicating the following queries from MySQL:

- A MySQL `INSERT` query is converted into a ClickHouse `INSERT` with `_sign=1`.

- A MySQL `DELETE` query is converted into a ClickHouse `INSERT` with `_sign=-1`.

- A MySQL `UPDATE` query is converted into a ClickHouse `INSERT` with `_sign=-1` and an `INSERT` with `_sign=1`.

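The mapping above can be modeled on an ordinary `ReplacingMergeTree` table. This is only a sketch of the idea, not the engine's actual implementation: the table `sketch_test` and its plain `sign` and `version` columns are hypothetical stand-ins for the engine-managed `_sign` and `_version`, and the version numbers are arbitrary increasing values.

```sql
-- Hypothetical stand-in table; MaterializedMySQL creates and manages its own tables.
CREATE TABLE sketch_test
(
    a Int32,
    b Int32,
    sign Int8,       -- stands in for _sign
    version UInt64   -- stands in for _version
)
ENGINE = ReplacingMergeTree(version)
ORDER BY a;

-- MySQL: INSERT INTO db.test VALUES (1, 11), (2, 22);
INSERT INTO sketch_test VALUES (1, 11, 1, 1), (2, 22, 1, 1);

-- MySQL: DELETE FROM db.test WHERE a = 1;
INSERT INTO sketch_test VALUES (1, 11, -1, 2);

-- MySQL: UPDATE db.test SET b = 222 WHERE a = 2;
INSERT INTO sketch_test VALUES (2, 22, -1, 3), (2, 222, 1, 3);

-- Keep the latest version of each key, then drop rows marked as deleted.
SELECT a, b FROM sketch_test FINAL WHERE sign = 1;
```

Because every change arrives as an insert, the history stays append-only, and the current state is derived at read time from the version and the deletion mark.
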
### Selecting from MaterializedMySQL Tables {#select}

A `SELECT` query from `MaterializedMySQL` tables has some specifics:

- If `_version` is not specified directly in the `SELECT` query, the [FINAL](../../sql-reference/statements/select/from.md#select-from-final) modifier is used. Thus only rows with `MAX(_version)` are selected.

- If `_sign` is not specified directly in the `SELECT` query, `WHERE _sign=1` is used by default. Thus deleted rows are not included in the result set.

- The result includes column comments if they exist in the MySQL database tables.

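The two implicit rules above can be contrasted with a pair of queries. This is a sketch that assumes a replicated table `mysql.test` with columns `a` and `b`, as in the usage example in this document.

```sql
-- Neither virtual column is named, so per the rules above the engine applies
-- FINAL and the _sign = 1 filter implicitly: only current, non-deleted rows.
SELECT a, b FROM mysql.test;

-- Both virtual columns are named directly, so the implicit FINAL and the
-- implicit filter are not applied and the stored row versions become visible.
SELECT a, b, _sign, _version
FROM mysql.test
ORDER BY a, _version;
```

The second form is mainly useful for inspecting how changes from MySQL were recorded.
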
### Index Conversion {#index-conversion}

MySQL `PRIMARY KEY` and `INDEX` clauses are converted into `ORDER BY` tuples in ClickHouse tables.

In ClickHouse tables, data is physically stored in the order defined by the `ORDER BY` clause. To physically reorder the data, use [materialized views](../../sql-reference/statements/create/view.md#materialized).

**Notes**

- Rows with `_sign=-1` are not physically deleted from the tables.
- Cascading `UPDATE/DELETE` queries are not supported by the `MaterializedMySQL` engine.
- Replication can be easily broken.
- Direct data modification in `MaterializedMySQL` databases and tables is forbidden.
- `MaterializedMySQL` is affected by the [optimize_on_insert](../../operations/settings/settings.md#optimize-on-insert) setting. When a table on the MySQL server changes, data is merged in the corresponding table in the `MaterializedMySQL` database.

## Examples of Use {#examples-of-use}

Queries in MySQL:

``` sql
mysql> CREATE DATABASE db;
mysql> CREATE TABLE db.test (a INT PRIMARY KEY, b INT);
mysql> INSERT INTO db.test VALUES (1, 11), (2, 22);
mysql> DELETE FROM db.test WHERE a=1;
mysql> ALTER TABLE db.test ADD COLUMN c VARCHAR(16);
mysql> UPDATE db.test SET c='Wow!', b=222;
mysql> SELECT * FROM db.test;
```

```text
+---+------+------+
| a |    b |    c |
+---+------+------+
| 2 |  222 | Wow! |
+---+------+------+
```

Database in ClickHouse, exchanging data with the MySQL server:

The database and the created table:

``` sql
CREATE DATABASE mysql ENGINE = MaterializedMySQL('localhost:3306', 'db', 'user', '***');
SHOW TABLES FROM mysql;
```

``` text
┌─name─┐
│ test │
└──────┘
```

After inserting data:

``` sql
SELECT * FROM mysql.test;
```

``` text
┌─a─┬──b─┐
│ 1 │ 11 │
│ 2 │ 22 │
└───┴────┘
```

After deleting data, adding the column, and updating:

``` sql
SELECT * FROM mysql.test;
```

``` text
┌─a─┬───b─┬─c────┐
│ 2 │ 222 │ Wow! │
└───┴─────┴──────┘
```