One binlog connection for many databases.
This feature is disabled by default for now and must be explicitly enabled with SETTINGS use_binlog_client=1.
If it is permanently enabled in MaterializedMySQLSettings, the old behavior should be kept and all tests should still pass.
1. Introduced `IBinlog` and its implementations, which read binlog events from a socket (`BinlogFromSocket`) or from a file (`BinlogFromFile`). They are based on the previous implementation of `EventBase` and the same binlog parsers, so backward compatibility with the old version is fully kept. Fixed `./check-mysql-binlog` to test the new implementation.
2. Introduced `BinlogEventsDispatcher`, which reads events from the source `IBinlog` and sends them to the currently attached `IBinlog` instances.
3. Introduced `BinlogClient`, which groups a list of `BinlogEventsDispatcher` instances by MySQL binlog connection, identified by `user:password@host:port`. All dispatchers at the same binlog position should be merged into one.
4. Introduced `BinlogClientFactory`, a singleton used to track all binlogs created on the instance.
5. Introduced `use_binlog_client` setting to `MaterializedMySQL`, which makes the database reuse a `BinlogClient` if one already exists in `BinlogClientFactory`, or create a new one otherwise. By default, it is disabled.
6. Introduced `max_bytes_in_binlog_queue` setting to define the limit of bytes in binlog's queue of events. If the number of bytes in the queue exceeds this limit, `BinlogEventsDispatcher` stops reading new events from the source `IBinlog` until space for new events is freed.
7. Introduced `max_milliseconds_to_wait_in_binlog_queue` setting to define the maximum number of milliseconds to wait when the byte limit is exceeded.
8. Introduced `max_bytes_in_binlog_dispatcher_buffer` setting to define max bytes in the binlog dispatcher's buffer before it is flushed to attached binlogs.
9. Introduced `max_flush_milliseconds_in_binlog_dispatcher` setting to define the maximum number of milliseconds events may wait in the binlog dispatcher's buffer before it is flushed to attached binlogs.
10. Introduced `system.mysql_binlogs` system table, which shows a list of active binlogs.
11. Introduced `UnparsedRowsEvent` and `MYSQL_UNPARSED_ROWS_EVENT`, which mark an event as not yet parsed, so it should be explicitly parsed later.
12. Fixed a bug where a DDL could not be applied because of a syntax error or unsupported SQL.
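Items 6 and 7 describe a byte-bounded queue with a bounded wait. A minimal sketch of that behavior, assuming a plain mutex and condition variable; `Event`, `BoundedEventQueue`, `tryPush` and `tryPop` are hypothetical names for illustration and are not taken from the actual implementation. What the real dispatcher does after the wait times out is not stated above, so the sketch simply reports failure to the caller.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a raw binlog event; only its size matters here.
struct Event
{
    std::string payload;
    std::size_t bytes() const { return payload.size(); }
};

// Hypothetical byte-bounded queue, not the real ClickHouse class.
class BoundedEventQueue
{
public:
    // max_bytes plays the role of max_bytes_in_binlog_queue,
    // max_wait the role of max_milliseconds_to_wait_in_binlog_queue.
    BoundedEventQueue(std::size_t max_bytes_, std::chrono::milliseconds max_wait_)
        : max_bytes(max_bytes_), max_wait(max_wait_)
    {
    }

    // Waits up to max_wait for room; returns false if none was freed in time.
    bool tryPush(Event event)
    {
        std::unique_lock<std::mutex> lock(mutex);
        const std::size_t size = event.bytes();
        // An empty queue always accepts one event, so an oversized event cannot stall forever.
        const bool has_room = space_freed.wait_for(
            lock, max_wait, [&] { return queue.empty() || bytes_in_queue + size <= max_bytes; });
        if (!has_room)
            return false;
        bytes_in_queue += size;
        queue.push_back(std::move(event));
        return true;
    }

    // Non-blocking pop; wakes a producer that may be waiting for room.
    bool tryPop(Event & out)
    {
        std::lock_guard<std::mutex> lock(mutex);
        if (queue.empty())
            return false;
        out = std::move(queue.front());
        queue.pop_front();
        assert(bytes_in_queue >= out.bytes());
        bytes_in_queue -= out.bytes();
        space_freed.notify_one();
        return true;
    }

    std::size_t bytesInQueue() const
    {
        std::lock_guard<std::mutex> lock(mutex);
        return bytes_in_queue;
    }

private:
    const std::size_t max_bytes;
    const std::chrono::milliseconds max_wait;
    mutable std::mutex mutex;
    std::condition_variable space_freed;
    std::deque<Event> queue;
    std::size_t bytes_in_queue = 0;
};
```

With a 5-byte limit, pushing a 4-byte event succeeds, a following 2-byte push returns false after the wait, and the same push succeeds once the first event has been popped.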
@larspars is the author of the following:
`GTIDSets::contains()`
`ReplicationHelper`
`shouldReconnectOnException()`
String literals used in DDL might have `Character Set Introducers`
https://dev.mysql.com/doc/refman/8.0/en/charset-introducer.html
e.g. _utf8mb4'1', which is not parsable by the current ParserStringLiteral.
Since we use utf8 by default, suggesting to automatically convert the string literals to utf8
before executing the query and to drop any charset introducers.
Conversion from utf8 to utf8 is not needed and is skipped.
It might also convert double quotes to single quotes, if any,
which might solve issues with COMMENT and empty string literals "" in DEFAULT expressions.
SELECT _latin1"abc"; -- might be also valid for MySQL
... DEFAULT "",
... COMMENT "abc"
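The rewriting described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `normalizeStringLiterals` is a hypothetical name, the introducer is recognized only when it directly precedes the quote, and the sketch drops the introducer without converting the literal's bytes between charsets (it assumes the content is already utf8).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: drops charset introducers and re-quotes string
// literals with single quotes. It does NOT convert bytes between charsets.
std::string normalizeStringLiterals(const std::string & query)
{
    auto is_word_char = [](char c) { return std::isalnum(static_cast<unsigned char>(c)) || c == '_'; };

    std::string out;
    const std::size_t n = query.size();
    std::size_t i = 0;
    while (i < n)
    {
        const char c = query[i];

        // Backquoted identifiers are copied untouched.
        if (c == '`')
        {
            std::size_t end = query.find('`', i + 1);
            end = (end == std::string::npos) ? n : end + 1;
            assert(end > i); // every branch consumes at least one byte
            out.append(query, i, end - i);
            i = end;
            continue;
        }

        // Drop an introducer such as _utf8mb4 when it directly precedes a quote.
        if (c == '_' && (i == 0 || !is_word_char(query[i - 1])))
        {
            std::size_t j = i + 1;
            while (j < n && std::isalnum(static_cast<unsigned char>(query[j])))
                ++j;
            if (j > i + 1 && j < n && (query[j] == '\'' || query[j] == '"'))
            {
                i = j; // continue at the quote; the literal is handled below
                continue;
            }
        }

        // Re-emit any string literal with single quotes.
        if (c == '\'' || c == '"')
        {
            const char quote = c;
            out += '\'';
            ++i;
            while (i < n)
            {
                if (query[i] == '\\' && i + 1 < n)
                {
                    // Backslash escape: copy both characters as they are.
                    out += query[i];
                    out += query[i + 1];
                    i += 2;
                }
                else if (query[i] == quote)
                {
                    if (i + 1 < n && query[i + 1] == quote)
                    {
                        // A doubled quote inside the literal stands for one quote character.
                        out += (quote == '\'') ? "''" : "\"";
                        i += 2;
                    }
                    else
                    {
                        ++i; // closing quote
                        break;
                    }
                }
                else if (query[i] == '\'')
                {
                    // A bare single quote inside "..." must be escaped in the output.
                    out += "\\'";
                    ++i;
                }
                else
                {
                    out += query[i++];
                }
            }
            out += '\'';
            continue;
        }

        out += c;
        ++i;
    }
    return out;
}
```

Under these assumptions, `SELECT _latin1"abc"` becomes `SELECT 'abc'`, and `DEFAULT ""` becomes `DEFAULT ''`.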
Currently DEFAULT expressions are not supported and cannot be parsed at all,
but this MR allows such expressions to be parsed, including double-quoted string literals.
ClickHouse does not support unquoted utf-8 strings, but MySQL does.
Instead of fixing Lexer to recognize utf-8 chars as TokenType::BareWord,
suggesting to quote all unrecognized tokens before applying any DDL.
Actual parsing and validation of the syntax will be done by the particular Parser.
If there is any TokenType::Error, the query cannot be parsed anyway.
Quoting such tokens can provide support for utf-8 names.
See `tryQuoteUnrecognizedTokens` and `QuoteUnrecognizedTokensTest`.
mysql> CREATE TABLE 道.渠(...
is converted to
CREATE TABLE `道`.`渠`(...
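The conversion above can be approximated without the real Lexer. The following is a rough sketch, not `tryQuoteUnrecognizedTokens` itself: `quoteNonAsciiWords` is a hypothetical name, and it assumes that an "unrecognized token" is any unquoted word containing at least one non-ASCII byte, whereas the real code works on Lexer tokens.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical approximation: wraps unquoted words that contain non-ASCII
// (multi-byte UTF-8) bytes in backquotes and leaves everything else as is.
std::string quoteNonAsciiWords(const std::string & query)
{
    auto is_word_byte = [](unsigned char c) { return c >= 0x80 || std::isalnum(c) || c == '_'; };

    std::string out;
    const std::size_t n = query.size();
    std::size_t i = 0;
    while (i < n)
    {
        const unsigned char c = static_cast<unsigned char>(query[i]);

        // Copy quoted regions (string literals and quoted identifiers) verbatim.
        if (c == '`' || c == '\'' || c == '"')
        {
            std::size_t end = i + 1;
            while (end < n && query[end] != query[i])
                end += (query[end] == '\\' && end + 1 < n) ? 2 : 1;
            end = (end < n) ? end + 1 : n;
            out.append(query, i, end - i);
            i = end;
            continue;
        }

        // Scan one whole word and quote it only if it has a non-ASCII byte.
        if (is_word_byte(c))
        {
            std::size_t end = i;
            bool has_non_ascii = false;
            while (end < n && is_word_byte(static_cast<unsigned char>(query[end])))
            {
                has_non_ascii = has_non_ascii || static_cast<unsigned char>(query[end]) >= 0x80;
                ++end;
            }
            assert(end > i); // the word has at least one byte, so the loop makes progress
            if (has_non_ascii)
                out += '`';
            out.append(query, i, end - i);
            if (has_non_ascii)
                out += '`';
            i = end;
            continue;
        }

        out += query[i++];
    }
    return out;
}
```

ASCII-only queries and names that are already backquoted pass through unchanged, so applying the function twice gives the same result as applying it once.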
Also fixed a bug where * was missing from the SELECT during full sync because the db or table name was backquoted when not needed.
1. Dropped support for DatabaseOrdinary for MaterializeMySQL. The engine
is marked as experimental, and dropping support makes the code
more maintainable and speeds up integration tests by 50%.
2. Get rid of thread name logic for StorageMaterializeMySQL wrapping,
use setInternalQuery instead (similar to MaterializedPostgreSQL).
Previously, we updated the set of seen GTIDs as soon as we saw a GTID_EVENT,
which arrives before a transaction. This mostly worked fine, but
if we lost the connection to MySQL in the middle of a large transaction,
we had already persisted that the transaction had been processed as soon as
it had started. When the connection was reestablished, we
did not process the transaction again, which meant that we only
applied parts of it.
Fix this by updating the seen GTIDs at the end of the transaction
instead.
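
The fix can be illustrated with a small state machine. This is a sketch under assumptions, not the actual code: `GtidTracker`, `Event` and `EventType` are hypothetical names, GTIDs are kept as plain strings instead of GTID sets, and an XID event is assumed to mark the end of a transaction.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, simplified event model.
enum class EventType { Gtid, Rows, Xid };

struct Event
{
    EventType type;
    std::string gtid; // only meaningful for EventType::Gtid
};

// Hypothetical tracker: a GTID counts as seen only after its transaction ends.
class GtidTracker
{
public:
    void onEvent(const Event & event)
    {
        switch (event.type)
        {
            case EventType::Gtid:
                assert(!event.gtid.empty());
                // Remember the GTID, but do not mark it as seen yet.
                pending = event.gtid;
                has_pending = true;
                break;
            case EventType::Rows:
                // Row changes are applied elsewhere; nothing to record here.
                break;
            case EventType::Xid:
                // End of transaction: only now is the GTID recorded as seen.
                if (has_pending)
                {
                    seen.insert(pending);
                    has_pending = false;
                }
                break;
        }
    }

    // A lost connection discards the unfinished transaction's GTID,
    // so the whole transaction is processed again after reconnecting.
    void onReconnect() { has_pending = false; }

    bool contains(const std::string & gtid) const { return seen.count(gtid) != 0; }

private:
    std::set<std::string> seen;
    std::string pending;
    bool has_pending = false;
};
```

If the connection drops after the GTID and some row events but before the XID event, `contains` still returns false for that GTID, so the transaction is not skipped on the next attempt.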