RabbitMQ Engine
This engine allows integrating ClickHouse with RabbitMQ.
RabbitMQ lets you:
- Publish or subscribe to data flows.
- Process streams as they become available.
Creating a Table
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
...
) ENGINE = RabbitMQ SETTINGS
rabbitmq_host_port = 'host:port',
rabbitmq_exchange_name = 'exchange_name',
rabbitmq_format = 'data_format'[,]
[rabbitmq_exchange_type = 'exchange_type',]
[rabbitmq_routing_key_list = 'key1,key2,...',]
[rabbitmq_row_delimiter = 'delimiter_symbol',]
[rabbitmq_num_consumers = N,]
[rabbitmq_num_queues = N,]
[rabbitmq_transactional_channel = 0]
Required parameters:
- rabbitmq_host_port – host:port (for example, localhost:5672).
- rabbitmq_exchange_name – RabbitMQ exchange name.
- rabbitmq_format – Message format. Uses the same notation as the SQL FORMAT function, such as JSONEachRow. For more information, see the Formats section.
Optional parameters:
- rabbitmq_exchange_type – The type of RabbitMQ exchange: direct, fanout, topic, headers, consistent_hash. Default: fanout.
- rabbitmq_routing_key_list – A comma-separated list of routing keys.
- rabbitmq_row_delimiter – Delimiter character, which ends the message.
- rabbitmq_num_consumers – The number of consumers per table. Default: 1. Specify more consumers if the throughput of one consumer is insufficient.
- rabbitmq_num_queues – The number of queues per consumer. Default: 1. Specify more queues if the capacity of one queue per consumer is insufficient.
- rabbitmq_transactional_channel – Wrap insert queries in transactions. Default: 0.
- rabbitmq_queue_base – Specify a base name for queues that will be declared.
- rabbitmq_deadletter_exchange – Specify name for a dead letter exchange. You can create another table with this exchange name and collect messages in cases when they are republished to dead letter exchange. By default dead letter exchange is not specified.
- persistent – If set to 1 (true), in insert query delivery mode will be set to 2 (marks messages as 'persistent'). Default: 0.
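As a minimal sketch of how some of the optional settings combine, the statements below declare a table with a named queue base and a dead letter exchange, plus a second table that collects messages republished to that exchange. The table names `queue_with_dlx` and `dead_letters`, the exchange name `exchange1_dlx`, and the queue base `orders` are hypothetical. Whether and when a message is dead-lettered is decided by the RabbitMQ broker, not by these statements.

```sql
-- Hypothetical names throughout; adjust host, exchange, and queue names.
CREATE TABLE queue_with_dlx (
    key UInt64,
    value UInt64
) ENGINE = RabbitMQ SETTINGS
    rabbitmq_host_port = 'localhost:5672',
    rabbitmq_exchange_name = 'exchange1',
    rabbitmq_format = 'JSONEachRow',
    rabbitmq_queue_base = 'orders',                  -- base name for declared queues
    rabbitmq_deadletter_exchange = 'exchange1_dlx';  -- where republished messages go

-- A second table whose exchange name equals the dead letter exchange above,
-- so that dead-lettered messages can be collected separately.
CREATE TABLE dead_letters (
    key UInt64,
    value UInt64
) ENGINE = RabbitMQ SETTINGS
    rabbitmq_host_port = 'localhost:5672',
    rabbitmq_exchange_name = 'exchange1_dlx',
    rabbitmq_format = 'JSONEachRow';
```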
Required configuration:
The RabbitMQ server configuration should be added using the ClickHouse config file.
<rabbitmq>
<username>root</username>
<password>clickhouse</password>
</rabbitmq>
Example:
CREATE TABLE queue (
key UInt64,
value UInt64
) ENGINE = RabbitMQ SETTINGS rabbitmq_host_port = 'localhost:5672',
rabbitmq_exchange_name = 'exchange1',
rabbitmq_format = 'JSONEachRow',
rabbitmq_num_consumers = 5;
Description
SELECT is not particularly useful for reading messages (except for debugging), because each message can be read only once. It is more practical to create real-time threads using materialized views. To do this:
- Use the engine to create a RabbitMQ consumer and consider it a data stream.
- Create a table with the desired structure.
- Create a materialized view that converts data from the engine and puts it into a previously created table.
When the MATERIALIZED VIEW joins the engine, it starts collecting data in the background. This allows you to continually receive messages from RabbitMQ and convert them to the required format using SELECT.
One RabbitMQ table can have as many materialized views as you like.
Data can be channeled based on rabbitmq_exchange_type and the specified rabbitmq_routing_key_list.
There can be no more than one exchange per table. One exchange can be shared between multiple tables - it enables routing into multiple tables at the same time.
Exchange type options:
- direct - Routing is based on exact matching of keys. Example table key list: key1,key2,key3,key4,key5. The message key can equal any of them.
- fanout - Routing to all tables (where exchange name is the same) regardless of the keys.
- topic - Routing is based on patterns with dot-separated keys. Examples: *.logs, records.*.*.2020, *.2018,*.2019,*.2020.
- headers - Routing is based on key=value matches with a setting x-match=all or x-match=any. Example table key list: x-match=all,format=logs,type=report,year=2020.
- consistent_hash - Data is evenly distributed between all bound tables (where exchange name is the same). Note that this exchange type must be enabled with RabbitMQ plugin: rabbitmq-plugins enable rabbitmq_consistent_hash_exchange.
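To make the routing options concrete, here is a short sketch of two tables, one bound to a topic exchange and one to a direct exchange. The table names, exchange names, and keys are hypothetical. Under standard AMQP topic matching, `*` stands for exactly one dot-separated word, so a message published with the key `app.logs` would match the pattern `*.logs`.

```sql
-- Topic exchange: receives messages whose routing key matches any listed pattern.
CREATE TABLE topic_queue (
    key UInt64,
    value UInt64
) ENGINE = RabbitMQ SETTINGS
    rabbitmq_host_port = 'localhost:5672',
    rabbitmq_exchange_name = 'topic_exchange',      -- hypothetical exchange name
    rabbitmq_exchange_type = 'topic',
    rabbitmq_routing_key_list = '*.logs,records.*.*.2020',
    rabbitmq_format = 'JSONEachRow';

-- Direct exchange: receives messages whose routing key equals one of the listed keys.
CREATE TABLE direct_queue (
    key UInt64,
    value UInt64
) ENGINE = RabbitMQ SETTINGS
    rabbitmq_host_port = 'localhost:5672',
    rabbitmq_exchange_name = 'direct_exchange',     -- hypothetical exchange name
    rabbitmq_exchange_type = 'direct',
    rabbitmq_routing_key_list = 'key1,key2,key3',
    rabbitmq_format = 'JSONEachRow';
```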
Setting rabbitmq_queue_base may be used for the following cases:
- to be able to restore reading from certain durable queues when not all messages were successfully consumed. Note: it makes sense only if messages are sent with delivery mode 2 (marked 'persistent', durable). To resume consumption from one specific queue, set its name in the rabbitmq_queue_base setting and do not specify rabbitmq_num_consumers and rabbitmq_num_queues (both default to 1). To resume consumption from all queues that were declared for a specific table, specify the same settings: rabbitmq_queue_base, rabbitmq_num_consumers, rabbitmq_num_queues. By default, queue names are unique to tables.
- to reuse queues, as they are declared durable and not auto-deleted.
- to let different tables share queues, so that multiple consumers can be registered for the same queues, which improves performance. If using rabbitmq_num_consumers and/or rabbitmq_num_queues settings, the queues match exactly only when these parameters are the same.
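The queue-sharing case can be sketched as two tables that use the same exchange and identical queue settings. The table names and the queue base `shared_base` are hypothetical; the point is only that rabbitmq_queue_base, rabbitmq_num_consumers, and rabbitmq_num_queues are the same in both definitions.

```sql
-- Both tables specify the same queue base and the same consumer/queue counts,
-- so they are expected to register consumers on the same set of queues.
CREATE TABLE shared_a (
    key UInt64,
    value UInt64
) ENGINE = RabbitMQ SETTINGS
    rabbitmq_host_port = 'localhost:5672',
    rabbitmq_exchange_name = 'exchange1',
    rabbitmq_format = 'JSONEachRow',
    rabbitmq_queue_base = 'shared_base',   -- hypothetical base name
    rabbitmq_num_consumers = 2,
    rabbitmq_num_queues = 2;

CREATE TABLE shared_b (
    key UInt64,
    value UInt64
) ENGINE = RabbitMQ SETTINGS
    rabbitmq_host_port = 'localhost:5672',
    rabbitmq_exchange_name = 'exchange1',
    rabbitmq_format = 'JSONEachRow',
    rabbitmq_queue_base = 'shared_base',   -- same base name as shared_a
    rabbitmq_num_consumers = 2,
    rabbitmq_num_queues = 2;
```

Recreating a dropped table with the same three settings follows the same pattern and corresponds to the "resume consumption from all queues" case.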
If rabbitmq_num_consumers and/or rabbitmq_num_queues settings are specified along with rabbitmq_exchange_type, then:
- rabbitmq-consistent-hash-exchange plugin must be enabled.
- message_id property of the published messages must be specified (unique for each message/batch).
For insert queries, metadata is added to each published message: messageID and a republished flag. Both can be accessed via message headers.
Do not use the same table for inserts and materialized views.
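A minimal sketch of publishing through a dedicated table, kept separate from any table that feeds materialized views. The table name `publisher` and the exchange name `exchange_out` are hypothetical, and the exact shape of the published messages depends on rabbitmq_format.

```sql
-- A table used only for inserts; no materialized view reads from it.
CREATE TABLE publisher (
    key UInt64,
    value UInt64
) ENGINE = RabbitMQ SETTINGS
    rabbitmq_host_port = 'localhost:5672',
    rabbitmq_exchange_name = 'exchange_out',   -- hypothetical exchange name
    rabbitmq_format = 'JSONEachRow';

-- Each inserted row is published to the exchange in the configured format.
INSERT INTO publisher VALUES (1, 10), (2, 20);
```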
Example:
CREATE TABLE queue (
key UInt64,
value UInt64
) ENGINE = RabbitMQ SETTINGS rabbitmq_host_port = 'localhost:5672',
rabbitmq_exchange_name = 'exchange1',
rabbitmq_exchange_type = 'headers',
rabbitmq_routing_key_list = 'format=logs,type=report,year=2020',
rabbitmq_format = 'JSONEachRow',
rabbitmq_num_consumers = 5;
CREATE TABLE daily (key UInt64, value UInt64)
ENGINE = MergeTree() ORDER BY key;
CREATE MATERIALIZED VIEW consumer TO daily
AS SELECT key, value FROM queue;
SELECT key, value FROM daily ORDER BY key;
Virtual Columns
- _exchange_name - RabbitMQ exchange name.
- _consumer_tag - ConsumerTag of the consumer that received the message.
- _delivery_tag - DeliveryTag of the message. Scoped per consumer.
- _redelivered - Redelivered flag of the message.
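Virtual columns can be selected in a materialized view like ordinary columns. The sketch below stores some of them next to the payload, reading from the `queue` table of the earlier example. The names `daily_with_meta` and `consumer_with_meta` are hypothetical, and the column types chosen for the metadata (String, UInt64, UInt8) are assumptions that should be checked against your ClickHouse version.

```sql
-- Target table with extra columns for message metadata (types are assumed).
CREATE TABLE daily_with_meta (
    key UInt64,
    value UInt64,
    exchange_name String,
    delivery_tag UInt64,
    redelivered UInt8
) ENGINE = MergeTree() ORDER BY key;

-- Copy payload columns and selected virtual columns from the RabbitMQ table.
CREATE MATERIALIZED VIEW consumer_with_meta TO daily_with_meta
AS SELECT
    key,
    value,
    _exchange_name AS exchange_name,
    _delivery_tag AS delivery_tag,
    _redelivered AS redelivered
FROM queue;
```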