The Kafka table engine allows global configuration and per-Kafka-topic
configuration. The latter uses syntax <kafka_TOPIC>, e.g. for topic
"football":
<kafka_football>
<retry_backoff_ms>250</retry_backoff_ms>
<fetch_min_bytes>100000</fetch_min_bytes>
</kafka_football>
Some users had to find out the hard way that such configuration doesn't
take effect if the topic name contains a period, e.g. "sports.football".
The reason is that ClickHouse configuration framework already uses
periods as level separators to descend the configuration hierarchy.
(Besides that, per-topic configuration at the same level as global
configuration could be considered ugly.)
Note that Kafka topics may contain characters "a-zA-Z0-9._-" (*) and
a tree-like topic organization using periods is quite common in
practice.
This PR deprecates the existing per-topic configuration syntax (but
continues to support it for backward compat) and introduces a new
per-topic configuration syntax below the global Kafka configuration of
the form:
<kafka>
<topic name="football">
<retry_backoff_ms>250</retry_backoff_ms>
<fetch_min_bytes>100000</fetch_min_bytes>
</topic>
</kafka>
The period restriction doesn't apply to XML attributes, so <topic
name="sports.football"> will work. Also, everything Kafka-related is
below <kafka>.
Considered but rejected alternatives:
- Extending Poco ConfigurationView with custom separators (e.g."/"
instead of "."). Won't work easily because ConfigurationView only
builds a path but defers descending the configuration tree to the
normal configuration classes.
- Reloading the configuration file in StorageKafka (instead of reading
the loaded file) but with a custom separator. This mode is supported
by XML configuration. Too ugly and error-prone since the true
configuration is composed from multiple configuration files.
(*) https://stackoverflow.com/a/37067544
- lots of static_cast
- add safe_cast
- types adjustments
- config
- IStorage::read/watch
- ...
- some TODO's (to convert types in future)
P.S. That was quite a journey...
v2: fixes after rebase
v3: fix conflicts after #42308 merged
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
The original motivation for this commit was that shared_ptr_helper used
std::shared_ptr<>() which does two heap allocations instead of
make_shared<>() which does a single allocation. Turned out that
1. the affected code (--> Storages/) is not on a hot path (rendering the
performance argument moot ...)
2. yet copying Storage objects is potentially dangerous and was
previously allowed.
Hence, this change
- removes shared_ptr_helper and as a result all inherited create() methods,
- instead, Storage objects are now created using make_shared<>() by the
caller (for that to work, many constructors had to be made public), and
- all Storage classes were marked as noncopyable using boost::noncopyable.
In sum, we are (likely) not making things faster but the code becomes
cleaner and harder to misuse.
Wrap them into StorageKafkaInterceptors to allow access to private
fields and add logging inside interceptors if something fails.
This is also preparation for ThreadStatus interceptor.