- In general, it is expected that clickhouse-*-bridges and
clickhouse-server were build from the same source version (e.g. are
upgraded "atomically"). If that is not the case, we should at least
be able to detect the mismatch and abort.
- This commit adds a URL parameter "version", defined in a header shared
by the server and bridges. The bridge returns an error in case of
mismatch.
- The version is *not* send and checked for "ping" requests (used for
handshake), only for regular requests send after handshake. This is
because the internally thrown server-side exception due to HTTP
failure does not propagate the exact HTTP error (it only stores the
error as text), and as a result, the server-side handshake code
simply retries in case of error with exponential backoff and finally
fails with a "timeout error". This is reasonable as pings typically
fail due to time out. However, without a rework of HTTP exceptions,
version mismatch during ping would also appear as "timeout" which is
too misleading. The behavior may be changed later if needed.
- Note that introducing a version parameter does not represent a
protocol upgrade itself. Bridges older than the server will simply
ignore the field. Only servers older than the bridges receive an error
but such a situation should never occur in practice.
- When catboost will be added into library-bridge later on, a lot of
code will need to be duplicated.
- Avoid that by factorizing stuff which is not specific to external
dictionaries for reuse into a common base class.
- This is a refactoring without semantic change.
This will allow to distinguish really corrupted data, since right now if
you will CREATE/DETACH/ATTACH such engine you will have the following
error [1]:
2022.08.05 20:02:20.726398 [ 696405 ] {} <Error> StorageFileLog (file_log): Metadata files of table file_log are lost.
[1]: https://s3.amazonaws.com/clickhouse-test-reports/39926/72961328f68b1ec05300d6dc4411a87618a2f46b/stress_test__debug_.html
Likely that previously it was not created to avoid creating empty
directories, however this should be a problem I guess.
Refs: #25969 (@ucasfl)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Instead of checking that number of processors different for different
threads, simply always return empty header from buildChainImpl(), by
adding explicit conversion.
v2: ignore UNKNOWN_TABLE errors in test
In case of parallel INSERT (max_insert_threads > 1) it is possible for
VIEW to be DROP/DETACH'ed while building pipeline for various paralell
streams, and in this case the header will not match since when you have
VIEW you will got empty header and non-empty header otherwise.
And this leads to LOGICAL_ERROR later, while checking that output
headers are the same (in QueryPipelineBuilder::addChains() -> Pipe::addChains()).
However it also makes the pipeline different for various parallel
streams, and it looks like it is better to fail in this case, so instead
of always returning empty header from buildChainImpl() explicit check
had been added.
Note, that I wasn't able to reproduce the issue with the added test,
but CI may have more "luck" (although I've verified it manually).
Fixes: #35902
Cc: @KochetovNicolai
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
- ExternalDictionaryLibraryBridgeHelper provides the server-side
interface to access the dictionary part of the library bridge.
- In a later commit, CatBoostLibraryBridgeHelper will be added to access
the catboost part of the library bridge from the server.
Wall of text, sorry, but I also had to document some stuff for myself:
There are three ways to communicate data using HTTP:
- the HTTP verb: for our purposes, PUT and GET,
- the HTTP path: '/ping', '/request' etc.,
- the HTTP URL parameter(s), e.g. 'method=libNew&dictionary_id=1234'
The bridge will use different handlers for communication with the
external dictionary library and for communication with the catboost
library. Handlers are created based on a combination of the HTTP verb
and the HTTP method. More specifically, there will be combinations
- GET + '/extdict_ping'
- PUT + '/extdict_request'
- GET + '/catboost_ping'
- PUT + '/catboost_request'.
For each combination, the bridge expects a certain set of URL
parameters, e.g. for the first combination parameter "dictionary_id" is
expected.
Starting with this commit, the library-bridge creates handlers based on
the first two combinations (the latter two combinations will be added
later). This makes the handler creation mechanism consistent with it's
counterpart in xdbc-bridge.
For that, it was necessary to make both IBridgeHelper methods
"getMainURI()" and "getPingURI()" pure virtual so that derived classes
(LibraryBridgeHelper and XDBCBridgeHelper) must provide custom URLs with
custom paths.
Side note 1: Previously, LibraryBridgeHelper sent HTTP URL parameter
"method=ping" during handshake (PING) but the library-bridge ignored
that parameter. We now omit this parameter, i.e.
LibraryBridgeHelper::PING was removed. Again, this makes things
consistent with xdbc-bridge.
Side note 2: xdbc-bridge is unchanged in this commit. Therefore,
XDBCBridgeHelper now uses the HTTP paths previously in the base class.
For funny reason, XDBCBridgeHelper did not use
IBridgeHelper::getMainURI() - it generates the URLs by itself. I kept it
that way for now but provided an implementation of getMainURI() anyways.
- Rename generic file and identifier names in library-bridge to
something more dictionary-specific. This is needed because later on,
catboost will be integrated into library-bridge.
- Also: Some smaller fixes like typos and un-inlining non-performance
critical code.
- The logic remains unchanged in this commit.