97061 Commits

Author SHA1 Message Date
Robert Schulze
7a46b862a6
fix: skip SYSTEM.MODELS in test 01161_all_system_tables.sh
The test is a stress test which queries all system tables in parallel.
System table SYSTEM.MODELS is now populated using the library-bridge.
This made the test fail with error:

   2022-09-12 23:51:34 Code: 410. DB::Exception: Received from localhost:9000. DB::Exception: BridgeHelper: clickhouse-library-bridge is not responding. (EXTERNAL_SERVER_IS_NOT_RESPONDING)
   2022-09-12 23:51:34 (query: SELECT * FROM system.models LIMIT 10000 FORMAT Null)

Looking at the logs, the server tried to start the library-bridge when
querying SYSTEM.MODELS but multiple handshake attempts failed.

So (most likely) the infrastructure is slow or (unlikely) there are
settings in CI (e.g. a firewall) preventing communication with the
bridge. Because 1. other system tables are also excluded (zookeeper,
merge_tree_metadata_cache), 2. SYSTEM.MODELS is tested in integration
test "test_catboost_evaluate" and 3. the test runs locally just fine, I
am excluding SYSTEM.MODELS from it.
2022-09-13 08:22:41 +00:00
Robert Schulze
fac1be9700
chore: restore SYSTEM RELOAD MODEL(S) and monitoring view SYSTEM.MODELS
- This commit restores statements "SYSTEM RELOAD MODEL(S)" which provide
  a mechanism to update a model explicitly. It also saves potentially
  unnecessary reloads of a model from disk after its initial load.

  To keep the complexity low, the semantics of "SYSTEM RELOAD MODEL(S)"
  was changed from eager to lazy. This means that both statements
  previously immediately reloaded the specified/all models, whereas now
  the statements only trigger an unload and the first call to
  catboostEvaluate() does the actual load.

- Monitoring view SYSTEM.MODELS is also restored but with some obsolete
  fields removed. The view was not documented in the past and for now it
  remains undocumented. The commit is thus not considered a breach of
  ClickHouse's public interface.
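
The lazy reload semantics described in the first bullet can be sketched
as follows. This is a minimal illustration, not the ClickHouse
implementation: the class LazyModelCache, its methods and the loader
callback are hypothetical names invented for this example.

```python
from typing import Callable, Dict


class LazyModelCache:
    """Hypothetical cache illustrating lazy reload semantics."""

    def __init__(self, loader: Callable[[str], object]) -> None:
        self._loader = loader  # stands in for reading a model from disk
        self._loaded: Dict[str, object] = {}
        self.load_count = 0  # number of actual loads, kept for demonstration

    def evaluate(self, path: str) -> object:
        # The first use after an unload performs the actual load.
        if path not in self._loaded:
            self._loaded[path] = self._loader(path)
            self.load_count += 1
        return self._loaded[path]

    def reload_model(self, path: str) -> None:
        # Lazy: only drop the cached model, nothing is read from disk here.
        self._loaded.pop(path, None)

    def reload_models(self) -> None:
        # Lazy variant for all models.
        self._loaded.clear()


cache = LazyModelCache(loader=lambda path: f"model loaded from {path}")
cache.evaluate("/path/to/model.bin")      # first call loads the model
cache.evaluate("/path/to/model.bin")      # served from the cache
cache.reload_model("/path/to/model.bin")  # unload only
loads_after_reload = cache.load_count     # still 1, the reload did not load
cache.evaluate("/path/to/model.bin")      # the load happens here
```

An eager variant would call the loader inside reload_model(); the lazy
variant avoids that work when the model is never evaluated again.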
2022-09-12 19:33:02 +00:00
Robert Schulze
c16707ff00
chore: delete obsolete modelEvaluate() function + SYSTEM.MODELS view
- The deleted function modelEvaluate() was superseded by
  catboostEvaluate().

- Also delete the external model repository, as modelEvaluate() was its
  last user. Additionally remove the system view SYSTEM.MODELS for
  inspecting the repository.

- SYSTEM RELOAD MODELS is also obsolete. HOWEVER, it was retained and
  made a no-op instead of deleted.

  Why?
  The reason is that RBAC in distributed setups works by storing
  privileges (granted and revoked) as plain SQL statements in Keeper.
  Nodes read these statements at startup and parse them. If a privilege
  for SYSTEM RELOAD MODELS exists but the parser doesn't recognize it,
  nodes would fail to come up.

  Considered but rejected alternatives:
  - Ignore SYSTEM RELOAD MODELS during parsing RBAC privileges and
    return an error for regular SYSTEM RELOAD MODELS SQL. Special-case
    of no-op behavior, too brittle.
  - Remove SYSTEM RELOAD MODELS manually from Keeper via command-line
    manipulation of Keeper nodes or via SQL by dropping the privileges.
    Needs user intervention during upgrade.
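
The startup failure that motivated keeping SYSTEM RELOAD MODELS as a
no-op can be illustrated with a toy example. Everything here is an
assumption made for illustration: the simplified GRANT grammar, the
functions parse_grant() and start_node() and the stored statements do
not reflect ClickHouse's actual parser or the format used in Keeper.

```python
def parse_grant(statement, known_privileges):
    """Parse a toy 'GRANT <privilege> ON <target> TO <user>' statement."""
    prefix = "GRANT "
    if not statement.startswith(prefix) or " ON " not in statement:
        raise ValueError(f"malformed statement: {statement!r}")
    privilege = statement[len(prefix):statement.index(" ON ")]
    if privilege not in known_privileges:
        # An unknown privilege is a hard parse error in this sketch.
        raise ValueError(f"unknown privilege: {privilege!r}")
    return privilege


def start_node(stored_statements, known_privileges):
    """Simulate a node parsing all stored privileges at startup."""
    return [parse_grant(s, known_privileges) for s in stored_statements]


# Hypothetical statements written by an older version.
stored = [
    "GRANT SELECT ON db.* TO alice",
    "GRANT SYSTEM RELOAD MODELS ON *.* TO alice",
]

# Parser that still recognizes the statement (retained as a no-op).
parsed = start_node(stored, {"SELECT", "SYSTEM RELOAD MODELS"})

# Parser from which the statement was deleted entirely.
try:
    start_node(stored, {"SELECT"})
    startup_failed = False
except ValueError:
    startup_failed = True
```

With the privilege retained, all stored statements parse; with it
deleted, the second statement raises and the simulated node does not
start.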
2022-09-08 09:10:11 +00:00
Robert Schulze
60f9f6855d
feat: implement catboost in library-bridge
This commit moves the catboost model evaluation out of the server
process into the library-bridge binary. This serves two goals: On the
one hand, crashes / memory corruptions of the catboost library no longer
affect the server. On the other hand, we can forbid loading dynamic
libraries in the server (catboost was the last consumer of this
functionality), thus improving security.

SQL syntax:

  SELECT
    catboostEvaluate('/path/to/model.bin', FEAT_1, ..., FEAT_N) > 0 AS prediction,
    ACTION AS target
  FROM amazon_train
  LIMIT 10

Required configuration:

  <catboost_lib_path>/path/to/libcatboostmodel.so</catboost_lib_path>

*** Implementation Details ***

The internal protocol between the server and the library-bridge is
simple:

- HTTP GET on path "/extdict_ping":
  A ping, used during the handshake to check if the library-bridge runs.

- HTTP POST on path "extdict_request"
  (1) Send a "catboost_GetTreeCount" request from the server to the
      bridge, containing a library path (e.g. /home/user/libcatboost.so) and
      a model path (e.g. /home/user/model.bin). First, this unloads the
      catboost library handler associated to the model path (if it was
      loaded), then loads the catboost library handler associated to the
      model path, then executes GetTreeCount() on the library handler and
      finally sends the result back to the server. Step (1) is called once
      by the server from FunctionCatBoostEvaluate::getReturnTypeImpl(). The
      library path handler is unloaded in the beginning because it contains
      state which may no longer be valid if the user runs
      catboostEvaluate('/path/to/model.bin', ...) more than once and if "model.bin"
      was updated in between.
  (2) Send "catboost_Evaluate" from the server to the bridge, containing
      the model path and the features to run the inference on. Step (2)
      is called multiple times (once per chunk) by the server from function
      FunctionCatBoostEvaluate::executeImpl(). The library handler for the
      given model path is expected to be already loaded by Step (1).
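
The bridge-side bookkeeping implied by steps (1) and (2) can be sketched
as below. This is a simplified in-process model under stated
assumptions, not the library-bridge code: Bridge and FakeLibraryHandler
are hypothetical names, there is no HTTP layer, and the tree count and
predictions are placeholder values rather than catboost output.

```python
class FakeLibraryHandler:
    """Hypothetical stand-in for a loaded catboost library plus model."""

    def __init__(self, library_path, model_path):
        self.library_path = library_path
        self.model_path = model_path

    def get_tree_count(self):
        return 3  # placeholder value, not a real tree count

    def evaluate(self, features):
        # Placeholder "inference": one value per row (the row sum).
        return [float(sum(row)) for row in features]


class Bridge:
    """Keeps one library handler per model path."""

    def __init__(self):
        self._handlers = {}

    def catboost_get_tree_count(self, library_path, model_path):
        # Step (1): drop a possibly stale handler, load a fresh one,
        # then report the tree count.
        self._handlers.pop(model_path, None)
        handler = FakeLibraryHandler(library_path, model_path)
        self._handlers[model_path] = handler
        return handler.get_tree_count()

    def catboost_evaluate(self, model_path, features):
        # Step (2): the handler must already exist from step (1).
        handler = self._handlers.get(model_path)
        if handler is None:
            raise RuntimeError("handler not loaded, step (1) must run first")
        return handler.evaluate(features)


bridge = Bridge()
tree_count = bridge.catboost_get_tree_count(
    "/home/user/libcatboost.so", "/home/user/model.bin"
)
# Step (2) runs once per chunk of rows.
chunks = [[[1.0, 2.0]], [[3.0, 4.0], [5.0, 6.0]]]
results = [bridge.catboost_evaluate("/home/user/model.bin", c) for c in chunks]
```

Unloading at the start of step (1) means a model file that changed on
disk is picked up the next time the function's return type is resolved.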

Fixes #27870
2022-09-08 09:01:32 +00:00
Robert Schulze
68808858a5
Merge pull request #41050 from FrankChen021/exception_safe
Fix failed stress test (OpenTelemetry)
2022-09-08 09:19:54 +02:00
Robert Schulze
9d4de0cbaa
Merge pull request #40999 from ClickHouse/sse2-special-build
Add special x86-SSE2-only build
2022-09-08 09:06:29 +02:00
Alexey Milovidov
9544b8fdd6
Merge pull request #40996 from ClickHouse/vdimir/issue-40994
Minor update doc for mysql_port
2022-09-08 02:39:12 +03:00
Alexey Milovidov
84a00e3992
Merge pull request #41087 from peter279k/improve_clickhouse_start
Improve clickhouse start command
2022-09-08 02:35:02 +03:00
Nikolay Degterinsky
5f6699ab1e
Merge pull request #41093 from den-crane/patch-46
Doc. update date_diff
2022-09-07 21:23:02 +02:00
Denny Crane
a75eb5ad84
Update date-time-functions.md 2022-09-07 15:59:23 -03:00
Yuko Takagi
fb6b26c7a4
Update README.md (#41091) 2022-09-07 20:58:36 +02:00
Denny Crane
0071ef9e38
Update date-time-functions.md 2022-09-07 15:56:31 -03:00
peter279k
1ae54d3d16 Improve clickhouse start command 2022-09-08 01:18:27 +08:00
Kseniia Sumarokova
a270eeef91
Merge pull request #41008 from kssenii/refactor-merge-tree-read
Small refactoring around merge tree readers (get rid of data part ptr)
2022-09-07 18:27:33 +02:00
Dmitry Novik
499e479892
Merge pull request #40873 from azat/build/fix-debug-symbols-quirk
Fix debug symbols
2022-09-07 17:31:35 +02:00
alesapin
365438d617
Merge pull request #41016 from ClickHouse/one_more_logging
Slightly improve diagnostics and remove assertions
2022-09-07 15:23:17 +02:00
Vitaly Baranov
31ed722572
Merge pull request #41044 from vitlibar/more-conventional-conversion-yaml-to-xmk
More conventional conversion yaml to xml
2022-09-07 13:46:32 +02:00
Mikhail f. Shiryaev
0fcee94835
Merge pull request #41071 from peter279k/remove_non_existed_trains
Remove non-existed released trains
2022-09-07 13:18:18 +02:00
Kruglov Pavel
8513af5c1b
Merge pull request #40638 from azat/mergetree/insert-perf
Do not obtain storage snapshot for each INSERT block (slightly improves performance)
2022-09-07 12:53:09 +02:00
Robert Schulze
c07f234f09
fix: disable ENABLE_MULTITARGET_CODE for SSE2 builds 2022-09-07 10:52:31 +00:00
peter279k
b716988991 Remove non-existed released trains 2022-09-07 18:39:38 +08:00
Robert Schulze
bf8fed8be8
Revert "fix: don't force-inline SSE3 code into generic code"
This reverts commit d054ffd110.
2022-09-07 09:45:07 +00:00
Mikhail f. Shiryaev
7f9d9f6ec7
Merge pull request #41074 from ClickHouse/can-be-tested
CheckLabels: Print message that 'can be tested' is missing
2022-09-07 11:28:53 +02:00
Antonio Andelic
d15fdae8dc
Merge pull request #40853 from ClickHouse/embeddedrocksdb-delete-update-support
Support for `UPDATE` and `DELETE` in `EmbeddedRocksDB`
2022-09-07 11:03:49 +02:00
alesapin
81c98dadd2 Remove redundant change 2022-09-07 11:01:06 +02:00
robot-clickhouse
01139d9d28 Automatic style fix 2022-09-07 08:51:02 +00:00
Robert Schulze
497b65f41d
CheckLabels: Print message that 'can be tested' is missing 2022-09-07 08:42:53 +00:00
Frank Chen
fc05b05be3 Fix style and typo
Signed-off-by: Frank Chen <frank.chen021@outlook.com>
2022-09-07 15:14:43 +08:00
Frank Chen
de8f6bdce7 More safe
Signed-off-by: Frank Chen <frank.chen021@outlook.com>
2022-09-07 13:39:12 +08:00
Dan Roscigno
3735a960c3
Merge pull request #41064 from DanRoscigno/rehome-NY-crime-data
move doc
2022-09-06 18:58:17 -04:00
DanRoscigno
3073da9ba5 move doc 2022-09-06 18:39:06 -04:00
Nikolay Degterinsky
981e9dbce2
Merge pull request #40997 from canhld94/ch_canh_fix_grouping_set
Fix grouping set with group_by_use_nulls
2022-09-07 00:08:31 +02:00
alesapin
6ded03c000 Disable fetch shortcut for zero copy replication 2022-09-07 00:00:10 +02:00
Robert Schulze
d054ffd110
fix: don't force-inline SSE3 code into generic code
Force-inlining code compiled for SSE3 into "generic"
(non-platform-specific) code works for standard x86 builds where
everything is compiled with SSE 4.2 (and smaller). It no longer works
if we compile everything only with SSE 2.
2022-09-06 18:23:19 +00:00
alesapin
43493389b9 Merge branch 'master' into one_more_logging 2022-09-06 19:40:24 +02:00
alesapin
b778b9f37f Improve logging better 2022-09-06 19:25:58 +02:00
alesapin
09e97a6381 Fix style 2022-09-06 18:38:34 +02:00
alesapin
ceed9f418b Return better errors handling 2022-09-06 18:22:44 +02:00
Robert Schulze
cc1bd3ac36
fix: disable vectorscan when building w/o SSE >=3 2022-09-06 16:15:50 +00:00
Dan Roscigno
2587ba96c3
Merge pull request #41055 from DanRoscigno/update-backup-frontmatter
move title to frontmatter in Backup docs
2022-09-06 11:53:40 -04:00
DanRoscigno
7032a1b267 move title to frontmatter 2022-09-06 11:14:55 -04:00
Vitaly Baranov
63e992d52d Edit test configs. 2022-09-06 17:09:26 +02:00
Frank Chen
6ced4131ca exception safe
Signed-off-by: Frank Chen <frank.chen021@outlook.com>
2022-09-06 22:11:47 +08:00
Kseniia Sumarokova
3558361a05
Merge branch 'master' into refactor-merge-tree-read 2022-09-06 16:00:43 +02:00
Vitaly Baranov
6b55f4dd68 Use pretty-print to output preprocessed configs in readable form. 2022-09-06 15:01:47 +02:00
Vitaly Baranov
e9b75deeba Make conversion YAML->XML more conventional. 2022-09-06 15:01:41 +02:00
alesapin
422b1658eb Review fix 2022-09-06 14:42:48 +02:00
Robert Schulze
652f1bfd19
fix: pass -DNO_SSE3_OR_HIGHER=1 from packager 2022-09-06 12:18:11 +00:00
alesapin
6ea7f1e011 Better exception handling for ReadBufferFromS3 2022-09-06 13:59:55 +02:00
Nikita Taranov
7c4f42d014
Skip empty literals in lz4 decompression (#40142) 2022-09-06 13:58:26 +02:00