Commit Graph

96489 Commits

Author SHA1 Message Date
robot-clickhouse
64fa077148
style: fix style 2022-08-29 20:27:06 +00:00
Robert Schulze
4d511332c4
chore: delete obsolete modelEvaluate() function
- superseded by catboostEvaluate() which no longer uses the internal
  repository for external models

- also removed are the statement SYSTEM RELOAD MODELS and the monitoring view
  SYSTEM.SYSTEMMODELS
2022-08-29 20:27:06 +00:00
Robert Schulze
6b2b3c1eb3
feat: implement catboost in library-bridge
This commit moves the catboost model evaluation out of the server
process into the library-bridge binary. This serves two goals: On the
one hand, crashes / memory corruptions of the catboost library no longer
affect the server. On the other hand, we can forbid loading dynamic
libraries in the server (catboost was the last consumer of this
functionality), thus improving security.

SQL syntax:

  SELECT
    catboostEvaluate('/path/to/model.bin', FEAT_1, ..., FEAT_N) > 0 AS prediction,
    ACTION AS target
  FROM amazon_train
  LIMIT 10

Required configuration:

  <catboost_lib_path>/path/to/libcatboostmodel.so</catboost_lib_path>

*** Implementation Details ***

The internal protocol between the server and the library-bridge is
simple:

- HTTP GET on path "/extdict_ping":
  A ping, used during the handshake to check if the library-bridge runs.

- HTTP POST on path "/extdict_request":
  (1) Send a "catboost_GetTreeCount" request from the server to the
      bridge, containing a library path (e.g. /home/user/libcatboost.so) and
      a model path (e.g. /home/user/model.bin). First, this unloads the
      catboost library handler associated with the model path (if it was
      loaded), then loads the catboost library handler associated with the
      model path, then executes GetTreeCount() on the library handler and
      finally sends the result back to the server. Step (1) is called once
      by the server from FunctionCatBoostEvaluate::getReturnTypeImpl(). The
      library handler is unloaded at the beginning because it contains
      state which may no longer be valid if the user runs
      catboostEvaluate("/path/to/model.bin", ...) more than once and
      "model.bin" was updated in between.
  (2) Send "catboost_Evaluate" from the server to the bridge, containing
      the model path and the features to run the inference on. Step (2)
      is called multiple times (once per chunk) by the server from function
      FunctionCatBoostEvaluate::executeImpl(). The library handler for the
      given model path is expected to be already loaded by Step (1).
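
The two-step exchange above can be modelled roughly as follows. This is only an in-memory sketch in Python, not the bridge's actual code: the class names FakeBridge and FakeCatBoostHandler, the fixed tree count, and the sum-based "inference" are all hypothetical placeholders, and no HTTP is involved. It is meant to show only the ordering constraint: step (1) always reloads the handler, and step (2) relies on it being loaded.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCatBoostHandler:
    """Hypothetical stand-in for a loaded catboost library handler."""
    library_path: str
    model_path: str
    tree_count: int = 3  # made-up value; a real handler would read it from the model

    def evaluate(self, rows):
        # Placeholder "inference": one score per feature row.
        return [float(sum(row)) for row in rows]


@dataclass
class FakeBridge:
    """Hypothetical in-memory model of the bridge side of the exchange."""
    handlers: dict = field(default_factory=dict)

    def get_tree_count(self, library_path, model_path):
        # Step (1): drop any stale handler for this model, then reload it.
        self.handlers.pop(model_path, None)
        handler = FakeCatBoostHandler(library_path, model_path)
        self.handlers[model_path] = handler
        return handler.tree_count

    def evaluate(self, model_path, rows):
        # Step (2): the handler must already have been loaded by step (1).
        handler = self.handlers.get(model_path)
        if handler is None:
            raise KeyError(f"no library handler loaded for {model_path}")
        return handler.evaluate(rows)


bridge = FakeBridge()
# Called once, as from getReturnTypeImpl() in the description above.
tree_count = bridge.get_tree_count("/home/user/libcatboost.so", "/home/user/model.bin")
# Called once per chunk, as from executeImpl() in the description above.
chunks = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]]
scores = [bridge.evaluate("/home/user/model.bin", chunk) for chunk in chunks]
```

Reloading unconditionally in step (1) trades a little extra work for not having to detect whether "model.bin" changed on disk between queries.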

Fixes #27870
2022-08-29 20:26:45 +00:00
Vitaly Baranov
33f72fb011
Merge pull request #40060 from ClickHouse/vitlibar-increase-timeout-for-test_concurrent_backups
Increase timeout for test_concurrent_backups
2022-08-29 22:25:56 +02:00
Dan Roscigno
76a45aa750
Merge branch 'master' into add-backup 2022-08-29 16:23:53 -04:00
Dan Roscigno
8e5e1c5e8c
Merge pull request #40774 from DanRoscigno/replace-zh-symlinks
Replace zh symlinks
2022-08-29 15:36:08 -04:00
DanRoscigno
d37029dd82 updates for filename changes 2022-08-29 15:20:28 -04:00
Arthur Passos
dd49b44abb Fix host_regexp hosts file test 2022-08-29 15:58:18 -03:00
DanRoscigno
576b7ea604 updates for filename changes 2022-08-29 14:39:15 -04:00
Dmitry Novik
e25ed9547e
Update src/Interpreters/ProcessList.h 2022-08-29 20:26:37 +02:00
Denny Crane
29e7414697
Update merge-tree-settings.md 2022-08-29 15:25:46 -03:00
Dmitry Novik
865ee5d0d6 Refactor code 2022-08-29 20:24:35 +02:00
Denny Crane
19c3a9c6bf
Update external-dicts-dict-layout.md 2022-08-29 15:20:46 -03:00
Denny Crane
fe0f18f21d
Update external-dicts-dict-layout.md 2022-08-29 15:19:15 -03:00
Arthur Passos
961365c7a4 Fix CaresPTRResolver not reading hosts file 2022-08-29 15:11:39 -03:00
DanRoscigno
687ac1805a updates for filename changes 2022-08-29 13:59:51 -04:00
Dmitry Novik
1169315580 Add OvercommitTracker blocking 2022-08-29 19:44:05 +02:00
Maksim Kita
88141cae98
Merge pull request #40732 from azat/thread-status-fix-leak
Fix memory leak while pushing to MVs w/o query context (from Kafka/...)
2022-08-29 19:36:25 +02:00
Kseniia Sumarokova
0fd961acd4
Merge branch 'master' into fix-race 2022-08-29 19:33:38 +02:00
Kseniia Sumarokova
c5c48e44ea
Merge branch 'master' into fix-mysql-timeouts 2022-08-29 19:33:29 +02:00
kssenii
db2bc31e17 Remove incorrect assertion 2022-08-29 19:32:47 +02:00
Konstantin Morozov
d185b7a332 refactoring: public ctors 2022-08-29 20:19:20 +03:00
FArthur-cmd
862b53b06f Merge branch 'annoy-2' of https://github.com/Vector-Similarity-Search-for-ClickHouse/ClickHouse into annoy-2 2022-08-29 16:43:39 +00:00
FArthur-cmd
3305af8db2 fix case when query is already matched 2022-08-29 16:43:24 +00:00
DanRoscigno
76a3212fc8 replace symlinks 2022-08-29 12:26:17 -04:00
DanRoscigno
c4b8137d31 replace symlinks 2022-08-29 12:19:50 -04:00
Alexander Tokmakov
eb87e3df16
Merge pull request #40749 from ClickHouse/tavplubix-patch-3
Enable `show_addresses_in_stack_traces` by default
2022-08-29 19:16:36 +03:00
kssenii
545c6c8be4 Fix 2022-08-29 17:50:27 +02:00
Dmitry Novik
cfe509c3de Block overcommit tracker in ProcessList near allocations 2022-08-29 17:49:01 +02:00
robot-clickhouse
92c14e80f1 Update version_date.tsv and changelogs after v22.3.12.19-lts 2022-08-29 14:52:19 +00:00
Alexander Tokmakov
ff2db8e2a7 update submodule 2022-08-29 16:46:21 +02:00
robot-clickhouse
57980161c9 Update version_date.tsv and changelogs after v22.6.7.7-stable 2022-08-29 14:44:03 +00:00
Filatenkov Artur
d73f661732
Merge branch 'master' into annoy-2 2022-08-29 17:33:13 +03:00
robot-clickhouse
4a229ad08c Update version_date.tsv and changelogs after v22.7.5.13-stable 2022-08-29 14:29:06 +00:00
kssenii
b1dab84d97 Review fixes 2022-08-29 16:23:14 +02:00
kssenii
0a6c4b9265 Fix 2022-08-29 16:20:53 +02:00
robot-clickhouse
764e2e5ac8 Update version_date.tsv and changelogs after v22.8.3.13-lts 2022-08-29 14:05:36 +00:00
kssenii
877ade9a50 Merge remote-tracking branch 'upstream/master' into fix-race 2022-08-29 16:05:27 +02:00
Alexander Tokmakov
1c6dea52e0
Update config.xml 2022-08-29 15:50:05 +03:00
Vladimir C
5cbe7e0846
Merge pull request #40548 from ClickHouse/vdimir/warn-suppress-40330
Add config option warning_supress_regexp
2022-08-29 14:02:00 +02:00
Alexander Tokmakov
a16d4dd605
Merge pull request #40747 from ClickHouse/revert-40710-DWARF-5
Revert "Support for DWARF-5 in in house DWARF parser"
2022-08-29 14:26:24 +03:00
Alexander Tokmakov
69387acffa
Revert "Support for DWARF-5 in in house DWARF parser" 2022-08-29 14:25:53 +03:00
Alexander Tokmakov
8d90d30d37
Merge pull request #40589 from ClickHouse/remove_wrong_code_from_mutations
Remove wrong code for skipping mutations in MergeTree
2022-08-29 14:18:59 +03:00
Alexander Tokmakov
eda0582ec0
Merge pull request #40641 from ClickHouse/fix_startup_of_dropped_replica
Do not try to start up dropped replica
2022-08-29 14:15:15 +03:00
Vitaly Baranov
2bec3d3a7c Increase timeout for test_concurrent_backups 2022-08-29 13:13:43 +02:00
Azat Khuzhin
f9812d9917 Fix memory leak while pushing to MVs w/o query context (from Kafka/...)
While pushing to MVs, low-level code creates a
ThreadGroupStatus/ThreadStatus, which is required to gather some metrics
for system.query_views_log.

However, one should not use the ThreadGroupStatus of the MainThreadStatus,
since this structure can hold state that may not be cleaned up, and doing
so may be racy. It is better to create a new ThreadGroupStatus and attach
that instead.

This place also lacks a detachQuery() call, and because of this it leaks
ThreadGroupStatus::finished_threads_counters_memory. This is a problem
only when pushing to MVs is done w/o a query context (i.e. from Kafka/...),
since with a query context detachQuery() will eventually be called.

Before this patch series, when I tried the reproducer with
500 MVs attached to a Kafka engine (as @den-crane suggested), the jemalloc
report looked like this:

    $ ../jeprof --text ~/ch/tmp/upstream/clickhouse-binary --base jeprof.44384.0.i0.heap jeprof.44384.167.i167.heap
    Using local file /home/azat/ch/tmp/upstream/clickhouse-binary.
    Using local file jeprof.44384.167.i167.heap.
    Total: 915.6 MB
       910.7  99.5%  99.5%    910.7  99.5% Snapshot (inline)
         9.5   1.0% 100.5%      9.5   1.0% std::__1::__libcpp_operator_new (inline)
         0.5   0.1% 100.6%      0.5   0.1% DB::TasksStatsCounters::create

And with focus on this place:

    $ ../jeprof --focus Snapshot --text ~/ch/tmp/upstream/clickhouse-binary --base jeprof.44384.0.i0.heap jeprof.44384.167.i167.heap
    Using local file /home/azat/ch/tmp/upstream/clickhouse-binary.
    Using local file jeprof.44384.167.i167.heap.
    Total: 915.6 MB
       910.7 100.0% 100.0%    910.7 100.0% Snapshot (inline)
         0.0   0.0% 100.0%    910.7 100.0% DB::QueryPipeline::reset
         0.0   0.0% 100.0%    910.7 100.0% DB::StorageKafka::streamToViews
         0.0   0.0% 100.0%    910.7 100.0% DB::StorageKafka::threadFunc
         0.0   0.0% 100.0%    910.7 100.0% ProfileEvents::Counters::getPartiallyAtomicSnapshot
         0.0   0.0% 100.0%    910.7 100.0% ~ThreadStatus
         0.0   0.0% 100.0%    910.7 100.0% ~ViewRuntimeData
         0.0   0.0% 100.0%    910.7 100.0% ~ViewRuntimeStats (inline)

This report does not look great (it is readable only because I stripped
it), because --text is not that smart, but if you use --pdf for the
report you will see the stack trace (I will attach the PDF to the pull
request).

After this patch series, the process RSS does not go beyond ~700MiB.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-08-29 11:36:33 +02:00
Azat Khuzhin
6da5707f8f Fix possible missing detachQuery() in case of exception in readers
This can create leaks, since detachQuery() is responsible for cleanup,
e.g. of ThreadGroupStatus::finished_threads_counters_memory
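
The general idea of never skipping the detach step when a reader throws can be sketched as below. This is a Python illustration of the pattern only, not the patch itself (which is C++ and may be structured differently); FakeThreadStatus, attached_query and read_with_failure are hypothetical names invented for the example.

```python
from contextlib import contextmanager


class FakeThreadStatus:
    """Hypothetical stand-in for a thread status object that holds counters."""

    def __init__(self):
        self.attached = False
        self.counters_memory = []

    def attach_query(self):
        self.attached = True
        self.counters_memory.append("snapshot")

    def detach_query(self):
        # Cleanup lives here, so skipping this call would leak the counters.
        self.counters_memory.clear()
        self.attached = False


@contextmanager
def attached_query(status):
    status.attach_query()
    try:
        yield status
    finally:
        # Runs on both normal exit and exception.
        status.detach_query()


def read_with_failure(status):
    with attached_query(status):
        raise RuntimeError("reader failed")


status = FakeThreadStatus()
try:
    read_with_failure(status)
except RuntimeError:
    pass  # the exception still propagates; only the cleanup is guaranteed
```

Tying the detach to scope exit means every early return or exception path gets the same cleanup without repeating the call by hand.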

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-08-29 11:30:17 +02:00
Azat Khuzhin
b16891da8d Avoid using ThreadGroupStatus of the MainThreadStatus
One should not use MainThreadStatus, since its ThreadGroupStatus can hold
some state, and it is better not to play with this, since this may
create leaks.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-08-29 11:30:17 +02:00
Azat Khuzhin
9fff08eac7 WriteBufferFromS3: remove unused ThreadGroupStatus
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-08-29 11:30:17 +02:00
FArthur-cmd
c6e45fe690 remove build with UBSan 2022-08-29 09:18:15 +00:00