Commit Graph

96278 Commits

Author SHA1 Message Date
Robert Schulze
64a6aa328e
fix: broken links in documentation (hopefully) 2022-08-29 20:27:06 +00:00
robot-clickhouse
64fa077148
style: fix style 2022-08-29 20:27:06 +00:00
Robert Schulze
4d511332c4
chore: delete obsolete modelEvaluate() function
- superseded by catboostEvaluate() which no longer uses the internal
  repository for external models

- also removed were the statement SYSTEM RELOAD MODELS and the
  monitoring view SYSTEM.SYSTEMMODELS
2022-08-29 20:27:06 +00:00
Robert Schulze
6b2b3c1eb3
feat: implement catboost in library-bridge
This commit moves the catboost model evaluation out of the server
process into the library-bridge binary. This serves two goals: On the
one hand, crashes / memory corruptions of the catboost library no longer
affect the server. On the other hand, we can forbid loading dynamic
libraries in the server (catboost was the last consumer of this
functionality), thus improving security.

SQL syntax:

  SELECT
    catboostEvaluate('/path/to/model.bin', FEAT_1, ..., FEAT_N) > 0 AS prediction,
    ACTION AS target
  FROM amazon_train
  LIMIT 10

Required configuration:

  <catboost_lib_path>/path/to/libcatboostmodel.so</catboost_lib_path>

*** Implementation Details ***

The internal protocol between the server and the library-bridge is
simple:

- HTTP GET on path "/extdict_ping":
  A ping, used during the handshake to check if the library-bridge runs.

- HTTP POST on path "/extdict_request":
  (1) Send a "catboost_GetTreeCount" request from the server to the
      bridge, containing a library path (e.g. /home/user/libcatboost.so) and
      a model path (e.g. /home/user/model.bin). First, this unloads the
      catboost library handler associated to the model path (if it was
      loaded), then loads the catboost library handler associated to the
      model path, then executes GetTreeCount() on the library handler and
      finally sends the result back to the server. Step (1) is called once
      by the server from FunctionCatBoostEvaluate::getReturnTypeImpl(). The
      library path handler is unloaded in the beginning because it contains
      state which may no longer be valid if the user runs
      catboostEvaluate("/path/to/model.bin", ...) more than once and if "model.bin"
      was updated in between.
  (2) Send "catboost_Evaluate" from the server to the bridge, containing
      the model path and the features to run the inference on. Step (2)
      is called multiple times (once per chunk) by the server from function
      FunctionCatBoostEvaluate::executeImpl(). The library handler for the
      given model path is expected to be already loaded by Step (1).
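
The two request types above can be sketched as a small in-memory model
of the bridge side. This is only an illustration under stated
assumptions, not the library-bridge code: FakeCatBoostHandler,
BridgeHandlerCache, the fixed tree count of 3 and the "sum of features"
inference are all hypothetical stand-ins for the real catboost library.

```python
from typing import Dict, List


class FakeCatBoostHandler:
    """Hypothetical stand-in for a catboost library handler bound to one model."""

    def __init__(self, library_path: str, model_path: str, tree_count: int) -> None:
        self.library_path = library_path
        self.model_path = model_path
        self.tree_count = tree_count

    def evaluate(self, rows: List[List[float]]) -> List[float]:
        # Placeholder "inference": sum the features of each row.
        return [sum(row) for row in rows]


class BridgeHandlerCache:
    """Hypothetical bridge-side cache of handlers, keyed by model path."""

    def __init__(self) -> None:
        self._handlers: Dict[str, FakeCatBoostHandler] = {}
        self.loads = 0  # counts how often a handler was (re)loaded

    def get_tree_count(self, library_path: str, model_path: str) -> int:
        # Step (1): unload a possibly stale handler, then load a fresh one.
        self._handlers.pop(model_path, None)
        handler = FakeCatBoostHandler(library_path, model_path, tree_count=3)
        self._handlers[model_path] = handler
        self.loads += 1
        return handler.tree_count

    def evaluate(self, model_path: str, rows: List[List[float]]) -> List[float]:
        # Step (2): the handler is expected to be loaded by step (1).
        handler = self._handlers.get(model_path)
        if handler is None:
            raise KeyError(f"no handler loaded for {model_path}")
        return handler.evaluate(rows)
```

Calling get_tree_count() twice for the same model path reloads the
handler each time, which mirrors the "unload first" rule in step (1).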

Fixes #27870
2022-08-29 20:26:45 +00:00
Maksim Kita
88141cae98
Merge pull request #40732 from azat/thread-status-fix-leak
Fix memory leak while pushing to MVs w/o query context (from Kafka/...)
2022-08-29 19:36:25 +02:00
Alexander Tokmakov
eb87e3df16
Merge pull request #40749 from ClickHouse/tavplubix-patch-3
Enable `show_addresses_in_stack_traces` by default
2022-08-29 19:16:36 +03:00
Alexander Tokmakov
1c6dea52e0
Update config.xml 2022-08-29 15:50:05 +03:00
Vladimir C
5cbe7e0846
Merge pull request #40548 from ClickHouse/vdimir/warn-suppress-40330
Add config option warning_supress_regexp
2022-08-29 14:02:00 +02:00
Alexander Tokmakov
a16d4dd605
Merge pull request #40747 from ClickHouse/revert-40710-DWARF-5
Revert "Support for DWARF-5 in in house DWARF parser"
2022-08-29 14:26:24 +03:00
Alexander Tokmakov
69387acffa
Revert "Support for DWARF-5 in in house DWARF parser" 2022-08-29 14:25:53 +03:00
Alexander Tokmakov
8d90d30d37
Merge pull request #40589 from ClickHouse/remove_wrong_code_from_mutations
Remove wrong code for skipping mutations in MergeTree
2022-08-29 14:18:59 +03:00
Alexander Tokmakov
eda0582ec0
Merge pull request #40641 from ClickHouse/fix_startup_of_dropped_replica
Do not try to start up dropped replica
2022-08-29 14:15:15 +03:00
Azat Khuzhin
f9812d9917 Fix memory leak while pushing to MVs w/o query context (from Kafka/...)
While pushing to MVs, there is low-level code that creates
ThreadGroupStatus/ThreadStatus; it is required to gather some metrics
for system.query_views_log.

But one should not use the ThreadGroupStatus of the MainThreadStatus,
since this structure can hold some state that may not be cleaned up, and
this may also be racy. It is better to create a new ThreadGroupStatus
and attach it instead.

This place also lacks a detachQuery() call, and because of this it leaks
ThreadGroupStatus::finished_threads_counters_memory. But this is only a
problem when pushing to MVs is done w/o query context (i.e. from
Kafka/...), since with a query context detachQuery() will eventually be
called.

Before this patch series, when I tried the reproducer with
500 MVs attached to Kafka engine (that @den-crane suggested), the
jemalloc report looked like this:

    $ ../jeprof --text ~/ch/tmp/upstream/clickhouse-binary --base jeprof.44384.0.i0.heap jeprof.44384.167.i167.heap
    Using local file /home/azat/ch/tmp/upstream/clickhouse-binary.
    Using local file jeprof.44384.167.i167.heap.
    Total: 915.6 MB
       910.7  99.5%  99.5%    910.7  99.5% Snapshot (inline)
         9.5   1.0% 100.5%      9.5   1.0% std::__1::__libcpp_operator_new (inline)
         0.5   0.1% 100.6%      0.5   0.1% DB::TasksStatsCounters::create

And with focus on this place:

    $ ../jeprof --focus Snapshot --text ~/ch/tmp/upstream/clickhouse-binary --base jeprof.44384.0.i0.heap jeprof.44384.167.i167.heap
    Using local file /home/azat/ch/tmp/upstream/clickhouse-binary.
    Using local file jeprof.44384.167.i167.heap.
    Total: 915.6 MB
       910.7 100.0% 100.0%    910.7 100.0% Snapshot (inline)
         0.0   0.0% 100.0%    910.7 100.0% DB::QueryPipeline::reset
         0.0   0.0% 100.0%    910.7 100.0% DB::StorageKafka::streamToViews
         0.0   0.0% 100.0%    910.7 100.0% DB::StorageKafka::threadFunc
         0.0   0.0% 100.0%    910.7 100.0% ProfileEvents::Counters::getPartiallyAtomicSnapshot
         0.0   0.0% 100.0%    910.7 100.0% ~ThreadStatus
         0.0   0.0% 100.0%    910.7 100.0% ~ViewRuntimeData
         0.0   0.0% 100.0%    910.7 100.0% ~ViewRuntimeStats (inline)

Admittedly this report does not look great (you can still understand it
because I stripped it down), because --text is not that smart, but if
you use --pdf for the report you will see the stack trace (I will attach
the PDF to the pull request).

But after this patch series the process RSS does not go beyond
~700MiB.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-08-29 11:36:33 +02:00
Azat Khuzhin
6da5707f8f Fix possible missing detachQuery() in case of exception in readers
This can create leaks, since detachQuery() is responsible for cleanup,
i.e. of ThreadGroupStatus::finished_threads_counters_memory
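
The general pattern behind this fix (always detach, even when a reader
throws) can be sketched as follows. This is a simplified illustration,
not the ClickHouse code: FakeThreadStatus, attached_query and
read_with_failure are hypothetical names, and the list of counters only
stands in for finished_threads_counters_memory.

```python
from contextlib import contextmanager


class FakeThreadStatus:
    """Hypothetical stand-in for a thread status that accumulates counters."""

    def __init__(self):
        self.attached = False
        self.finished_counters = []

    def attach_query(self):
        self.attached = True

    def detach_query(self):
        # Cleanup happens here, so skipping this call leaks the counters.
        self.finished_counters.clear()
        self.attached = False


@contextmanager
def attached_query(status):
    """Attach on entry and always detach on exit, including on exceptions."""
    status.attach_query()
    try:
        yield status
    finally:
        status.detach_query()


def read_with_failure(status):
    # A reader that fails midway; detach_query() must still run.
    with attached_query(status):
        status.finished_counters.append("snapshot")
        raise RuntimeError("reader failed")
```

Without the try/finally, the exception would skip detach_query() and the
accumulated counters would stay alive.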

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-08-29 11:30:17 +02:00
Azat Khuzhin
b16891da8d Avoid using of ThreadGroupStatus of the MainThreadStatus
One should not use MainThreadStatus, since ThreadGroupStatus can hold
some states, and it is better not to play with this, since this may
create leaks.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-08-29 11:30:17 +02:00
Azat Khuzhin
9fff08eac7 WriteBufferFromS3: remove unused ThreadGroupStatus
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-08-29 11:30:17 +02:00
alesapin
4ed375ca5b
Merge pull request #40720 from ClickHouse/fix_benign_race
Fix benign race in database replicated worker
2022-08-29 11:13:07 +02:00
Azat Khuzhin
269453a646 Avoid leaking of ThreadGroupStatus::finished_threads_counters_memory
Clean them up in ThreadStatus::detachQuery(); the client cannot receive
them afterwards anyway.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-08-29 11:08:50 +02:00
alesapin
7ce0afc0df
Merge pull request #40670 from Avogar/kafka
Add setting to disable limit on kafka_num_consumers
2022-08-29 10:53:35 +02:00
Dan Roscigno
4d9cfb9d05
Merge pull request #40731 from DanRoscigno/replace-symlink
Replace symlink
2022-08-28 21:08:44 -04:00
Alexey Milovidov
b72fceb441
Merge pull request #40708 from lesandie/test_s3_table_functions
Added integration test for s3 table function
2022-08-29 03:49:53 +03:00
Alexey Milovidov
18eaf7d0dc
Merge pull request #40721 from ClickHouse/enable_zero_copy_replication_in_ci
Enable zero-copy replication in CI
2022-08-29 03:49:23 +03:00
Dan Roscigno
f2feac1718
Merge branch 'master' into replace-symlink 2022-08-28 20:49:15 -04:00
Alexey Milovidov
71f6c52c2d
Merge pull request #40727 from amosbird/column-transformer-fix1
Correct format of APPLY transformer param
2022-08-29 03:48:04 +03:00
Alexey Milovidov
307e8f2da9
Merge pull request #40729 from ClickHouse/remove-useless-method
Remove useless method
2022-08-29 03:46:34 +03:00
Alexey Milovidov
9f052e1515
Merge pull request #40733 from azat/fix-doc-check
Update aspell-dict to fix doc check
2022-08-29 03:43:10 +03:00
DanRoscigno
753afd0584 update links 2022-08-28 20:41:29 -04:00
Dan Roscigno
74ce3462fe
Merge branch 'master' into replace-symlink 2022-08-28 20:10:23 -04:00
Azat Khuzhin
35f5e56159 Update aspell-dict to fix doc check
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-08-29 00:27:19 +02:00
DanRoscigno
b50fa8b5a9 replace symlinks 2022-08-28 17:34:50 -04:00
DanRoscigno
3c36660488 replace symlinks 2022-08-28 17:27:24 -04:00
Alexey Milovidov
84f9432e7e
Merge pull request #40724 from azat/update-bash-completion
Update available formats for bash completion
2022-08-29 00:14:48 +03:00
Alexey Milovidov
8448242785
Merge pull request #40716 from azat/fix-gdb-index
Fix gdb index (makes gdb 17x faster)
2022-08-29 00:13:58 +03:00
Alexey Milovidov
ae97e880d7
Merge pull request #40710 from azat/DWARF-5
Support for DWARF-5 in in house DWARF parser
2022-08-29 00:13:33 +03:00
Alexey Milovidov
dc49b353cf Fix style 2022-08-28 22:52:23 +02:00
Alexey Milovidov
82ef85e713
Merge pull request #40722 from kssenii/fix-test-02382
Fix flaky test
2022-08-28 23:34:10 +03:00
Alexey Milovidov
27782ceef8 Remove useless method 2022-08-28 22:33:42 +02:00
Dan Roscigno
19d105a7fb
Merge pull request #40725 from DanRoscigno/add-more-slugs
Replacing symlinks in docs with includes
2022-08-28 14:18:31 -04:00
Dan Roscigno
1c675fa1f0
Merge branch 'master' into add-more-slugs 2022-08-28 14:13:25 -04:00
DanRoscigno
71891938ae replace symlinks with includes 2022-08-28 14:08:07 -04:00
Diego Nieto (lesandie)
3d50dbea34 Black reformatting 2022-08-28 20:05:39 +02:00
Alexey Milovidov
a82723b5d9
Merge pull request #40719 from ClickHouse/kssenii-patch-4
Update 02313_filesystem_cache_seeks.queries
2022-08-28 21:04:48 +03:00
Amos Bird
d1fbe51b81
Correct format of APPLY transformer param 2022-08-29 01:21:12 +08:00
Dan Roscigno
96cd94196e
Merge branch 'ClickHouse:master' into add-more-slugs 2022-08-28 12:06:37 -04:00
DanRoscigno
fad2e071eb replace symlinks with includes 2022-08-28 11:58:59 -04:00
DanRoscigno
37127c683c remove symlinks 2022-08-28 11:35:03 -04:00
Azat Khuzhin
29877d3992 Update available formats for bash completion
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-08-28 17:22:32 +02:00
DanRoscigno
5d1e3ee4d8 remove duplicate 2022-08-28 11:04:51 -04:00
alesapin
bcc8106182 Merge branch 'master' into enable_zero_copy_replication_in_ci 2022-08-28 17:04:08 +02:00
alesapin
98d84402e6 Fix test 2022-08-28 17:03:04 +02:00