* impl
* stash
* clean up
* do not apply when HT is small
* make branch static
* also in merge
* do not hardcode look ahead value
* fix
* apply to methods with cheap key calculation
* more tests
* silence tidy
* fix build
* support HashMethodKeysFixed
* apply during merge only for cheap
* stash
* fixes
* rename method
* add feature flag
* cache prefetch threshold value
* fix
* fix
* Update HashMap.h
* fix typo
* 256KB as default l2 size
Co-authored-by: Alexey Milovidov <milovidov@clickhouse.com>
* Purge jemalloc arenas in case of high memory usage.
* Purge jemalloc arenas in case of high memory usage.
* Get RSS before jemalloc counters. Try to avoid negative RSS.
* Try to avoid negative RSS.
* muzzy -> dirty
* Another fix.
* Update MemoryTracker.cpp
* Wait for purged memory.
* Revert "Wait for purged memory."
This reverts commit 53a2621a2d.
Description:
============
In Computing && Storage Architecture, using S3 as remote / Shared storage, the Method to access S3 is using AWS S3 API
There is a gap between ClickHouse DB with Ozone Operation
In ClickHouse, operation is SQL and background task
In S3 , the operation should be AWS S3 API
And one sql maybe can mapped to multiple API
Solution:
=========
Added Calling API as event into system.events table
- This commit restores statements "SYSTEM RELOAD MODEL(S)" which provide
a mechanism to update a model explicitly. It also saves potentially
unnecessary reloads of a model from disk after it's initial load.
To keep the complexity low, the semantics of "SYSTEM RELOAD MODEL(S)
was changed from eager to lazy. This means that both statements
previously immedately reloaded the specified/all models, whereas now
the statements only trigger an unload and the first call to
catboostEvaluate() does the actual load.
- Monitoring view SYSTEM.MODELS is also restored but with some obsolete
fields removed. The view was not documented in the past and for now it
remains undocumented. The commit is thus not considered a breach of
ClickHouse's public interface.
ClickHouse changes to the folly parser:
- use camel_case
- add NOLINT
- avoid using folly:: (use std:: instead)
- avoid using boost:: (use std:: instead)
But note, now it has not been enabled by default (like it was
initially), because you may need recent debugger to support DWARF-5
correctly, and to make debugging easier, let's do this later.
A good example is gdb 10, even though it looks like it should support
it, it still produce some errors, like here [1]:
Dwarf Error: DW_FORM_strx1 found in non-DWO CU [in module /usr/bin/clickhouse]
[1]: https://github.com/ClickHouse/ClickHouse/pull/40772#issuecomment-1236331323
And not only it complains, apparently it can "activate" SDT probes
(replace "nop" with "int3"), and I believe this is what happens here
[2].
[2]: https://github.com/ClickHouse/ClickHouse/pull/41063#issuecomment-1242992314
There you got int3 in the case when ClickHouse got SIGTRAP:
<details>
```
0x7f494705e093 <+1139>: jne 0x7f494705e450 ; <+2096> [inlined] update_tls_slotinfo at dl-open.c:732
0x7f494705e099 <+1145>: testl %r13d, %r13d
0x7f494705e09c <+1148>: je 0x7f494705e09f ; <+1151> at dl-open.c:744:6
0x7f494705e09e <+1150>: int3
-> 0x7f494705e09f <+1151>: movl -0x54(%rbp), %eax
0x7f494705e0a2 <+1154>: testl %eax, %eax
0x7f494705e0a4 <+1156>: jne 0x7f494705e410 ; <+2032> at dl-open.c:745:5
But if I repeat the query it does not:
0x7ffff7fe5093 <+1139>: jne 0x7ffff7fe5450 ; <+2096> [inlined] update_tls_slotinfo at dl-open.c:732
0x7ffff7fe5099 <+1145>: testl %r13d, %r13d
0x7ffff7fe509c <+1148>: je 0x7ffff7fe509f ; <+1151> at dl-open.c:744:6
0x7ffff7fe509e <+1150>: nop
-> 0x7ffff7fe509f <+1151>: movl -0x54(%rbp), %eax
0x7ffff7fe50a2 <+1154>: testl %eax, %eax
0x7ffff7fe50a4 <+1156>: jne 0x7ffff7fe5410 ; <+2032> at dl-open.c:745:5
```
</details>
Test command was:
clickhouse local --stacktrace -q "select * from file('data.capnp', 'CapnProto', 'val1 char') settings format_schema='nonexist:Message'
*P.S. I did this, because I have libraries compiled with DWARF5 (i.e. glibc), and dwarf parser simply fails on my dev env.*
Refs: 490b287ca3
(cherry picked from commit ee5696bb32)
(cherry picked from commit e03870bc8b)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
This commit moves the catboost model evaluation out of the server
process into the library-bridge binary. This serves two goals: On the
one hand, crashes / memory corruptions of the catboost library no longer
affect the server. On the other hand, we can forbid loading dynamic
libraries in the server (catboost was the last consumer of this
functionality), thus improving security.
SQL syntax:
SELECT
catboostEvaluate('/path/to/model.bin', FEAT_1, ..., FEAT_N) > 0 AS prediction,
ACTION AS target
FROM amazon_train
LIMIT 10
Required configuration:
<catboost_lib_path>/path/to/libcatboostmodel.so</catboost_lib_path>
*** Implementation Details ***
The internal protocol between the server and the library-bridge is
simple:
- HTTP GET on path "/extdict_ping":
A ping, used during the handshake to check if the library-bridge runs.
- HTTP POST on path "extdict_request"
(1) Send a "catboost_GetTreeCount" request from the server to the
bridge, containing a library path (e.g /home/user/libcatboost.so) and
a model path (e.g. /home/user/model.bin). Rirst, this unloads the
catboost library handler associated to the model path (if it was
loaded), then loads the catboost library handler associated to the
model path, then executes GetTreeCount() on the library handler and
finally sends the result back to the server. Step (1) is called once
by the server from FunctionCatBoostEvaluate::getReturnTypeImpl(). The
library path handler is unloaded in the beginning because it contains
state which may no longer be valid if the user runs
catboost("/path/to/model.bin", ...) more than once and if "model.bin"
was updated in between.
(2) Send "catboost_Evaluate" from the server to the bridge, containing
the model path and the features to run the interference on. Step (2)
is called multiple times (once per chunk) by the server from function
FunctionCatBoostEvaluate::executeImpl(). The library handler for the
given model path is expected to be already loaded by Step (1).
Fixes#27870
This will avoid CANNOT_PARSE_ELF error for builds that has empty debug
file in clickhouse-common-static-dbg package, i.e. debug build.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Previous to #40769, only `hostent::h_aliases` was being accessed. After that PR got merged, `hostent::h_name` started being accessed as well. This PR moves the first `hostent::h_aliases != nullptr` check that could prevent `hostent::h_name` from being accessed. During debugging, I observed that even when there are not aliases, `hostent::h_aliases` is not null. That's why it hasn't caused any problems, but proposing this change to be on the safe side.