ClickHouse/tests/queries/1_stateful/00175_obfuscator_schema_inference.sh
Azat Khuzhin 9a9bbac19b tests: avoid model overlap for obfuscator
Under stress tests the model files of concurrent obfuscator tests can
overlap, and a failure there raises LOGICAL_ERROR, which produces a core dump.

On CI [1] the error was most likely this:

    stress_test_run_17.txt:/usr/share/clickhouse-test/queries/1_stateful/00175_obfuscator_schema_inference.sh: line 18: /tmp/clickhouse-test/1_stateful/model.bin: No such file or directory

So the file had been removed by another concurrent test.

  [1]: https://s3.amazonaws.com/clickhouse-test-reports/42190/56bc85746fa0b553e43c2253250404cfcca46855/stress_test__ubsan_.html

Note that it would be enough just to change the file name in these two
tests, but let's make them even more error-resistant.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-10-22 21:14:49 +02:00

#!/usr/bin/env bash
CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
# shellcheck source=../shell_config.sh
. "$CURDIR"/../shell_config.sh
model=$(mktemp "$CLICKHOUSE_TMP/obfuscator-model-XXXXXX.bin")
# Compared to explicitly specifying the structure of the input,
# schema inference adds Nullable(T) to all types, so the model and the results
# are a bit different from the obfuscator test that specifies the structure explicitly
$CLICKHOUSE_CLIENT --max_threads 1 --query="SELECT URL, Title, SearchPhrase FROM test.hits LIMIT 1000" > "${CLICKHOUSE_TMP}"/data.tsv
# Test obfuscator without saving the model
$CLICKHOUSE_OBFUSCATOR --input-format TSV --output-format TSV --seed hello --limit 2500 < "${CLICKHOUSE_TMP}"/data.tsv > "${CLICKHOUSE_TMP}"/data2500.tsv 2>/dev/null
# Test obfuscator with saving the model
$CLICKHOUSE_OBFUSCATOR --input-format TSV --output-format TSV --seed hello --limit 0 --save "$model" < "${CLICKHOUSE_TMP}"/data.tsv 2>/dev/null
wc -c < "$model"
$CLICKHOUSE_OBFUSCATOR --input-format TSV --output-format TSV --seed hello --limit 2500 --load "$model" < "${CLICKHOUSE_TMP}"/data.tsv > "${CLICKHOUSE_TMP}"/data2500_load_from_model.tsv 2>/dev/null
rm "$model"
# Compare row counts and column cardinalities of the source and obfuscated data
$CLICKHOUSE_LOCAL --structure "URL String, Title String, SearchPhrase String" --input-format TSV --output-format TSV --query "SELECT count(), uniq(URL), uniq(Title), uniq(SearchPhrase) FROM table" < "${CLICKHOUSE_TMP}"/data.tsv
$CLICKHOUSE_LOCAL --structure "URL String, Title String, SearchPhrase String" --input-format TSV --output-format TSV --query "SELECT count(), uniq(URL), uniq(Title), uniq(SearchPhrase) FROM table" < "${CLICKHOUSE_TMP}"/data2500.tsv
$CLICKHOUSE_LOCAL --structure "URL String, Title String, SearchPhrase String" --input-format TSV --output-format TSV --query "SELECT count(), uniq(URL), uniq(Title), uniq(SearchPhrase) FROM table" < "${CLICKHOUSE_TMP}"/data2500_load_from_model.tsv
# Clean up the intermediate data files
rm "${CLICKHOUSE_TMP}"/data.tsv
rm "${CLICKHOUSE_TMP}"/data2500.tsv
rm "${CLICKHOUSE_TMP}"/data2500_load_from_model.tsv
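The commit message above attributes the CI failure to two concurrent tests sharing one model path, so that one test removed the file the other was about to read. The following is a minimal sketch of the general idea, not the code from the commit: each run gets a private scratch directory from mktemp, and a trap removes it on exit. TMP_ROOT and make_workdir are hypothetical names introduced for illustration; TMP_ROOT stands in for $CLICKHOUSE_TMP so that the example runs without ClickHouse installed.

```shell
#!/usr/bin/env bash
# Sketch only. TMP_ROOT is an assumption standing in for $CLICKHOUSE_TMP.
TMP_ROOT=${TMP_ROOT:-${TMPDIR:-/tmp}}

# Hypothetical helper: create a private scratch directory for one test run.
# mktemp replaces the trailing XXXXXX with a unique suffix.
make_workdir() {
    mktemp -d "$TMP_ROOT/obfuscator-test-XXXXXX"
}

workdir_a=$(make_workdir)
workdir_b=$(make_workdir)   # simulates a second, concurrent run

# Remove both directories on exit, even if a later command fails.
trap 'rm -rf "$workdir_a" "$workdir_b"' EXIT

# Each run writes its own model.bin, so neither can delete the other's file.
echo "model from run A" > "$workdir_a/model.bin"
echo "model from run B" > "$workdir_b/model.bin"
rm "$workdir_a/model.bin"   # run A cleans up first

# Run B's file is still present and readable.
wc -c < "$workdir_b/model.bin"
```

Keeping every intermediate file of a run inside such a directory would extend the same protection to data.tsv and the other outputs, which in the script above still use fixed names under $CLICKHOUSE_TMP.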