Compare commits

15 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Arthur Passos | b82bbdacea | Merge bc866e64ea into e0f8b8d351 | 2024-11-21 09:40:48 +04:00 |
| Yakov Olkhovskiy | e0f8b8d351 | Merge pull request #70458 from ClickHouse/fix-ephemeral-comment: Fix ephemeral column comment | 2024-11-21 05:10:11 +00:00 |
| Alexey Milovidov | da2176d696 | Merge pull request #72081 from ClickHouse/add-dashboard-selector: Add advanced dashboard selector | 2024-11-21 05:06:51 +00:00 |
| Alexey Milovidov | 53e0036593 | Merge pull request #72176 from ClickHouse/change-ldf-major-versions: Get rid of `major` tags in official docker images | 2024-11-21 05:05:41 +00:00 |
| Alexey Milovidov | 25bd73ea5e | Merge pull request #72023 from ClickHouse/fix-bind: Fix comments | 2024-11-21 05:03:24 +00:00 |
| Yakov Olkhovskiy | 72d5af29e0 | Merge branch 'master' into fix-ephemeral-comment | 2024-11-20 22:01:54 +00:00 |
| Arthur Passos | bc866e64ea | minor adjustment and tests | 2024-11-20 15:26:02 -03:00 |
| Arthur Passos | cbda101228 | missing double quotes | 2024-11-20 12:42:45 -03:00 |
| Mikhail f. Shiryaev | 9a2a664b04 | Get rid of major tags in official docker images | 2024-11-20 16:36:50 +01:00 |
| Arthur Passos | 8b3c15b22a | Merge branch 'master' into parquet_native_reader_int_logical | 2024-11-20 10:27:02 -03:00 |
| Arthur Passos | 92a1f0c562 | approach1 | 2024-11-20 10:18:07 -03:00 |
| serxa | ad67608956 | Add advanced dashboard selector | 2024-11-19 13:18:21 +00:00 |
| Alexey Milovidov | 49589da56e | Fix comments | 2024-11-18 07:18:46 +01:00 |
| Yakov Olkhovskiy | 3827d90bb0 | add test | 2024-10-08 02:37:41 +00:00 |
| Yakov Olkhovskiy | bf3a3ad607 | fix ephemeral comment | 2024-10-08 02:27:36 +00:00 |

21 changed files with 398 additions and 184 deletions

@ -16,16 +16,18 @@ ClickHouse works 100-1000x faster than traditional database management systems,
For more information and documentation see https://clickhouse.com/.
<!-- This is not related to the docker official library, remove it before commit to https://github.com/docker-library/docs -->
## Versions
- The `latest` tag points to the latest release of the latest stable branch.
- Branch tags like `22.2` point to the latest release of the corresponding branch.
- Full version tags like `22.2.3.5` point to the corresponding release.
- Full version tags like `22.2.3` and `22.2.3.5` point to the corresponding release.
<!-- docker-official-library:off -->
<!-- This is not related to the docker official library, remove it before commit to https://github.com/docker-library/docs -->
- The tag `head` is built from the latest commit to the default branch.
- Each tag has optional `-alpine` suffix to reflect that it's built on top of `alpine`.
<!-- REMOVE UNTIL HERE -->
<!-- docker-official-library:on -->
### Compatibility
- The amd64 image requires support for [SSE3 instructions](https://en.wikipedia.org/wiki/SSE3). Virtually all x86 CPUs after 2005 support SSE3.

@ -10,16 +10,18 @@ ClickHouse works 100-1000x faster than traditional database management systems,
For more information and documentation see https://clickhouse.com/.
<!-- This is not related to the docker official library, remove it before commit to https://github.com/docker-library/docs -->
## Versions
- The `latest` tag points to the latest release of the latest stable branch.
- Branch tags like `22.2` point to the latest release of the corresponding branch.
- Full version tags like `22.2.3.5` point to the corresponding release.
- Full version tags like `22.2.3` and `22.2.3.5` point to the corresponding release.
<!-- docker-official-library:off -->
<!-- This is not related to the docker official library, remove it before commit to https://github.com/docker-library/docs -->
- The tag `head` is built from the latest commit to the default branch.
- Each tag has optional `-alpine` suffix to reflect that it's built on top of `alpine`.
<!-- REMOVE UNTIL HERE -->
<!-- docker-official-library:on -->
### Compatibility
- The amd64 image requires support for [SSE3 instructions](https://en.wikipedia.org/wiki/SSE3). Virtually all x86 CPUs after 2005 support SSE3.

@ -522,4 +522,3 @@ sidebar_label: 2024
* Backported in [#68518](https://github.com/ClickHouse/ClickHouse/issues/68518): Minor update in Dynamic/JSON serializations. [#68459](https://github.com/ClickHouse/ClickHouse/pull/68459) ([Kruglov Pavel](https://github.com/Avogar)).
* Backported in [#68558](https://github.com/ClickHouse/ClickHouse/issues/68558): CI: Minor release workflow fix. [#68536](https://github.com/ClickHouse/ClickHouse/pull/68536) ([Max K.](https://github.com/maxknv)).
* Backported in [#68576](https://github.com/ClickHouse/ClickHouse/issues/68576): CI: Tidy build timeout from 2h to 3h. [#68567](https://github.com/ClickHouse/ClickHouse/pull/68567) ([Max K.](https://github.com/maxknv)).

@ -497,4 +497,3 @@ sidebar_label: 2024
* Backported in [#69899](https://github.com/ClickHouse/ClickHouse/issues/69899): Revert "Merge pull request [#69032](https://github.com/ClickHouse/ClickHouse/issues/69032) from alexon1234/include_real_time_execution_in_http_header". [#69885](https://github.com/ClickHouse/ClickHouse/pull/69885) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Backported in [#69931](https://github.com/ClickHouse/ClickHouse/issues/69931): RIPE is an acronym and thus should be capital. RIPE stands for **R**ACE **I**ntegrity **P**rimitives **E**valuation and RACE stands for **R**esearch and Development in **A**dvanced **C**ommunications **T**echnologies in **E**urope. [#69901](https://github.com/ClickHouse/ClickHouse/pull/69901) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Backported in [#70034](https://github.com/ClickHouse/ClickHouse/issues/70034): Revert "Add RIPEMD160 function". [#70005](https://github.com/ClickHouse/ClickHouse/pull/70005) ([Robert Schulze](https://github.com/rschu1ze)).

@ -476,7 +476,7 @@
<input id="edit" type="button" value="✎" style="display: none;">
<input id="add" type="button" value="Add chart" style="display: none;">
<input id="reload" type="button" value="Reload">
<span id="search-span" class="nowrap" style="display: none;"><input id="search" type="button" value="🔎" title="Run query to obtain list of charts from ClickHouse"><input id="search-query" name="search" type="text" spellcheck="false"></span>
<span id="search-span" class="nowrap" style="display: none;"><input id="search" type="button" value="🔎" title="Run query to obtain list of charts from ClickHouse. Either select dashboard name or write your own query"><input id="search-query" name="search" list="search-options" type="text" spellcheck="false"><datalist id="search-options"></datalist></span>
<div id="chart-params"></div>
</div>
</form>
@ -532,9 +532,15 @@ const errorMessages = [
}
]
/// Dashboard selector
const dashboardSearchQuery = (dashboard_name) => `SELECT title, query FROM system.dashboards WHERE dashboard = '${dashboard_name}'`;
let dashboard_queries = {
"Overview": dashboardSearchQuery("Overview"),
};
const default_dashboard = 'Overview';
/// Query to fill `queries` list for the dashboard
let search_query = `SELECT title, query FROM system.dashboards WHERE dashboard = 'Overview'`;
let search_query = dashboardSearchQuery(default_dashboard);
let customized = false;
let queries = [];
@ -1439,7 +1445,7 @@ async function reloadAll(do_search) {
try {
updateParams();
if (do_search) {
search_query = document.getElementById('search-query').value;
search_query = toSearchQuery(document.getElementById('search-query').value);
queries = [];
refreshCustomized(false);
}
@ -1504,7 +1510,7 @@ function updateFromState() {
document.getElementById('url').value = host;
document.getElementById('user').value = user;
document.getElementById('password').value = password;
document.getElementById('search-query').value = search_query;
document.getElementById('search-query').value = fromSearchQuery(search_query);
refreshCustomized();
}
@ -1543,6 +1549,44 @@ if (window.location.hash) {
} catch {}
}
function fromSearchQuery(query) {
for (const dashboard_name in dashboard_queries) {
if (query == dashboard_queries[dashboard_name])
return dashboard_name;
}
return query;
}
function toSearchQuery(value) {
if (value in dashboard_queries)
return dashboard_queries[value];
else
return value;
}
async function populateSearchOptions() {
let {reply, error} = await doFetch("SELECT dashboard FROM system.dashboards GROUP BY dashboard ORDER BY ALL");
if (error) {
throw new Error(error);
}
let data = reply.data;
if (data.dashboard.length == 0) {
console.log("Unable to fetch dashboards list");
return;
}
dashboard_queries = {};
for (let i = 0; i < data.dashboard.length; i++) {
const dashboard = data.dashboard[i];
dashboard_queries[dashboard] = dashboardSearchQuery(dashboard);
}
const searchOptions = document.getElementById('search-options');
for (const dashboard in dashboard_queries) {
const opt = document.createElement('option');
opt.value = dashboard;
searchOptions.appendChild(opt);
}
}
async function start() {
try {
updateFromState();
@ -1558,6 +1602,7 @@ async function start() {
} else {
drawAll();
}
await populateSearchOptions();
} catch (e) {
showError(e.message);
}

@ -528,7 +528,7 @@ QueryTreeNodePtr IdentifierResolver::tryResolveIdentifierFromCompoundExpression(
*
* Resolve strategy:
* 1. Try to bind identifier to scope argument name to node map.
* 2. If identifier is binded but expression context and node type are incompatible return nullptr.
* 2. If identifier is bound but expression context and node type are incompatible return nullptr.
*
* It is important to support edge cases, where we lookup for table or function node, but argument has same name.
* Example: WITH (x -> x + 1) AS func, (func -> func(1) + func) AS lambda SELECT lambda(1);

@ -362,7 +362,7 @@ ReplxxLineReader::ReplxxLineReader(
if (highlighter)
rx.set_highlighter_callback(highlighter);
/// By default C-p/C-n binded to COMPLETE_NEXT/COMPLETE_PREV,
/// By default C-p/C-n bound to COMPLETE_NEXT/COMPLETE_PREV,
/// bind C-p/C-n to history-previous/history-next like readline.
rx.bind_key(Replxx::KEY::control('N'), [this](char32_t code) { return rx.invoke(Replxx::ACTION::HISTORY_NEXT, code); });
rx.bind_key(Replxx::KEY::control('P'), [this](char32_t code) { return rx.invoke(Replxx::ACTION::HISTORY_PREVIOUS, code); });
@ -384,9 +384,9 @@ ReplxxLineReader::ReplxxLineReader(
rx.bind_key(Replxx::KEY::control('J'), commit_action);
rx.bind_key(Replxx::KEY::ENTER, commit_action);
/// By default COMPLETE_NEXT/COMPLETE_PREV was binded to C-p/C-n, re-bind
/// By default COMPLETE_NEXT/COMPLETE_PREV was bound to C-p/C-n, re-bind
/// to M-P/M-N (that was used for HISTORY_COMMON_PREFIX_SEARCH before, but
/// it also binded to M-p/M-n).
/// it also bound to M-p/M-n).
rx.bind_key(Replxx::KEY::meta('N'), [this](char32_t code) { return rx.invoke(Replxx::ACTION::COMPLETE_NEXT, code); });
rx.bind_key(Replxx::KEY::meta('P'), [this](char32_t code) { return rx.invoke(Replxx::ACTION::COMPLETE_PREVIOUS, code); });
/// By default M-BACKSPACE is KILL_TO_WHITESPACE_ON_LEFT, while in readline it is backward-kill-word

@ -237,6 +237,7 @@ bool IParserColumnDeclaration<NameParser>::parseImpl(Pos & pos, ASTPtr & node, E
null_modifier.emplace(true);
}
bool is_comment = false;
/// Collate is also allowed after NULL/NOT NULL
if (!collation_expression && s_collate.ignore(pos, expected)
&& !collation_parser.parse(pos, collation_expression, expected))
@ -254,7 +255,9 @@ bool IParserColumnDeclaration<NameParser>::parseImpl(Pos & pos, ASTPtr & node, E
else if (s_ephemeral.ignore(pos, expected))
{
default_specifier = s_ephemeral.getName();
if (!expr_parser.parse(pos, default_expression, expected) && type)
if (s_comment.ignore(pos, expected))
is_comment = true;
if ((is_comment || !expr_parser.parse(pos, default_expression, expected)) && type)
{
ephemeral_default = true;
@ -289,6 +292,8 @@ bool IParserColumnDeclaration<NameParser>::parseImpl(Pos & pos, ASTPtr & node, E
if (require_type && !type && !default_expression)
return false; /// reject column name without type
if (!is_comment)
{
if ((type || default_expression) && allow_null_modifiers && !null_modifier.has_value())
{
if (s_not.ignore(pos, expected))
@ -300,8 +305,9 @@ bool IParserColumnDeclaration<NameParser>::parseImpl(Pos & pos, ASTPtr & node, E
else if (s_null.ignore(pos, expected))
null_modifier.emplace(true);
}
}
if (s_comment.ignore(pos, expected))
if (is_comment || s_comment.ignore(pos, expected))
{
/// should be followed by a string literal
if (!string_literal_parser.parse(pos, comment_expression, expected))

@ -48,6 +48,22 @@ public:
consume(bytes);
}
template <typename TValue, typename ParquetType>
void ALWAYS_INLINE readValuesOfDifferentSize(TValue * dst, size_t count)
{
auto necessary_bytes = count * sizeof(ParquetType);
checkAvaible(necessary_bytes);
const ParquetType* src = reinterpret_cast<const ParquetType*>(data);
for (std::size_t i = 0; i < count; i++)
{
dst[i] = static_cast<TValue>(src[i]);
}
consume(necessary_bytes);
}
void ALWAYS_INLINE readDateTime64FromInt96(DateTime64 & dst)
{
static const int max_scale_num = 9;
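
Note on `readValuesOfDifferentSize`: Parquet stores 8-bit and 16-bit logical integers in the INT32 physical type, so a reader that copies `count * sizeof(TValue)` bytes into a narrower column consumes too few bytes per value. The new helper reads each value at its physical width and casts it to the column's value type. Below is a minimal standalone sketch of that idea, not the PR's code: `PlainBufferSketch` and `decodeInt32AsUInt8` are hypothetical names, and the sketch uses `std::memcpy` in place of the `reinterpret_cast` seen in the hunk so it does not depend on the buffer being aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for ParquetDataBuffer: a cursor over raw page bytes.
class PlainBufferSketch
{
public:
    PlainBufferSketch(const uint8_t * data_, std::size_t size_) : data(data_), available(size_) {}

    // Read `count` values stored as ParquetType and cast each one to TValue.
    template <typename TValue, typename ParquetType>
    void readValuesOfDifferentSize(TValue * dst, std::size_t count)
    {
        assert(dst != nullptr || count == 0);

        // Bytes consumed depend on the physical (on-disk) width, not on TValue.
        const std::size_t necessary_bytes = count * sizeof(ParquetType);
        if (necessary_bytes > available)
            throw std::out_of_range("not enough bytes left in the page buffer");

        for (std::size_t i = 0; i < count; ++i)
        {
            ParquetType src;
            // Assumption of this sketch: copy bytewise to avoid unaligned loads.
            std::memcpy(&src, data + i * sizeof(ParquetType), sizeof(ParquetType));
            dst[i] = static_cast<TValue>(src);
        }

        data += necessary_bytes;
        available -= necessary_bytes;
    }

    std::size_t remaining() const { return available; }

private:
    const uint8_t * data;
    std::size_t available;
};

// Demonstration helper: decode INT32-encoded values into a UInt8 column.
inline std::vector<uint8_t> decodeInt32AsUInt8(const std::vector<int32_t> & physical)
{
    PlainBufferSketch buffer(
        reinterpret_cast<const uint8_t *>(physical.data()), physical.size() * sizeof(int32_t));
    std::vector<uint8_t> column(physical.size());
    buffer.readValuesOfDifferentSize<uint8_t, int32_t>(column.data(), column.size());
    return column;
}
```

Calling `decodeInt32AsUInt8` on three INT32 values consumes twelve bytes of input and yields three one-byte cells, which is the size mismatch the old `readBytes` path could not express.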

@ -240,8 +240,8 @@ TValue * getResizedPrimitiveData(TColumn & column, size_t size)
} // anoynomous namespace
template <>
void ParquetPlainValuesReader<ColumnString>::readBatch(
template <typename TColumn>
void ParquetPlainByteArrayValuesReader<TColumn>::readBatch(
MutableColumnPtr & col_ptr, LazyNullMap & null_map, UInt32 num_values)
{
auto & column = *assert_cast<ColumnString *>(col_ptr.get());
@ -322,8 +322,8 @@ void ParquetBitPlainReader<TColumn>::readBatch(
}
template <>
void ParquetPlainValuesReader<ColumnDecimal<DateTime64>, ParquetReaderTypes::TimestampInt96>::readBatch(
template <typename TColumn>
void ParquetPlainInt96ValuesReader<TColumn>::readBatch(
MutableColumnPtr & col_ptr, LazyNullMap & null_map, UInt32 num_values)
{
auto cursor = col_ptr->size();
@ -350,8 +350,8 @@ void ParquetPlainValuesReader<ColumnDecimal<DateTime64>, ParquetReaderTypes::Tim
);
}
template <typename TColumn, ParquetReaderTypes reader_type>
void ParquetPlainValuesReader<TColumn, reader_type>::readBatch(
template <typename TColumn, typename ParquetType>
void ParquetPlainValuesReader<TColumn, ParquetType>::readBatch(
MutableColumnPtr & col_ptr, LazyNullMap & null_map, UInt32 num_values)
{
auto cursor = col_ptr->size();
@ -365,11 +365,11 @@ void ParquetPlainValuesReader<TColumn, reader_type>::readBatch(
null_map,
/* individual_visitor */ [&](size_t nest_cursor)
{
plain_data_buffer.readValue(column_data[nest_cursor]);
plain_data_buffer.readValuesOfDifferentSize<TValue, ParquetType>(column_data + nest_cursor, 1);
},
/* repeated_visitor */ [&](size_t nest_cursor, UInt32 count)
{
plain_data_buffer.readBytes(column_data + nest_cursor, count * sizeof(TValue));
plain_data_buffer.readValuesOfDifferentSize<TValue, ParquetType>(column_data + nest_cursor, count);
}
);
}
@ -576,18 +576,19 @@ void ParquetRleDictReader<TColumnVector>::readBatch(
}
template class ParquetPlainValuesReader<ColumnInt32>;
template class ParquetPlainValuesReader<ColumnUInt32>;
template class ParquetPlainValuesReader<ColumnInt64>;
template class ParquetPlainValuesReader<ColumnUInt64>;
template class ParquetPlainValuesReader<ColumnBFloat16>;
template class ParquetPlainValuesReader<ColumnFloat32>;
template class ParquetPlainValuesReader<ColumnFloat64>;
template class ParquetPlainValuesReader<ColumnDecimal<Decimal32>>;
template class ParquetPlainValuesReader<ColumnDecimal<Decimal64>>;
template class ParquetPlainValuesReader<ColumnDecimal<DateTime64>>;
template class ParquetPlainValuesReader<ColumnString>;
template class ParquetPlainValuesReader<ColumnUInt8>;
template class ParquetPlainValuesReader<ColumnUInt8, int32_t>;
template class ParquetPlainValuesReader<ColumnInt8, int32_t>;
template class ParquetPlainValuesReader<ColumnUInt16, int32_t>;
template class ParquetPlainValuesReader<ColumnInt16, int32_t>;
template class ParquetPlainValuesReader<ColumnUInt32, int32_t>;
template class ParquetPlainValuesReader<ColumnInt32, int32_t>;
template class ParquetPlainValuesReader<ColumnUInt64, int64_t>;
template class ParquetPlainValuesReader<ColumnInt64, int64_t>;
template class ParquetPlainValuesReader<ColumnFloat32, float>;
template class ParquetPlainValuesReader<ColumnFloat64, double>;
template class ParquetPlainValuesReader<ColumnDecimal<Decimal32>, int32_t>;
template class ParquetPlainValuesReader<ColumnDecimal<Decimal64>, int64_t>;
template class ParquetPlainValuesReader<ColumnDecimal<DateTime64>, int64_t>;
template class ParquetBitPlainReader<ColumnUInt8>;
@ -598,12 +599,10 @@ template class ParquetRleLCReader<ColumnUInt8>;
template class ParquetRleLCReader<ColumnUInt16>;
template class ParquetRleLCReader<ColumnUInt32>;
template class ParquetRleDictReader<ColumnUInt8>;
template class ParquetRleDictReader<ColumnInt32>;
template class ParquetRleDictReader<ColumnUInt32>;
template class ParquetRleDictReader<ColumnInt64>;
template class ParquetRleDictReader<ColumnUInt64>;
template class ParquetRleDictReader<ColumnBFloat16>;
template class ParquetRleDictReader<ColumnFloat32>;
template class ParquetRleDictReader<ColumnFloat64>;
template class ParquetRleDictReader<ColumnDecimal<Decimal32>>;
@ -613,4 +612,8 @@ template class ParquetRleDictReader<ColumnDecimal<Decimal256>>;
template class ParquetRleDictReader<ColumnDecimal<DateTime64>>;
template class ParquetRleDictReader<ColumnString>;
template class ParquetPlainByteArrayValuesReader<ColumnString>;
template class ParquetPlainInt96ValuesReader<ColumnDecimal<DateTime64>>;
}

@ -150,7 +150,7 @@ enum class ParquetReaderTypes
/**
* The definition level is RLE or BitPacked encoding, while data is read directly
*/
template <typename TColumn, ParquetReaderTypes reader_type = ParquetReaderTypes::Normal>
template <typename TColumn, typename ParquetType>
class ParquetPlainValuesReader : public ParquetDataValuesReader
{
public:
@ -172,6 +172,50 @@ private:
ParquetDataBuffer plain_data_buffer;
};
template <typename TColumn>
class ParquetPlainInt96ValuesReader : public ParquetDataValuesReader
{
public:
ParquetPlainInt96ValuesReader(
Int32 max_def_level_,
std::unique_ptr<RleValuesReader> def_level_reader_,
ParquetDataBuffer data_buffer_)
: max_def_level(max_def_level_)
, def_level_reader(std::move(def_level_reader_))
, plain_data_buffer(std::move(data_buffer_))
{}
void readBatch(MutableColumnPtr & col_ptr, LazyNullMap & null_map, UInt32 num_values) override;
private:
Int32 max_def_level;
std::unique_ptr<RleValuesReader> def_level_reader;
ParquetDataBuffer plain_data_buffer;
};
template <typename TColumn>
class ParquetPlainByteArrayValuesReader : public ParquetDataValuesReader
{
public:
ParquetPlainByteArrayValuesReader(
Int32 max_def_level_,
std::unique_ptr<RleValuesReader> def_level_reader_,
ParquetDataBuffer data_buffer_)
: max_def_level(max_def_level_)
, def_level_reader(std::move(def_level_reader_))
, plain_data_buffer(std::move(data_buffer_))
{}
void readBatch(MutableColumnPtr & col_ptr, LazyNullMap & null_map, UInt32 num_values) override;
private:
Int32 max_def_level;
std::unique_ptr<RleValuesReader> def_level_reader;
ParquetDataBuffer plain_data_buffer;
};
template <typename TColumn>
class ParquetBitPlainReader : public ParquetDataValuesReader
{

@ -173,13 +173,7 @@ ColumnPtr readDictPage(
}
template <typename TColumn>
std::unique_ptr<ParquetDataValuesReader> createPlainReader(
const parquet::ColumnDescriptor & col_des,
RleValuesReaderPtr def_level_reader,
ParquetDataBuffer buffer);
template <is_col_over_big_decimal TColumnDecimal>
template <is_col_over_big_decimal TColumnDecimal, typename ParquetType>
std::unique_ptr<ParquetDataValuesReader> createPlainReader(
const parquet::ColumnDescriptor & col_des,
RleValuesReaderPtr def_level_reader,
@ -192,25 +186,62 @@ std::unique_ptr<ParquetDataValuesReader> createPlainReader(
std::move(buffer));
}
template <typename TColumn>
template <typename TColumn, typename ParquetType>
std::unique_ptr<ParquetDataValuesReader> createPlainReader(
const parquet::ColumnDescriptor & col_des,
RleValuesReaderPtr def_level_reader,
ParquetDataBuffer buffer)
{
if (std::is_same_v<TColumn, ColumnDecimal<DateTime64>> && col_des.physical_type() == parquet::Type::INT96)
return std::make_unique<ParquetPlainValuesReader<TColumn, ParquetReaderTypes::TimestampInt96>>(
if constexpr (std::is_same_v<TColumn, ColumnDecimal<DateTime64>> && std::is_same_v<ParquetType, ParquetInt96TypeStub>)
return std::make_unique<ParquetPlainInt96ValuesReader<TColumn>>(
col_des.max_definition_level(), std::move(def_level_reader), std::move(buffer));
return std::make_unique<ParquetPlainValuesReader<TColumn>>(
if constexpr (std::is_same_v<ParquetType, ParquetByteArrayTypeStub>)
{
return std::make_unique<ParquetPlainByteArrayValuesReader<TColumn>>(
col_des.max_definition_level(), std::move(def_level_reader), std::move(buffer));
}
return std::make_unique<ParquetPlainValuesReader<TColumn, ParquetType>>(
col_des.max_definition_level(), std::move(def_level_reader), std::move(buffer));
}
template <typename TColumn, typename ParquetType>
std::unique_ptr<ParquetDataValuesReader> createReader(
const parquet::ColumnDescriptor & col_descriptor,
RleValuesReaderPtr def_level_reader,
const uint8_t * buffer,
std::size_t buffer_max_size,
const DataTypePtr & base_data_type)
{
if constexpr (std::is_same_v<ParquetType, bool>)
{
auto bit_reader = std::make_unique<arrow::bit_util::BitReader>(buffer, buffer_max_size);
return std::make_unique<ParquetBitPlainReader<TColumn>>(
col_descriptor.max_definition_level(), std::move(def_level_reader), std::move(bit_reader));
}
else
{
ParquetDataBuffer parquet_buffer = [&]()
{
if constexpr (!std::is_same_v<ColumnDecimal<DateTime64>, TColumn>)
return ParquetDataBuffer(buffer, buffer_max_size);
auto scale = assert_cast<const DataTypeDateTime64 &>(*base_data_type).getScale();
return ParquetDataBuffer(buffer, buffer_max_size, scale);
}();
return createPlainReader<TColumn, ParquetType>(col_descriptor, std::move(def_level_reader), parquet_buffer);
}
}
} // anonymous namespace
template <typename TColumn>
ParquetLeafColReader<TColumn>::ParquetLeafColReader(
template <typename TColumn, typename ParquetType>
ParquetLeafColReader<TColumn, ParquetType>::ParquetLeafColReader(
const parquet::ColumnDescriptor & col_descriptor_,
DataTypePtr base_type_,
std::unique_ptr<parquet::ColumnChunkMetaData> meta_,
@ -223,8 +254,8 @@ ParquetLeafColReader<TColumn>::ParquetLeafColReader(
{
}
template <typename TColumn>
ColumnWithTypeAndName ParquetLeafColReader<TColumn>::readBatch(UInt64 rows_num, const String & name)
template <typename TColumn, typename ParquetType>
ColumnWithTypeAndName ParquetLeafColReader<TColumn, ParquetType>::readBatch(UInt64 rows_num, const String & name)
{
reading_rows_num = rows_num;
auto readPageIfEmpty = [&]()
@ -251,9 +282,11 @@ ColumnWithTypeAndName ParquetLeafColReader<TColumn>::readBatch(UInt64 rows_num,
return releaseColumn(name);
}
template <>
void ParquetLeafColReader<ColumnString>::resetColumn(UInt64 rows_num)
template <typename TColumn, typename ParquetType>
void ParquetLeafColReader<TColumn, ParquetType>::resetColumn(UInt64 rows_num)
{
if constexpr (std::is_same_v<TColumn, ColumnString>)
{
if (reading_low_cardinality)
{
assert(dictionary);
@ -272,20 +305,19 @@ void ParquetLeafColReader<ColumnString>::resetColumn(UInt64 rows_num)
column = ColumnString::create();
reserveColumnStrRows(column, rows_num);
}
}
template <typename TColumn>
void ParquetLeafColReader<TColumn>::resetColumn(UInt64 rows_num)
{
}
else
{
assert(!reading_low_cardinality);
column = base_data_type->createColumn();
column->reserve(rows_num);
null_map = std::make_unique<LazyNullMap>(rows_num);
}
}
template <typename TColumn>
void ParquetLeafColReader<TColumn>::degradeDictionary()
template <typename TColumn, typename ParquetType>
void ParquetLeafColReader<TColumn, ParquetType>::degradeDictionary()
{
// if last batch read all dictionary indices, then degrade is not needed this time
if (!column)
@ -331,8 +363,8 @@ void ParquetLeafColReader<TColumn>::degradeDictionary()
LOG_DEBUG(log, "degraded dictionary to normal column");
}
template <typename TColumn>
ColumnWithTypeAndName ParquetLeafColReader<TColumn>::releaseColumn(const String & name)
template <typename TColumn, typename ParquetType>
ColumnWithTypeAndName ParquetLeafColReader<TColumn, ParquetType>::releaseColumn(const String & name)
{
DataTypePtr data_type = base_data_type;
if (reading_low_cardinality)
@ -365,8 +397,8 @@ ColumnWithTypeAndName ParquetLeafColReader<TColumn>::releaseColumn(const String
return res;
}
template <typename TColumn>
void ParquetLeafColReader<TColumn>::readPage()
template <typename TColumn, typename ParquetType>
void ParquetLeafColReader<TColumn, ParquetType>::readPage()
{
// refer to: ColumnReaderImplBase::ReadNewPage in column_reader.cc
// this is where decompression happens
@ -408,8 +440,8 @@ void ParquetLeafColReader<TColumn>::readPage()
}
}
template <typename TColumn>
void ParquetLeafColReader<TColumn>::initDataReader(
template <typename TColumn, typename ParquetType>
void ParquetLeafColReader<TColumn, ParquetType>::initDataReader(
parquet::Encoding::type enconding_type,
const uint8_t * buffer,
std::size_t max_size,
@ -425,29 +457,8 @@ void ParquetLeafColReader<TColumn>::initDataReader(
degradeDictionary();
}
if (col_descriptor.physical_type() == parquet::Type::BOOLEAN)
{
if constexpr (std::is_same_v<TColumn, ColumnUInt8>)
{
auto bit_reader = std::make_unique<arrow::bit_util::BitReader>(buffer, max_size);
data_values_reader = std::make_unique<ParquetBitPlainReader<ColumnUInt8>>(col_descriptor.max_definition_level(),
std::move(def_level_reader),
std::move(bit_reader));
}
}
else
{
ParquetDataBuffer parquet_buffer = [&]()
{
if constexpr (!std::is_same_v<ColumnDecimal<DateTime64>, TColumn>)
return ParquetDataBuffer(buffer, max_size);
auto scale = assert_cast<const DataTypeDateTime64 &>(*base_data_type).getScale();
return ParquetDataBuffer(buffer, max_size, scale);
}();
data_values_reader = createPlainReader<TColumn>(
col_descriptor, std::move(def_level_reader), std::move(parquet_buffer));
}
data_values_reader = createReader<TColumn, ParquetType>(
col_descriptor, std::move(def_level_reader), buffer, max_size, base_data_type);
break;
}
case parquet::Encoding::RLE_DICTIONARY:
@ -476,8 +487,8 @@ void ParquetLeafColReader<TColumn>::initDataReader(
}
}
template <typename TColumn>
void ParquetLeafColReader<TColumn>::readPageV1(const parquet::DataPageV1 & page)
template <typename TColumn, typename ParquetType>
void ParquetLeafColReader<TColumn, ParquetType>::readPageV1(const parquet::DataPageV1 & page)
{
cur_page_values = page.num_values();
@ -562,8 +573,8 @@ void ParquetLeafColReader<TColumn>::readPageV1(const parquet::DataPageV1 & page)
* The data buffer is "offset-ed" by rl bytes length and then dl decoder is built using RLE decoder. Since dl bytes length was present in the header,
* there is no need to read it and apply an offset like in page v1.
* */
template <typename TColumn>
void ParquetLeafColReader<TColumn>::readPageV2(const parquet::DataPageV2 & page)
template <typename TColumn, typename ParquetType>
void ParquetLeafColReader<TColumn, ParquetType>::readPageV2(const parquet::DataPageV2 & page)
{
cur_page_values = page.num_values();
@ -609,11 +620,19 @@ void ParquetLeafColReader<TColumn>::readPageV2(const parquet::DataPageV2 & page)
initDataReader(page.encoding(), buffer, page.size() - total_levels_length, std::move(def_level_reader));
}
template <typename TColumn>
std::unique_ptr<ParquetDataValuesReader> ParquetLeafColReader<TColumn>::createDictReader(
template <typename TColumn, typename ParquetType>
std::unique_ptr<ParquetDataValuesReader> ParquetLeafColReader<TColumn, ParquetType>::createDictReader(
std::unique_ptr<RleValuesReader> def_level_reader, std::unique_ptr<RleValuesReader> rle_data_reader)
{
if (reading_low_cardinality && std::same_as<TColumn, ColumnString>)
if constexpr (std::is_same_v<TColumn, ColumnUInt8> || std::is_same_v<TColumn, ColumnInt8>
|| std::is_same_v<TColumn, ColumnUInt16> || std::is_same_v<TColumn, ColumnInt16>)
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Dictionary encoding for booleans is not supported");
}
if (reading_low_cardinality)
{
if constexpr (std::same_as<TColumn, ColumnString>)
{
std::unique_ptr<ParquetDataValuesReader> res;
visitColStrIndexType(dictionary->size(), [&]<typename TCol>(TCol *)
@ -625,10 +644,6 @@ std::unique_ptr<ParquetDataValuesReader> ParquetLeafColReader<TColumn>::createDi
});
return res;
}
if (col_descriptor.physical_type() == parquet::Type::type::BOOLEAN)
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Dictionary encoding for booleans is not supported");
}
return std::make_unique<ParquetRleDictReader<TColumn>>(
@ -639,19 +654,23 @@ std::unique_ptr<ParquetDataValuesReader> ParquetLeafColReader<TColumn>::createDi
}
template class ParquetLeafColReader<ColumnUInt8>;
template class ParquetLeafColReader<ColumnInt32>;
template class ParquetLeafColReader<ColumnUInt32>;
template class ParquetLeafColReader<ColumnInt64>;
template class ParquetLeafColReader<ColumnUInt64>;
template class ParquetLeafColReader<ColumnBFloat16>;
template class ParquetLeafColReader<ColumnFloat32>;
template class ParquetLeafColReader<ColumnFloat64>;
template class ParquetLeafColReader<ColumnString>;
template class ParquetLeafColReader<ColumnDecimal<Decimal32>>;
template class ParquetLeafColReader<ColumnDecimal<Decimal64>>;
template class ParquetLeafColReader<ColumnDecimal<Decimal128>>;
template class ParquetLeafColReader<ColumnDecimal<Decimal256>>;
template class ParquetLeafColReader<ColumnDecimal<DateTime64>>;
template class ParquetLeafColReader<ColumnUInt8, bool>;
template class ParquetLeafColReader<ColumnUInt8, int32_t>;
template class ParquetLeafColReader<ColumnInt8, int32_t>;
template class ParquetLeafColReader<ColumnUInt16, int32_t>;
template class ParquetLeafColReader<ColumnInt16, int32_t>;
template class ParquetLeafColReader<ColumnUInt32, int32_t>;
template class ParquetLeafColReader<ColumnInt32, int32_t>;
template class ParquetLeafColReader<ColumnUInt64, int64_t>;
template class ParquetLeafColReader<ColumnInt64, int64_t>;
template class ParquetLeafColReader<ColumnFloat32, float>;
template class ParquetLeafColReader<ColumnFloat64, double>;
template class ParquetLeafColReader<ColumnString, ParquetByteArrayTypeStub>;
template class ParquetLeafColReader<ColumnDecimal<Decimal32>, int32_t>;
template class ParquetLeafColReader<ColumnDecimal<Decimal64>, int64_t>;
template class ParquetLeafColReader<ColumnDecimal<Decimal128>, ParquetByteArrayTypeStub>;
template class ParquetLeafColReader<ColumnDecimal<Decimal256>, ParquetByteArrayTypeStub>;
template class ParquetLeafColReader<ColumnDecimal<DateTime64>, ParquetInt96TypeStub>;
template class ParquetLeafColReader<ColumnDecimal<DateTime64>, int64_t>;
}
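
Note on the `createPlainReader` dispatch in this file: the reader is now chosen at compile time from the `ParquetType` template argument, with empty stub structs standing in for the physical types that have no fixed-width C++ equivalent. A minimal standalone sketch of that pattern follows. All names in it (`ByteArrayStub`, `Int96Stub`, `ReaderSketch`, `createPlainReaderSketch`, `plainReaderKind`) are hypothetical and only mirror the shape of the PR's code. Unlike the hunk above, the sketch chains the branches with `else`, so only the selected `return` is instantiated for a given type.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <type_traits>

// Empty tag types, in the spirit of ParquetByteArrayTypeStub / ParquetInt96TypeStub.
struct ByteArrayStub {};
struct Int96Stub {};

// Hypothetical reader hierarchy; kind() only exists to make the choice observable.
struct ReaderSketch
{
    virtual ~ReaderSketch() = default;
    virtual std::string kind() const = 0;
};

struct FixedWidthReader : ReaderSketch
{
    std::string kind() const override { return "fixed-width"; }
};

struct ByteArrayReader : ReaderSketch
{
    std::string kind() const override { return "byte-array"; }
};

struct Int96Reader : ReaderSketch
{
    std::string kind() const override { return "int96"; }
};

// Pick the reader from the physical type at compile time.
template <typename ParquetType>
std::unique_ptr<ReaderSketch> createPlainReaderSketch()
{
    if constexpr (std::is_same_v<ParquetType, Int96Stub>)
        return std::make_unique<Int96Reader>();
    else if constexpr (std::is_same_v<ParquetType, ByteArrayStub>)
        return std::make_unique<ByteArrayReader>();
    else
        return std::make_unique<FixedWidthReader>();
}

// Convenience wrapper used to inspect which reader was selected.
template <typename ParquetType>
std::string plainReaderKind()
{
    auto reader = createPlainReaderSketch<ParquetType>();
    assert(reader != nullptr);
    return reader->kind();
}
```

With this shape, `plainReaderKind<int32_t>()` selects the fixed-width reader, while the two stub types route to their dedicated readers without any runtime check on the column descriptor.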

@ -17,7 +17,10 @@ class ColumnDescriptor;
namespace DB
{
template <typename TColumn>
struct ParquetByteArrayTypeStub {};
struct ParquetInt96TypeStub {};
template <typename TColumn, typename ParquetType>
class ParquetLeafColReader : public ParquetColumnReader
{
public:

@ -93,19 +93,20 @@ private:
std::unique_ptr<ParquetColumnReader> fromInt32INT(const parquet::IntLogicalType & int_type);
std::unique_ptr<ParquetColumnReader> fromInt64INT(const parquet::IntLogicalType & int_type);
template<class DataType>
template<class ClickHouseType, typename ParquetType>
auto makeLeafReader()
{
return std::make_unique<ParquetLeafColReader<typename DataType::ColumnType>>(
col_descriptor, std::make_shared<DataType>(), std::move(meta), std::move(page_reader));
return std::make_unique<ParquetLeafColReader<typename ClickHouseType::ColumnType, ParquetType>>(
col_descriptor, std::make_shared<ClickHouseType>(), std::move(meta), std::move(page_reader));
}
template<class DecimalType>
template<class DecimalType, typename ParquetType>
auto makeDecimalLeafReader()
{
auto data_type = std::make_shared<DataTypeDecimal<DecimalType>>(
col_descriptor.type_precision(), col_descriptor.type_scale());
return std::make_unique<ParquetLeafColReader<ColumnDecimal<DecimalType>>>(
return std::make_unique<ParquetLeafColReader<ColumnDecimal<DecimalType>, ParquetType>>(
col_descriptor, std::move(data_type), std::move(meta), std::move(page_reader));
}
@ -157,11 +158,11 @@ std::unique_ptr<ParquetColumnReader> ColReaderFactory::fromInt32()
case parquet::LogicalType::Type::INT:
return fromInt32INT(dynamic_cast<const parquet::IntLogicalType &>(*col_descriptor.logical_type()));
case parquet::LogicalType::Type::NONE:
return makeLeafReader<DataTypeInt32>();
return makeLeafReader<DataTypeInt32, int32_t>();
case parquet::LogicalType::Type::DATE:
return makeLeafReader<DataTypeDate32>();
return makeLeafReader<DataTypeDate32, int32_t>();
case parquet::LogicalType::Type::DECIMAL:
return makeDecimalLeafReader<Decimal32>();
return makeDecimalLeafReader<Decimal32, int32_t>();
default:
return throwUnsupported();
}
@ -174,16 +175,16 @@ std::unique_ptr<ParquetColumnReader> ColReaderFactory::fromInt64()
case parquet::LogicalType::Type::INT:
return fromInt64INT(dynamic_cast<const parquet::IntLogicalType &>(*col_descriptor.logical_type()));
case parquet::LogicalType::Type::NONE:
return makeLeafReader<DataTypeInt64>();
return makeLeafReader<DataTypeInt64, int64_t>();
case parquet::LogicalType::Type::TIMESTAMP:
{
const auto & tm_type = dynamic_cast<const parquet::TimestampLogicalType &>(*col_descriptor.logical_type());
auto read_type = std::make_shared<DataTypeDateTime64>(getScaleFromLogicalTimestamp(tm_type.time_unit()));
return std::make_unique<ParquetLeafColReader<ColumnDecimal<DateTime64>>>(
return std::make_unique<ParquetLeafColReader<ColumnDecimal<DateTime64>, int64_t>>(
col_descriptor, std::move(read_type), std::move(meta), std::move(page_reader));
}
case parquet::LogicalType::Type::DECIMAL:
return makeDecimalLeafReader<Decimal64>();
return makeDecimalLeafReader<Decimal64, int64_t>();
default:
return throwUnsupported();
}
@ -195,7 +196,7 @@ std::unique_ptr<ParquetColumnReader> ColReaderFactory::fromByteArray()
{
case parquet::LogicalType::Type::STRING:
case parquet::LogicalType::Type::NONE:
return makeLeafReader<DataTypeString>();
return makeLeafReader<DataTypeString, ParquetByteArrayTypeStub>();
default:
return throwUnsupported();
}
@ -210,9 +211,9 @@ std::unique_ptr<ParquetColumnReader> ColReaderFactory::fromFLBA()
if (col_descriptor.type_length() > 0)
{
if (col_descriptor.type_length() <= static_cast<int>(sizeof(Decimal128)))
return makeDecimalLeafReader<Decimal128>();
return makeDecimalLeafReader<Decimal128, ParquetByteArrayTypeStub>();
if (col_descriptor.type_length() <= static_cast<int>(sizeof(Decimal256)))
return makeDecimalLeafReader<Decimal256>();
return makeDecimalLeafReader<Decimal256, ParquetByteArrayTypeStub>();
}
return throwUnsupported(PreformattedMessage::create(
@ -227,11 +228,23 @@ std::unique_ptr<ParquetColumnReader> ColReaderFactory::fromInt32INT(const parque
{
switch (int_type.bit_width())
{
case 8:
{
if (int_type.is_signed())
return makeLeafReader<DataTypeInt8, int32_t>();
return makeLeafReader<DataTypeUInt8, int32_t>();
}
case 16:
{
if (int_type.is_signed())
return makeLeafReader<DataTypeInt16, int32_t>();
return makeLeafReader<DataTypeUInt16, int32_t>();
}
case 32:
{
if (int_type.is_signed())
return makeLeafReader<DataTypeInt32>();
return makeLeafReader<DataTypeUInt32>();
return makeLeafReader<DataTypeInt32, int32_t>();
return makeLeafReader<DataTypeUInt32, int32_t>();
}
default:
return throwUnsupported(PreformattedMessage::create(", bit width: {}", int_type.bit_width()));
@ -245,8 +258,8 @@ std::unique_ptr<ParquetColumnReader> ColReaderFactory::fromInt64INT(const parque
case 64:
{
if (int_type.is_signed())
return makeLeafReader<DataTypeInt64>();
return makeLeafReader<DataTypeUInt64>();
return makeLeafReader<DataTypeInt64, int64_t>();
return makeLeafReader<DataTypeUInt64, int64_t>();
}
default:
return throwUnsupported(PreformattedMessage::create(", bit width: {}", int_type.bit_width()));
@ -263,7 +276,7 @@ std::unique_ptr<ParquetColumnReader> ColReaderFactory::makeReader()
switch (col_descriptor.physical_type())
{
case parquet::Type::BOOLEAN:
return makeLeafReader<DataTypeUInt8>();
return makeLeafReader<DataTypeUInt8, bool>();
case parquet::Type::INT32:
return fromInt32();
case parquet::Type::INT64:
@ -276,13 +289,13 @@ std::unique_ptr<ParquetColumnReader> ColReaderFactory::makeReader()
auto scale = getScaleFromArrowTimeUnit(arrow_properties.coerce_int96_timestamp_unit());
read_type = std::make_shared<DataTypeDateTime64>(scale);
}
return std::make_unique<ParquetLeafColReader<ColumnDecimal<DateTime64>>>(
return std::make_unique<ParquetLeafColReader<ColumnDecimal<DateTime64>, ParquetInt96TypeStub>>(
col_descriptor, read_type, std::move(meta), std::move(page_reader));
}
case parquet::Type::FLOAT:
return makeLeafReader<DataTypeFloat32>();
return makeLeafReader<DataTypeFloat32, float>();
case parquet::Type::DOUBLE:
return makeLeafReader<DataTypeFloat64>();
return makeLeafReader<DataTypeFloat64, double>();
case parquet::Type::BYTE_ARRAY:
return fromByteArray();
case parquet::Type::FIXED_LEN_BYTE_ARRAY:

@ -299,8 +299,6 @@ class TagAttrs:
# Only one latest can exist
latest: ClickHouseVersion
# Only one can be a major one (the most fresh per a year)
majors: Dict[int, ClickHouseVersion]
# Only one lts version can exist
lts: Optional[ClickHouseVersion]
@ -345,14 +343,6 @@ def ldf_tags(version: ClickHouseVersion, distro: str, tag_attrs: TagAttrs) -> st
tags.append("lts")
tags.append(f"lts-{distro}")
# If the tag `22`, `23`, `24` etc. should be included in the tags
with_major = tag_attrs.majors.get(version.major) in (None, version)
if with_major:
tag_attrs.majors[version.major] = version
if without_distro:
tags.append(f"{version.major}")
tags.append(f"{version.major}-{distro}")
# Add all normal tags
for tag in (
f"{version.major}.{version.minor}",
@ -384,7 +374,7 @@ def generate_ldf(args: argparse.Namespace) -> None:
args.directory / git_runner(f"git -C {args.directory} rev-parse --show-cdup")
).absolute()
lines = ldf_header(git, directory)
tag_attrs = TagAttrs(versions[-1], {}, None)
tag_attrs = TagAttrs(versions[-1], None)
# We iterate from the most recent to the oldest version
for version in reversed(versions):

@ -378,7 +378,7 @@ def test_reload_via_client(cluster, zk):
configure_from_zk(zk)
break
except QueryRuntimeException:
logging.exception("The new socket is not binded yet")
logging.exception("The new socket is not bound yet")
time.sleep(0.1)
if exception:

@ -0,0 +1,11 @@
drop table if exists test;
CREATE TABLE test (
`start_s` UInt32 EPHEMERAL COMMENT 'start UNIX time' ,
`start_us` UInt16 EPHEMERAL COMMENT 'start microseconds',
`finish_s` UInt32 EPHEMERAL COMMENT 'finish UNIX time',
`finish_us` UInt16 EPHEMERAL COMMENT 'finish microseconds',
`captured` DateTime MATERIALIZED fromUnixTimestamp(start_s),
`duration` Decimal32(6) MATERIALIZED finish_s - start_s + (finish_us - start_us)/1000000
)
ENGINE Null;
drop table if exists test;

@ -0,0 +1,40 @@
-94 53304 17815465730223871
57 15888 33652524900575246
-4 14877 53832092832965652
33 3387 86326601511136103
104 3383 115438187156564782
-11 37403 145056169255259589
-72 46473 159324626361233509
103 35510 173644182696185097
-26 60902 185175917734318892
70 48767 193167023342307884
2 21648 247953090704786001
20 2986 268127160817221407
76 20277 290178827409195337
61 28692 305149163504092270
-74 65427 326871531363668398
-15 20256 351812901947846888
-39 65472 357371822264135234
79 38671 371605113770958364
-29 41706 394460710549666968
92 25026 412913269933311543
-94 53304 17815465730223871
57 15888 33652524900575246
-4 14877 53832092832965652
33 3387 86326601511136103
104 3383 115438187156564782
-11 37403 145056169255259589
-72 46473 159324626361233509
103 35510 173644182696185097
-26 60902 185175917734318892
70 48767 193167023342307884
2 21648 247953090704786001
20 2986 268127160817221407
76 20277 290178827409195337
61 28692 305149163504092270
-74 65427 326871531363668398
-15 20256 351812901947846888
-39 65472 357371822264135234
79 38671 371605113770958364
-29 41706 394460710549666968
92 25026 412913269933311543

@ -0,0 +1,22 @@
#!/usr/bin/env bash
# Tags: no-ubsan, no-fasttest
CUR_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
# shellcheck source=../shell_config.sh
. "$CUR_DIR"/../shell_config.sh
USER_FILES_PATH=$($CLICKHOUSE_CLIENT_BINARY --query "select _path,_file from file('nonexist.txt', 'CSV', 'val1 char')" 2>&1 | grep Exception | awk '{gsub("/nonexist.txt","",$9); print $9}')
WORKING_DIR="${USER_FILES_PATH}/${CLICKHOUSE_TEST_UNIQUE_NAME}"
mkdir -p "${WORKING_DIR}"
DATA_FILE="${CUR_DIR}/data_parquet/multi_column_bf.gz.parquet"
DATA_FILE_USER_PATH="${WORKING_DIR}/multi_column_bf.gz.parquet"
cp "${DATA_FILE}" "${DATA_FILE_USER_PATH}"
${CLICKHOUSE_CLIENT} --query="select int8_logical, uint16_logical, uint64_logical from file('${DATA_FILE_USER_PATH}', Parquet) order by uint64_logical limit 20 SETTINGS input_format_parquet_use_native_reader=false;";
${CLICKHOUSE_CLIENT} --query="select int8_logical, uint16_logical, uint64_logical from file('${DATA_FILE_USER_PATH}', Parquet) order by uint64_logical limit 20 SETTINGS input_format_parquet_use_native_reader=true;";