Compare commits


15 Commits

Author  SHA1  Message  Date
Arthur Passos  b82bbdacea  Merge bc866e64ea into e0f8b8d351  2024-11-21 09:40:48 +04:00
Yakov Olkhovskiy  e0f8b8d351  Merge pull request #70458 from ClickHouse/fix-ephemeral-comment: Fix ephemeral column comment  2024-11-21 05:10:11 +00:00
Alexey Milovidov  da2176d696  Merge pull request #72081 from ClickHouse/add-dashboard-selector: Add advanced dashboard selector  2024-11-21 05:06:51 +00:00
Alexey Milovidov  53e0036593  Merge pull request #72176 from ClickHouse/change-ldf-major-versions: Get rid of `major` tags in official docker images  2024-11-21 05:05:41 +00:00
Alexey Milovidov  25bd73ea5e  Merge pull request #72023 from ClickHouse/fix-bind: Fix comments  2024-11-21 05:03:24 +00:00
Yakov Olkhovskiy  72d5af29e0  Merge branch 'master' into fix-ephemeral-comment  2024-11-20 22:01:54 +00:00
Arthur Passos  bc866e64ea  minor adjustment and tests  2024-11-20 15:26:02 -03:00
Arthur Passos  cbda101228  missing double quotes  2024-11-20 12:42:45 -03:00
Mikhail f. Shiryaev  9a2a664b04  Get rid of major tags in official docker images  2024-11-20 16:36:50 +01:00
Arthur Passos  8b3c15b22a  Merge branch 'master' into parquet_native_reader_int_logical  2024-11-20 10:27:02 -03:00
Arthur Passos  92a1f0c562  approach1  2024-11-20 10:18:07 -03:00
serxa  ad67608956  Add advanced dashboard selector  2024-11-19 13:18:21 +00:00
Alexey Milovidov  49589da56e  Fix comments  2024-11-18 07:18:46 +01:00
Yakov Olkhovskiy  3827d90bb0  add test  2024-10-08 02:37:41 +00:00
Yakov Olkhovskiy  bf3a3ad607  fix ephemeral comment  2024-10-08 02:27:36 +00:00
21 changed files with 398 additions and 184 deletions

View File

@@ -16,16 +16,18 @@ ClickHouse works 100-1000x faster than traditional database management systems,
 For more information and documentation see https://clickhouse.com/.
-<!-- This is not related to the docker official library, remove it before commit to https://github.com/docker-library/docs -->
 ## Versions
 - The `latest` tag points to the latest release of the latest stable branch.
 - Branch tags like `22.2` point to the latest release of the corresponding branch.
-- Full version tags like `22.2.3.5` point to the corresponding release.
+- Full version tags like `22.2.3` and `22.2.3.5` point to the corresponding release.
+<!-- docker-official-library:off -->
+<!-- This is not related to the docker official library, remove it before commit to https://github.com/docker-library/docs -->
 - The tag `head` is built from the latest commit to the default branch.
 - Each tag has optional `-alpine` suffix to reflect that it's built on top of `alpine`.
 <!-- REMOVE UNTIL HERE -->
+<!-- docker-official-library:on -->
 ### Compatibility
 - The amd64 image requires support for [SSE3 instructions](https://en.wikipedia.org/wiki/SSE3). Virtually all x86 CPUs after 2005 support SSE3.

View File

@@ -10,16 +10,18 @@ ClickHouse works 100-1000x faster than traditional database management systems,
 For more information and documentation see https://clickhouse.com/.
-<!-- This is not related to the docker official library, remove it before commit to https://github.com/docker-library/docs -->
 ## Versions
 - The `latest` tag points to the latest release of the latest stable branch.
 - Branch tags like `22.2` point to the latest release of the corresponding branch.
-- Full version tags like `22.2.3.5` point to the corresponding release.
+- Full version tags like `22.2.3` and `22.2.3.5` point to the corresponding release.
+<!-- docker-official-library:off -->
+<!-- This is not related to the docker official library, remove it before commit to https://github.com/docker-library/docs -->
 - The tag `head` is built from the latest commit to the default branch.
 - Each tag has optional `-alpine` suffix to reflect that it's built on top of `alpine`.
 <!-- REMOVE UNTIL HERE -->
+<!-- docker-official-library:on -->
 ### Compatibility
 - The amd64 image requires support for [SSE3 instructions](https://en.wikipedia.org/wiki/SSE3). Virtually all x86 CPUs after 2005 support SSE3.

View File

@@ -522,4 +522,3 @@ sidebar_label: 2024
 * Backported in [#68518](https://github.com/ClickHouse/ClickHouse/issues/68518): Minor update in Dynamic/JSON serializations. [#68459](https://github.com/ClickHouse/ClickHouse/pull/68459) ([Kruglov Pavel](https://github.com/Avogar)).
 * Backported in [#68558](https://github.com/ClickHouse/ClickHouse/issues/68558): CI: Minor release workflow fix. [#68536](https://github.com/ClickHouse/ClickHouse/pull/68536) ([Max K.](https://github.com/maxknv)).
 * Backported in [#68576](https://github.com/ClickHouse/ClickHouse/issues/68576): CI: Tidy build timeout from 2h to 3h. [#68567](https://github.com/ClickHouse/ClickHouse/pull/68567) ([Max K.](https://github.com/maxknv)).

View File

@@ -497,4 +497,3 @@ sidebar_label: 2024
 * Backported in [#69899](https://github.com/ClickHouse/ClickHouse/issues/69899): Revert "Merge pull request [#69032](https://github.com/ClickHouse/ClickHouse/issues/69032) from alexon1234/include_real_time_execution_in_http_header". [#69885](https://github.com/ClickHouse/ClickHouse/pull/69885) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
 * Backported in [#69931](https://github.com/ClickHouse/ClickHouse/issues/69931): RIPE is an acronym and thus should be capital. RIPE stands for **R**ACE **I**ntegrity **P**rimitives **E**valuation and RACE stands for **R**esearch and Development in **A**dvanced **C**ommunications **T**echnologies in **E**urope. [#69901](https://github.com/ClickHouse/ClickHouse/pull/69901) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
 * Backported in [#70034](https://github.com/ClickHouse/ClickHouse/issues/70034): Revert "Add RIPEMD160 function". [#70005](https://github.com/ClickHouse/ClickHouse/pull/70005) ([Robert Schulze](https://github.com/rschu1ze)).

View File

@@ -936,4 +936,4 @@ SELECT mapPartialReverseSort((k, v) -> v, 2, map('k1', 3, 'k2', 1, 'k3', 2));
 ┌─mapPartialReverseSort(lambda(tuple(k, v), v), 2, map('k1', 3, 'k2', 1, 'k3', 2))─┐
 │ {'k1':3,'k3':2,'k2':1} │
 └──────────────────────────────────────────────────────────────────────────────────┘
 ```

View File

@@ -476,7 +476,7 @@
 <input id="edit" type="button" value="✎" style="display: none;">
 <input id="add" type="button" value="Add chart" style="display: none;">
 <input id="reload" type="button" value="Reload">
-<span id="search-span" class="nowrap" style="display: none;"><input id="search" type="button" value="🔎" title="Run query to obtain list of charts from ClickHouse"><input id="search-query" name="search" type="text" spellcheck="false"></span>
+<span id="search-span" class="nowrap" style="display: none;"><input id="search" type="button" value="🔎" title="Run query to obtain list of charts from ClickHouse. Either select dashboard name or write your own query"><input id="search-query" name="search" list="search-options" type="text" spellcheck="false"><datalist id="search-options"></datalist></span>
 <div id="chart-params"></div>
 </div>
 </form>
@@ -532,9 +532,15 @@ const errorMessages = [
 }
 ]
 
+/// Dashboard selector
+const dashboardSearchQuery = (dashboard_name) => `SELECT title, query FROM system.dashboards WHERE dashboard = '${dashboard_name}'`;
+let dashboard_queries = {
+    "Overview": dashboardSearchQuery("Overview"),
+};
+const default_dashboard = 'Overview';
+
 /// Query to fill `queries` list for the dashboard
-let search_query = `SELECT title, query FROM system.dashboards WHERE dashboard = 'Overview'`;
+let search_query = dashboardSearchQuery(default_dashboard);
 let customized = false;
 let queries = [];
@@ -1439,7 +1445,7 @@ async function reloadAll(do_search) {
     try {
         updateParams();
         if (do_search) {
-            search_query = document.getElementById('search-query').value;
+            search_query = toSearchQuery(document.getElementById('search-query').value);
             queries = [];
             refreshCustomized(false);
         }
@@ -1504,7 +1510,7 @@ function updateFromState() {
     document.getElementById('url').value = host;
     document.getElementById('user').value = user;
     document.getElementById('password').value = password;
-    document.getElementById('search-query').value = search_query;
+    document.getElementById('search-query').value = fromSearchQuery(search_query);
     refreshCustomized();
 }
@@ -1543,6 +1549,44 @@ if (window.location.hash) {
     } catch {}
 }
 
+function fromSearchQuery(query) {
+    for (const dashboard_name in dashboard_queries) {
+        if (query == dashboard_queries[dashboard_name])
+            return dashboard_name;
+    }
+    return query;
+}
+
+function toSearchQuery(value) {
+    if (value in dashboard_queries)
+        return dashboard_queries[value];
+    else
+        return value;
+}
+
+async function populateSearchOptions() {
+    let {reply, error} = await doFetch("SELECT dashboard FROM system.dashboards GROUP BY dashboard ORDER BY ALL");
+    if (error) {
+        throw new Error(error);
+    }
+    let data = reply.data;
+    if (data.dashboard.length == 0) {
+        console.log("Unable to fetch dashboards list");
+        return;
+    }
+    dashboard_queries = {};
+    for (let i = 0; i < data.dashboard.length; i++) {
+        const dashboard = data.dashboard[i];
+        dashboard_queries[dashboard] = dashboardSearchQuery(dashboard);
+    }
+
+    const searchOptions = document.getElementById('search-options');
+    for (const dashboard in dashboard_queries) {
+        const opt = document.createElement('option');
+        opt.value = dashboard;
+        searchOptions.appendChild(opt);
+    }
+}
+
 async function start() {
     try {
         updateFromState();
@@ -1558,6 +1602,7 @@ async function start() {
         } else {
             drawAll();
         }
+        await populateSearchOptions();
     } catch (e) {
         showError(e.message);
     }

View File

@@ -528,7 +528,7 @@ QueryTreeNodePtr IdentifierResolver::tryResolveIdentifierFromCompoundExpression(
  *
  * Resolve strategy:
  * 1. Try to bind identifier to scope argument name to node map.
- * 2. If identifier is binded but expression context and node type are incompatible return nullptr.
+ * 2. If identifier is bound but expression context and node type are incompatible return nullptr.
  *
  * It is important to support edge cases, where we lookup for table or function node, but argument has same name.
  * Example: WITH (x -> x + 1) AS func, (func -> func(1) + func) AS lambda SELECT lambda(1);

View File

@@ -362,7 +362,7 @@ ReplxxLineReader::ReplxxLineReader(
     if (highlighter)
         rx.set_highlighter_callback(highlighter);
 
-    /// By default C-p/C-n binded to COMPLETE_NEXT/COMPLETE_PREV,
+    /// By default C-p/C-n bound to COMPLETE_NEXT/COMPLETE_PREV,
     /// bind C-p/C-n to history-previous/history-next like readline.
     rx.bind_key(Replxx::KEY::control('N'), [this](char32_t code) { return rx.invoke(Replxx::ACTION::HISTORY_NEXT, code); });
     rx.bind_key(Replxx::KEY::control('P'), [this](char32_t code) { return rx.invoke(Replxx::ACTION::HISTORY_PREVIOUS, code); });
@@ -384,9 +384,9 @@ ReplxxLineReader::ReplxxLineReader(
     rx.bind_key(Replxx::KEY::control('J'), commit_action);
     rx.bind_key(Replxx::KEY::ENTER, commit_action);
 
-    /// By default COMPLETE_NEXT/COMPLETE_PREV was binded to C-p/C-n, re-bind
+    /// By default COMPLETE_NEXT/COMPLETE_PREV was bound to C-p/C-n, re-bind
     /// to M-P/M-N (that was used for HISTORY_COMMON_PREFIX_SEARCH before, but
-    /// it also binded to M-p/M-n).
+    /// it also bound to M-p/M-n).
     rx.bind_key(Replxx::KEY::meta('N'), [this](char32_t code) { return rx.invoke(Replxx::ACTION::COMPLETE_NEXT, code); });
     rx.bind_key(Replxx::KEY::meta('P'), [this](char32_t code) { return rx.invoke(Replxx::ACTION::COMPLETE_PREVIOUS, code); });
 
     /// By default M-BACKSPACE is KILL_TO_WHITESPACE_ON_LEFT, while in readline it is backward-kill-word

View File

@@ -237,6 +237,7 @@ bool IParserColumnDeclaration<NameParser>::parseImpl(Pos & pos, ASTPtr & node, E
         null_modifier.emplace(true);
     }
 
+    bool is_comment = false;
     /// Collate is also allowed after NULL/NOT NULL
     if (!collation_expression && s_collate.ignore(pos, expected)
         && !collation_parser.parse(pos, collation_expression, expected))
@@ -254,7 +255,9 @@ bool IParserColumnDeclaration<NameParser>::parseImpl(Pos & pos, ASTPtr & node, E
     else if (s_ephemeral.ignore(pos, expected))
     {
         default_specifier = s_ephemeral.getName();
-        if (!expr_parser.parse(pos, default_expression, expected) && type)
+        if (s_comment.ignore(pos, expected))
+            is_comment = true;
+        if ((is_comment || !expr_parser.parse(pos, default_expression, expected)) && type)
         {
             ephemeral_default = true;
@@ -289,19 +292,22 @@ bool IParserColumnDeclaration<NameParser>::parseImpl(Pos & pos, ASTPtr & node, E
     if (require_type && !type && !default_expression)
         return false; /// reject column name without type
 
-    if ((type || default_expression) && allow_null_modifiers && !null_modifier.has_value())
+    if (!is_comment)
     {
-        if (s_not.ignore(pos, expected))
+        if ((type || default_expression) && allow_null_modifiers && !null_modifier.has_value())
         {
-            if (!s_null.ignore(pos, expected))
-                return false;
-            null_modifier.emplace(false);
+            if (s_not.ignore(pos, expected))
+            {
+                if (!s_null.ignore(pos, expected))
+                    return false;
+                null_modifier.emplace(false);
+            }
+            else if (s_null.ignore(pos, expected))
+                null_modifier.emplace(true);
         }
-        else if (s_null.ignore(pos, expected))
-            null_modifier.emplace(true);
     }
 
-    if (s_comment.ignore(pos, expected))
+    if (is_comment || s_comment.ignore(pos, expected))
     {
         /// should be followed by a string literal
         if (!string_literal_parser.parse(pos, comment_expression, expected))

View File

@@ -48,6 +48,22 @@ public:
         consume(bytes);
     }
 
+    template <typename TValue, typename ParquetType>
+    void ALWAYS_INLINE readValuesOfDifferentSize(TValue * dst, size_t count)
+    {
+        auto necessary_bytes = count * sizeof(ParquetType);
+        checkAvaible(necessary_bytes);
+        const ParquetType * src = reinterpret_cast<const ParquetType *>(data);
+        for (std::size_t i = 0; i < count; i++)
+        {
+            dst[i] = static_cast<TValue>(src[i]);
+        }
+        consume(necessary_bytes);
+    }
+
     void ALWAYS_INLINE readDateTime64FromInt96(DateTime64 & dst)
     {
         static const int max_scale_num = 9;

View File

@@ -240,8 +240,8 @@ TValue * getResizedPrimitiveData(TColumn & column, size_t size)
 }
 
 } // anoynomous namespace
 
-template <>
-void ParquetPlainValuesReader<ColumnString>::readBatch(
+template <typename TColumn>
+void ParquetPlainByteArrayValuesReader<TColumn>::readBatch(
     MutableColumnPtr & col_ptr, LazyNullMap & null_map, UInt32 num_values)
 {
     auto & column = *assert_cast<ColumnString *>(col_ptr.get());
@@ -322,8 +322,8 @@ void ParquetBitPlainReader<TColumn>::readBatch(
 }
 
-template <>
-void ParquetPlainValuesReader<ColumnDecimal<DateTime64>, ParquetReaderTypes::TimestampInt96>::readBatch(
+template <typename TColumn>
+void ParquetPlainInt96ValuesReader<TColumn>::readBatch(
     MutableColumnPtr & col_ptr, LazyNullMap & null_map, UInt32 num_values)
 {
     auto cursor = col_ptr->size();
@@ -350,8 +350,8 @@ void ParquetPlainValuesReader<ColumnDecimal<DateTime64>, ParquetReaderTypes::Tim
     );
 }
 
-template <typename TColumn, ParquetReaderTypes reader_type>
-void ParquetPlainValuesReader<TColumn, reader_type>::readBatch(
+template <typename TColumn, typename ParquetType>
+void ParquetPlainValuesReader<TColumn, ParquetType>::readBatch(
     MutableColumnPtr & col_ptr, LazyNullMap & null_map, UInt32 num_values)
 {
     auto cursor = col_ptr->size();
@@ -365,11 +365,11 @@ void ParquetPlainValuesReader<TColumn, reader_type>::readBatch(
         null_map,
         /* individual_visitor */ [&](size_t nest_cursor)
         {
-            plain_data_buffer.readValue(column_data[nest_cursor]);
+            plain_data_buffer.readValuesOfDifferentSize<TValue, ParquetType>(column_data + nest_cursor, 1);
         },
         /* repeated_visitor */ [&](size_t nest_cursor, UInt32 count)
         {
-            plain_data_buffer.readBytes(column_data + nest_cursor, count * sizeof(TValue));
+            plain_data_buffer.readValuesOfDifferentSize<TValue, ParquetType>(column_data + nest_cursor, count);
         }
     );
 }
@@ -576,18 +576,19 @@ void ParquetRleDictReader<TColumnVector>::readBatch(
 }
 
-template class ParquetPlainValuesReader<ColumnInt32>;
-template class ParquetPlainValuesReader<ColumnUInt32>;
-template class ParquetPlainValuesReader<ColumnInt64>;
-template class ParquetPlainValuesReader<ColumnUInt64>;
-template class ParquetPlainValuesReader<ColumnBFloat16>;
-template class ParquetPlainValuesReader<ColumnFloat32>;
-template class ParquetPlainValuesReader<ColumnFloat64>;
-template class ParquetPlainValuesReader<ColumnDecimal<Decimal32>>;
-template class ParquetPlainValuesReader<ColumnDecimal<Decimal64>>;
-template class ParquetPlainValuesReader<ColumnDecimal<DateTime64>>;
-template class ParquetPlainValuesReader<ColumnString>;
-template class ParquetPlainValuesReader<ColumnUInt8>;
+template class ParquetPlainValuesReader<ColumnUInt8, int32_t>;
+template class ParquetPlainValuesReader<ColumnInt8, int32_t>;
+template class ParquetPlainValuesReader<ColumnUInt16, int32_t>;
+template class ParquetPlainValuesReader<ColumnInt16, int32_t>;
+template class ParquetPlainValuesReader<ColumnUInt32, int32_t>;
+template class ParquetPlainValuesReader<ColumnInt32, int32_t>;
+template class ParquetPlainValuesReader<ColumnUInt64, int64_t>;
+template class ParquetPlainValuesReader<ColumnInt64, int64_t>;
+template class ParquetPlainValuesReader<ColumnFloat32, float>;
+template class ParquetPlainValuesReader<ColumnFloat64, double>;
+template class ParquetPlainValuesReader<ColumnDecimal<Decimal32>, int32_t>;
+template class ParquetPlainValuesReader<ColumnDecimal<Decimal64>, int64_t>;
+template class ParquetPlainValuesReader<ColumnDecimal<DateTime64>, int64_t>;
 
 template class ParquetBitPlainReader<ColumnUInt8>;
@@ -598,12 +599,10 @@ template class ParquetRleLCReader<ColumnUInt8>;
 template class ParquetRleLCReader<ColumnUInt16>;
 template class ParquetRleLCReader<ColumnUInt32>;
 
-template class ParquetRleDictReader<ColumnUInt8>;
 template class ParquetRleDictReader<ColumnInt32>;
 template class ParquetRleDictReader<ColumnUInt32>;
 template class ParquetRleDictReader<ColumnInt64>;
 template class ParquetRleDictReader<ColumnUInt64>;
-template class ParquetRleDictReader<ColumnBFloat16>;
 template class ParquetRleDictReader<ColumnFloat32>;
 template class ParquetRleDictReader<ColumnFloat64>;
 template class ParquetRleDictReader<ColumnDecimal<Decimal32>>;
@@ -613,4 +612,8 @@ template class ParquetRleDictReader<ColumnDecimal<Decimal256>>;
 template class ParquetRleDictReader<ColumnDecimal<DateTime64>>;
 template class ParquetRleDictReader<ColumnString>;
 
+template class ParquetPlainByteArrayValuesReader<ColumnString>;
+template class ParquetPlainInt96ValuesReader<ColumnDecimal<DateTime64>>;
+
 }
} }

View File

@@ -150,7 +150,7 @@ enum class ParquetReaderTypes
 /**
  * The definition level is RLE or BitPacked encoding, while data is read directly
  */
-template <typename TColumn, ParquetReaderTypes reader_type = ParquetReaderTypes::Normal>
+template <typename TColumn, typename ParquetType>
 class ParquetPlainValuesReader : public ParquetDataValuesReader
 {
 public:
@@ -172,6 +172,50 @@ private:
     ParquetDataBuffer plain_data_buffer;
 };
 
+template <typename TColumn>
+class ParquetPlainInt96ValuesReader : public ParquetDataValuesReader
+{
+public:
+    ParquetPlainInt96ValuesReader(
+        Int32 max_def_level_,
+        std::unique_ptr<RleValuesReader> def_level_reader_,
+        ParquetDataBuffer data_buffer_)
+        : max_def_level(max_def_level_)
+        , def_level_reader(std::move(def_level_reader_))
+        , plain_data_buffer(std::move(data_buffer_))
+    {}
+
+    void readBatch(MutableColumnPtr & col_ptr, LazyNullMap & null_map, UInt32 num_values) override;
+
+private:
+    Int32 max_def_level;
+    std::unique_ptr<RleValuesReader> def_level_reader;
+    ParquetDataBuffer plain_data_buffer;
+};
+
+template <typename TColumn>
+class ParquetPlainByteArrayValuesReader : public ParquetDataValuesReader
+{
+public:
+    ParquetPlainByteArrayValuesReader(
+        Int32 max_def_level_,
+        std::unique_ptr<RleValuesReader> def_level_reader_,
+        ParquetDataBuffer data_buffer_)
+        : max_def_level(max_def_level_)
+        , def_level_reader(std::move(def_level_reader_))
+        , plain_data_buffer(std::move(data_buffer_))
+    {}
+
+    void readBatch(MutableColumnPtr & col_ptr, LazyNullMap & null_map, UInt32 num_values) override;
+
+private:
+    Int32 max_def_level;
+    std::unique_ptr<RleValuesReader> def_level_reader;
+    ParquetDataBuffer plain_data_buffer;
+};
+
 template <typename TColumn>
 class ParquetBitPlainReader : public ParquetDataValuesReader
 {

View File

@ -173,13 +173,7 @@ ColumnPtr readDictPage(
} }
template <typename TColumn> template <is_col_over_big_decimal TColumnDecimal, typename ParquetType>
std::unique_ptr<ParquetDataValuesReader> createPlainReader(
const parquet::ColumnDescriptor & col_des,
RleValuesReaderPtr def_level_reader,
ParquetDataBuffer buffer);
template <is_col_over_big_decimal TColumnDecimal>
std::unique_ptr<ParquetDataValuesReader> createPlainReader( std::unique_ptr<ParquetDataValuesReader> createPlainReader(
const parquet::ColumnDescriptor & col_des, const parquet::ColumnDescriptor & col_des,
RleValuesReaderPtr def_level_reader, RleValuesReaderPtr def_level_reader,
@ -192,25 +186,62 @@ std::unique_ptr<ParquetDataValuesReader> createPlainReader(
std::move(buffer)); std::move(buffer));
} }
template <typename TColumn>
template <typename TColumn, typename ParquetType>
std::unique_ptr<ParquetDataValuesReader> createPlainReader( std::unique_ptr<ParquetDataValuesReader> createPlainReader(
const parquet::ColumnDescriptor & col_des, const parquet::ColumnDescriptor & col_des,
RleValuesReaderPtr def_level_reader, RleValuesReaderPtr def_level_reader,
ParquetDataBuffer buffer) ParquetDataBuffer buffer)
{ {
if (std::is_same_v<TColumn, ColumnDecimal<DateTime64>> && col_des.physical_type() == parquet::Type::INT96) if constexpr (std::is_same_v<TColumn, ColumnDecimal<DateTime64>> && std::is_same_v<ParquetType, ParquetInt96TypeStub>)
return std::make_unique<ParquetPlainValuesReader<TColumn, ParquetReaderTypes::TimestampInt96>>( return std::make_unique<ParquetPlainInt96ValuesReader<TColumn>>(
col_des.max_definition_level(), std::move(def_level_reader), std::move(buffer)); col_des.max_definition_level(), std::move(def_level_reader), std::move(buffer));
return std::make_unique<ParquetPlainValuesReader<TColumn>>(
if constexpr (std::is_same_v<ParquetType, ParquetByteArrayTypeStub>)
{
return std::make_unique<ParquetPlainByteArrayValuesReader<TColumn>>(
col_des.max_definition_level(), std::move(def_level_reader), std::move(buffer));
}
return std::make_unique<ParquetPlainValuesReader<TColumn, ParquetType>>(
col_des.max_definition_level(), std::move(def_level_reader), std::move(buffer)); col_des.max_definition_level(), std::move(def_level_reader), std::move(buffer));
} }
template <typename TColumn, typename ParquetType>
std::unique_ptr<ParquetDataValuesReader> createReader(
const parquet::ColumnDescriptor & col_descriptor,
RleValuesReaderPtr def_level_reader,
const uint8_t * buffer,
std::size_t buffer_max_size,
const DataTypePtr & base_data_type)
{
if constexpr (std::is_same_v<ParquetType, bool>)
{
auto bit_reader = std::make_unique<arrow::bit_util::BitReader>(buffer, buffer_max_size);
return std::make_unique<ParquetBitPlainReader<TColumn>>(
col_descriptor.max_definition_level(), std::move(def_level_reader), std::move(bit_reader));
}
else
{
ParquetDataBuffer parquet_buffer = [&]()
{
if constexpr (!std::is_same_v<ColumnDecimal<DateTime64>, TColumn>)
return ParquetDataBuffer(buffer, buffer_max_size);
auto scale = assert_cast<const DataTypeDateTime64 &>(*base_data_type).getScale();
return ParquetDataBuffer(buffer, buffer_max_size, scale);
}();
return createPlainReader<TColumn, ParquetType>(col_descriptor, std::move(def_level_reader), parquet_buffer);
}
}
} // anonymous namespace } // anonymous namespace
template <typename TColumn> template <typename TColumn, typename ParquetType>
ParquetLeafColReader<TColumn>::ParquetLeafColReader( ParquetLeafColReader<TColumn, ParquetType>::ParquetLeafColReader(
const parquet::ColumnDescriptor & col_descriptor_, const parquet::ColumnDescriptor & col_descriptor_,
DataTypePtr base_type_, DataTypePtr base_type_,
std::unique_ptr<parquet::ColumnChunkMetaData> meta_, std::unique_ptr<parquet::ColumnChunkMetaData> meta_,
@ -223,8 +254,8 @@ ParquetLeafColReader<TColumn>::ParquetLeafColReader(
{ {
} }
template <typename TColumn> template <typename TColumn, typename ParquetType>
ColumnWithTypeAndName ParquetLeafColReader<TColumn>::readBatch(UInt64 rows_num, const String & name) ColumnWithTypeAndName ParquetLeafColReader<TColumn, ParquetType>::readBatch(UInt64 rows_num, const String & name)
{ {
reading_rows_num = rows_num; reading_rows_num = rows_num;
auto readPageIfEmpty = [&]() auto readPageIfEmpty = [&]()
@ -251,41 +282,42 @@ ColumnWithTypeAndName ParquetLeafColReader<TColumn>::readBatch(UInt64 rows_num,
return releaseColumn(name); return releaseColumn(name);
} }
-template <>
-void ParquetLeafColReader<ColumnString>::resetColumn(UInt64 rows_num)
+template <typename TColumn, typename ParquetType>
+void ParquetLeafColReader<TColumn, ParquetType>::resetColumn(UInt64 rows_num)
 {
-    if (reading_low_cardinality)
+    if constexpr (std::is_same_v<TColumn, ColumnString>)
     {
-        assert(dictionary);
-        visitColStrIndexType(dictionary->size(), [&]<typename TColVec>(TColVec *)
+        if (reading_low_cardinality)
         {
-            column = TColVec::create();
-        });
+            assert(dictionary);
+            visitColStrIndexType(dictionary->size(), [&]<typename TColVec>(TColVec *)
+            {
+                column = TColVec::create();
+            });

-        // only first position is used
-        null_map = std::make_unique<LazyNullMap>(1);
-        column->reserve(rows_num);
+            // only first position is used
+            null_map = std::make_unique<LazyNullMap>(1);
+            column->reserve(rows_num);
+        }
+        else
+        {
+            null_map = std::make_unique<LazyNullMap>(rows_num);
+            column = ColumnString::create();
+            reserveColumnStrRows(column, rows_num);
+        }
     }
     else
     {
+        assert(!reading_low_cardinality);
+        column = base_data_type->createColumn();
+        column->reserve(rows_num);
         null_map = std::make_unique<LazyNullMap>(rows_num);
-        column = ColumnString::create();
-        reserveColumnStrRows(column, rows_num);
     }
 }

-template <typename TColumn>
-void ParquetLeafColReader<TColumn>::resetColumn(UInt64 rows_num)
-{
-    assert(!reading_low_cardinality);
-    column = base_data_type->createColumn();
-    column->reserve(rows_num);
-    null_map = std::make_unique<LazyNullMap>(rows_num);
-}
-
-template <typename TColumn>
-void ParquetLeafColReader<TColumn>::degradeDictionary()
+template <typename TColumn, typename ParquetType>
+void ParquetLeafColReader<TColumn, ParquetType>::degradeDictionary()
 {
     // if last batch read all dictionary indices, then degrade is not needed this time
     if (!column)
@@ -331,8 +363,8 @@ void ParquetLeafColReader<TColumn>::degradeDictionary()
     LOG_DEBUG(log, "degraded dictionary to normal column");
 }

-template <typename TColumn>
-ColumnWithTypeAndName ParquetLeafColReader<TColumn>::releaseColumn(const String & name)
+template <typename TColumn, typename ParquetType>
+ColumnWithTypeAndName ParquetLeafColReader<TColumn, ParquetType>::releaseColumn(const String & name)
 {
     DataTypePtr data_type = base_data_type;
     if (reading_low_cardinality)
@@ -365,8 +397,8 @@ ColumnWithTypeAndName ParquetLeafColReader<TColumn>::releaseColumn(const String
     return res;
 }

-template <typename TColumn>
-void ParquetLeafColReader<TColumn>::readPage()
+template <typename TColumn, typename ParquetType>
+void ParquetLeafColReader<TColumn, ParquetType>::readPage()
 {
     // refer to: ColumnReaderImplBase::ReadNewPage in column_reader.cc
     // this is where decompression happens
@@ -408,8 +440,8 @@ void ParquetLeafColReader<TColumn>::readPage()
     }
 }
-template <typename TColumn>
-void ParquetLeafColReader<TColumn>::initDataReader(
+template <typename TColumn, typename ParquetType>
+void ParquetLeafColReader<TColumn, ParquetType>::initDataReader(
     parquet::Encoding::type enconding_type,
     const uint8_t * buffer,
     std::size_t max_size,
@@ -425,29 +457,8 @@ void ParquetLeafColReader<TColumn>::initDataReader(
                 degradeDictionary();
             }

-            if (col_descriptor.physical_type() == parquet::Type::BOOLEAN)
-            {
-                if constexpr (std::is_same_v<TColumn, ColumnUInt8>)
-                {
-                    auto bit_reader = std::make_unique<arrow::bit_util::BitReader>(buffer, max_size);
-                    data_values_reader = std::make_unique<ParquetBitPlainReader<ColumnUInt8>>(col_descriptor.max_definition_level(),
-                        std::move(def_level_reader),
-                        std::move(bit_reader));
-                }
-            }
-            else
-            {
-                ParquetDataBuffer parquet_buffer = [&]()
-                {
-                    if constexpr (!std::is_same_v<ColumnDecimal<DateTime64>, TColumn>)
-                        return ParquetDataBuffer(buffer, max_size);
-                    auto scale = assert_cast<const DataTypeDateTime64 &>(*base_data_type).getScale();
-                    return ParquetDataBuffer(buffer, max_size, scale);
-                }();
-                data_values_reader = createPlainReader<TColumn>(
-                    col_descriptor, std::move(def_level_reader), std::move(parquet_buffer));
-            }
+            data_values_reader = createReader<TColumn, ParquetType>(
+                col_descriptor, std::move(def_level_reader), buffer, max_size, base_data_type);
             break;
         }
         case parquet::Encoding::RLE_DICTIONARY:
@@ -476,8 +487,8 @@ void ParquetLeafColReader<TColumn>::initDataReader(
     }
 }
-template <typename TColumn>
-void ParquetLeafColReader<TColumn>::readPageV1(const parquet::DataPageV1 & page)
+template <typename TColumn, typename ParquetType>
+void ParquetLeafColReader<TColumn, ParquetType>::readPageV1(const parquet::DataPageV1 & page)
 {
     cur_page_values = page.num_values();
@@ -562,8 +573,8 @@ void ParquetLeafColReader<TColumn>::readPageV1(const parquet::DataPageV1 & page)
 * The data buffer is "offset-ed" by rl bytes length and then dl decoder is built using RLE decoder. Since dl bytes length was present in the header,
 * there is no need to read it and apply an offset like in page v1.
 * */
-template <typename TColumn>
-void ParquetLeafColReader<TColumn>::readPageV2(const parquet::DataPageV2 & page)
+template <typename TColumn, typename ParquetType>
+void ParquetLeafColReader<TColumn, ParquetType>::readPageV2(const parquet::DataPageV2 & page)
 {
     cur_page_values = page.num_values();
@@ -609,28 +620,32 @@ void ParquetLeafColReader<TColumn>::readPageV2(const parquet::DataPageV2 & page)
     initDataReader(page.encoding(), buffer, page.size() - total_levels_length, std::move(def_level_reader));
 }
-template <typename TColumn>
-std::unique_ptr<ParquetDataValuesReader> ParquetLeafColReader<TColumn>::createDictReader(
+template <typename TColumn, typename ParquetType>
+std::unique_ptr<ParquetDataValuesReader> ParquetLeafColReader<TColumn, ParquetType>::createDictReader(
     std::unique_ptr<RleValuesReader> def_level_reader, std::unique_ptr<RleValuesReader> rle_data_reader)
 {
-    if (reading_low_cardinality && std::same_as<TColumn, ColumnString>)
-    {
-        std::unique_ptr<ParquetDataValuesReader> res;
-        visitColStrIndexType(dictionary->size(), [&]<typename TCol>(TCol *)
-        {
-            res = std::make_unique<ParquetRleLCReader<TCol>>(
-                col_descriptor.max_definition_level(),
-                std::move(def_level_reader),
-                std::move(rle_data_reader));
-        });
-        return res;
-    }
-
-    if (col_descriptor.physical_type() == parquet::Type::type::BOOLEAN)
+    if constexpr (std::is_same_v<TColumn, ColumnUInt8> || std::is_same_v<TColumn, ColumnInt8>
+        || std::is_same_v<TColumn, ColumnUInt16> || std::is_same_v<TColumn, ColumnInt16>)
     {
         throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Dictionary encoding for booleans is not supported");
     }

+    if (reading_low_cardinality)
+    {
+        if constexpr (std::same_as<TColumn, ColumnString>)
+        {
+            std::unique_ptr<ParquetDataValuesReader> res;
+            visitColStrIndexType(dictionary->size(), [&]<typename TCol>(TCol *)
+            {
+                res = std::make_unique<ParquetRleLCReader<TCol>>(
+                    col_descriptor.max_definition_level(),
+                    std::move(def_level_reader),
+                    std::move(rle_data_reader));
+            });
+            return res;
+        }
+    }
+
     return std::make_unique<ParquetRleDictReader<TColumn>>(
         col_descriptor.max_definition_level(),
         std::move(def_level_reader),
@@ -639,19 +654,23 @@ std::unique_ptr<ParquetDataValuesReader> ParquetLeafColReader<TColumn>::createDi
 }
-template class ParquetLeafColReader<ColumnUInt8>;
-template class ParquetLeafColReader<ColumnInt32>;
-template class ParquetLeafColReader<ColumnUInt32>;
-template class ParquetLeafColReader<ColumnInt64>;
-template class ParquetLeafColReader<ColumnUInt64>;
-template class ParquetLeafColReader<ColumnBFloat16>;
-template class ParquetLeafColReader<ColumnFloat32>;
-template class ParquetLeafColReader<ColumnFloat64>;
-template class ParquetLeafColReader<ColumnString>;
-template class ParquetLeafColReader<ColumnDecimal<Decimal32>>;
-template class ParquetLeafColReader<ColumnDecimal<Decimal64>>;
-template class ParquetLeafColReader<ColumnDecimal<Decimal128>>;
-template class ParquetLeafColReader<ColumnDecimal<Decimal256>>;
-template class ParquetLeafColReader<ColumnDecimal<DateTime64>>;
+template class ParquetLeafColReader<ColumnUInt8, bool>;
+template class ParquetLeafColReader<ColumnUInt8, int32_t>;
+template class ParquetLeafColReader<ColumnInt8, int32_t>;
+template class ParquetLeafColReader<ColumnUInt16, int32_t>;
+template class ParquetLeafColReader<ColumnInt16, int32_t>;
+template class ParquetLeafColReader<ColumnUInt32, int32_t>;
+template class ParquetLeafColReader<ColumnInt32, int32_t>;
+template class ParquetLeafColReader<ColumnUInt64, int64_t>;
+template class ParquetLeafColReader<ColumnInt64, int64_t>;
+template class ParquetLeafColReader<ColumnFloat32, float>;
+template class ParquetLeafColReader<ColumnFloat64, double>;
+template class ParquetLeafColReader<ColumnString, ParquetByteArrayTypeStub>;
+template class ParquetLeafColReader<ColumnDecimal<Decimal32>, int32_t>;
+template class ParquetLeafColReader<ColumnDecimal<Decimal64>, int64_t>;
+template class ParquetLeafColReader<ColumnDecimal<Decimal128>, ParquetByteArrayTypeStub>;
+template class ParquetLeafColReader<ColumnDecimal<Decimal256>, ParquetByteArrayTypeStub>;
+template class ParquetLeafColReader<ColumnDecimal<DateTime64>, ParquetInt96TypeStub>;
+template class ParquetLeafColReader<ColumnDecimal<DateTime64>, int64_t>;

 }

View File

@@ -17,7 +17,10 @@ class ColumnDescriptor;
 namespace DB
 {

-template <typename TColumn>
+struct ParquetByteArrayTypeStub {};
+struct ParquetInt96TypeStub {};
+
+template <typename TColumn, typename ParquetType>
 class ParquetLeafColReader : public ParquetColumnReader
 {
 public:

View File

@@ -93,19 +93,20 @@ private:
     std::unique_ptr<ParquetColumnReader> fromInt32INT(const parquet::IntLogicalType & int_type);
     std::unique_ptr<ParquetColumnReader> fromInt64INT(const parquet::IntLogicalType & int_type);

-    template<class DataType>
+    template<class ClickHouseType, typename ParquetType>
     auto makeLeafReader()
     {
-        return std::make_unique<ParquetLeafColReader<typename DataType::ColumnType>>(
-            col_descriptor, std::make_shared<DataType>(), std::move(meta), std::move(page_reader));
+        return std::make_unique<ParquetLeafColReader<typename ClickHouseType::ColumnType, ParquetType>>(
+            col_descriptor, std::make_shared<ClickHouseType>(), std::move(meta), std::move(page_reader));
     }

-    template<class DecimalType>
+    template<class DecimalType, typename ParquetType>
     auto makeDecimalLeafReader()
     {
         auto data_type = std::make_shared<DataTypeDecimal<DecimalType>>(
             col_descriptor.type_precision(), col_descriptor.type_scale());
-        return std::make_unique<ParquetLeafColReader<ColumnDecimal<DecimalType>>>(
+
+        return std::make_unique<ParquetLeafColReader<ColumnDecimal<DecimalType>, ParquetType>>(
             col_descriptor, std::move(data_type), std::move(meta), std::move(page_reader));
     }
@@ -157,11 +158,11 @@ std::unique_ptr<ParquetColumnReader> ColReaderFactory::fromInt32()
         case parquet::LogicalType::Type::INT:
             return fromInt32INT(dynamic_cast<const parquet::IntLogicalType &>(*col_descriptor.logical_type()));
         case parquet::LogicalType::Type::NONE:
-            return makeLeafReader<DataTypeInt32>();
+            return makeLeafReader<DataTypeInt32, int32_t>();
         case parquet::LogicalType::Type::DATE:
-            return makeLeafReader<DataTypeDate32>();
+            return makeLeafReader<DataTypeDate32, int32_t>();
         case parquet::LogicalType::Type::DECIMAL:
-            return makeDecimalLeafReader<Decimal32>();
+            return makeDecimalLeafReader<Decimal32, int32_t>();
         default:
             return throwUnsupported();
     }
@@ -174,16 +175,16 @@ std::unique_ptr<ParquetColumnReader> ColReaderFactory::fromInt64()
         case parquet::LogicalType::Type::INT:
             return fromInt64INT(dynamic_cast<const parquet::IntLogicalType &>(*col_descriptor.logical_type()));
         case parquet::LogicalType::Type::NONE:
-            return makeLeafReader<DataTypeInt64>();
+            return makeLeafReader<DataTypeInt64, int64_t>();
         case parquet::LogicalType::Type::TIMESTAMP:
         {
             const auto & tm_type = dynamic_cast<const parquet::TimestampLogicalType &>(*col_descriptor.logical_type());
             auto read_type = std::make_shared<DataTypeDateTime64>(getScaleFromLogicalTimestamp(tm_type.time_unit()));
-            return std::make_unique<ParquetLeafColReader<ColumnDecimal<DateTime64>>>(
+            return std::make_unique<ParquetLeafColReader<ColumnDecimal<DateTime64>, int64_t>>(
                 col_descriptor, std::move(read_type), std::move(meta), std::move(page_reader));
         }
         case parquet::LogicalType::Type::DECIMAL:
-            return makeDecimalLeafReader<Decimal64>();
+            return makeDecimalLeafReader<Decimal64, int64_t>();
         default:
             return throwUnsupported();
     }
@@ -195,7 +196,7 @@ std::unique_ptr<ParquetColumnReader> ColReaderFactory::fromByteArray()
     {
         case parquet::LogicalType::Type::STRING:
         case parquet::LogicalType::Type::NONE:
-            return makeLeafReader<DataTypeString>();
+            return makeLeafReader<DataTypeString, ParquetByteArrayTypeStub>();
         default:
             return throwUnsupported();
     }
@@ -210,9 +211,9 @@ std::unique_ptr<ParquetColumnReader> ColReaderFactory::fromFLBA()
     if (col_descriptor.type_length() > 0)
     {
         if (col_descriptor.type_length() <= static_cast<int>(sizeof(Decimal128)))
-            return makeDecimalLeafReader<Decimal128>();
+            return makeDecimalLeafReader<Decimal128, ParquetByteArrayTypeStub>();
         if (col_descriptor.type_length() <= static_cast<int>(sizeof(Decimal256)))
-            return makeDecimalLeafReader<Decimal256>();
+            return makeDecimalLeafReader<Decimal256, ParquetByteArrayTypeStub>();
     }

     return throwUnsupported(PreformattedMessage::create(
@@ -227,11 +228,23 @@ std::unique_ptr<ParquetColumnReader> ColReaderFactory::fromInt32INT(const parque
 {
     switch (int_type.bit_width())
     {
+        case 8:
+        {
+            if (int_type.is_signed())
+                return makeLeafReader<DataTypeInt8, int32_t>();
+            return makeLeafReader<DataTypeUInt8, int32_t>();
+        }
+        case 16:
+        {
+            if (int_type.is_signed())
+                return makeLeafReader<DataTypeInt16, int32_t>();
+            return makeLeafReader<DataTypeUInt16, int32_t>();
+        }
         case 32:
         {
             if (int_type.is_signed())
-                return makeLeafReader<DataTypeInt32>();
-            return makeLeafReader<DataTypeUInt32>();
+                return makeLeafReader<DataTypeInt32, int32_t>();
+            return makeLeafReader<DataTypeUInt32, int32_t>();
         }
         default:
             return throwUnsupported(PreformattedMessage::create(", bit width: {}", int_type.bit_width()));
@@ -245,8 +258,8 @@ std::unique_ptr<ParquetColumnReader> ColReaderFactory::fromInt64INT(const parque
         case 64:
         {
             if (int_type.is_signed())
-                return makeLeafReader<DataTypeInt64>();
-            return makeLeafReader<DataTypeUInt64>();
+                return makeLeafReader<DataTypeInt64, int64_t>();
+            return makeLeafReader<DataTypeUInt64, int64_t>();
         }
         default:
             return throwUnsupported(PreformattedMessage::create(", bit width: {}", int_type.bit_width()));
@@ -263,7 +276,7 @@ std::unique_ptr<ParquetColumnReader> ColReaderFactory::makeReader()
     switch (col_descriptor.physical_type())
     {
         case parquet::Type::BOOLEAN:
-            return makeLeafReader<DataTypeUInt8>();
+            return makeLeafReader<DataTypeUInt8, bool>();
         case parquet::Type::INT32:
             return fromInt32();
         case parquet::Type::INT64:
@@ -276,13 +289,13 @@ std::unique_ptr<ParquetColumnReader> ColReaderFactory::makeReader()
                 auto scale = getScaleFromArrowTimeUnit(arrow_properties.coerce_int96_timestamp_unit());
                 read_type = std::make_shared<DataTypeDateTime64>(scale);
             }
-            return std::make_unique<ParquetLeafColReader<ColumnDecimal<DateTime64>>>(
+            return std::make_unique<ParquetLeafColReader<ColumnDecimal<DateTime64>, ParquetInt96TypeStub>>(
                 col_descriptor, read_type, std::move(meta), std::move(page_reader));
         }
         case parquet::Type::FLOAT:
-            return makeLeafReader<DataTypeFloat32>();
+            return makeLeafReader<DataTypeFloat32, float>();
         case parquet::Type::DOUBLE:
-            return makeLeafReader<DataTypeFloat64>();
+            return makeLeafReader<DataTypeFloat64, double>();
         case parquet::Type::BYTE_ARRAY:
             return fromByteArray();
         case parquet::Type::FIXED_LEN_BYTE_ARRAY:

View File

@@ -299,8 +299,6 @@ class TagAttrs:
     # Only one latest can exist
     latest: ClickHouseVersion

-    # Only one can be a major one (the most fresh per a year)
-    majors: Dict[int, ClickHouseVersion]
-
     # Only one lts version can exist
     lts: Optional[ClickHouseVersion]
@@ -345,14 +343,6 @@ def ldf_tags(version: ClickHouseVersion, distro: str, tag_attrs: TagAttrs) -> st
         tags.append("lts")
         tags.append(f"lts-{distro}")

-    # If the tag `22`, `23`, `24` etc. should be included in the tags
-    with_major = tag_attrs.majors.get(version.major) in (None, version)
-    if with_major:
-        tag_attrs.majors[version.major] = version
-        if without_distro:
-            tags.append(f"{version.major}")
-        tags.append(f"{version.major}-{distro}")
-
     # Add all normal tags
     for tag in (
         f"{version.major}.{version.minor}",
@@ -384,7 +374,7 @@ def generate_ldf(args: argparse.Namespace) -> None:
         args.directory / git_runner(f"git -C {args.directory} rev-parse --show-cdup")
     ).absolute()
     lines = ldf_header(git, directory)
-    tag_attrs = TagAttrs(versions[-1], {}, None)
+    tag_attrs = TagAttrs(versions[-1], None)

     # We iterate from the most recent to the oldest version
     for version in reversed(versions):

View File

@@ -378,7 +378,7 @@ def test_reload_via_client(cluster, zk):
             configure_from_zk(zk)
             break
         except QueryRuntimeException:
-            logging.exception("The new socket is not binded yet")
+            logging.exception("The new socket is not bound yet")
             time.sleep(0.1)

     if exception:

View File

@@ -0,0 +1,11 @@
drop table if exists test;
CREATE TABLE test (
`start_s` UInt32 EPHEMERAL COMMENT 'start UNIX time' ,
`start_us` UInt16 EPHEMERAL COMMENT 'start microseconds',
`finish_s` UInt32 EPHEMERAL COMMENT 'finish UNIX time',
`finish_us` UInt16 EPHEMERAL COMMENT 'finish microseconds',
`captured` DateTime MATERIALIZED fromUnixTimestamp(start_s),
`duration` Decimal32(6) MATERIALIZED finish_s - start_s + (finish_us - start_us)/1000000
)
ENGINE Null;
drop table if exists test;

View File

@@ -0,0 +1,40 @@
-94 53304 17815465730223871
57 15888 33652524900575246
-4 14877 53832092832965652
33 3387 86326601511136103
104 3383 115438187156564782
-11 37403 145056169255259589
-72 46473 159324626361233509
103 35510 173644182696185097
-26 60902 185175917734318892
70 48767 193167023342307884
2 21648 247953090704786001
20 2986 268127160817221407
76 20277 290178827409195337
61 28692 305149163504092270
-74 65427 326871531363668398
-15 20256 351812901947846888
-39 65472 357371822264135234
79 38671 371605113770958364
-29 41706 394460710549666968
92 25026 412913269933311543
-94 53304 17815465730223871
57 15888 33652524900575246
-4 14877 53832092832965652
33 3387 86326601511136103
104 3383 115438187156564782
-11 37403 145056169255259589
-72 46473 159324626361233509
103 35510 173644182696185097
-26 60902 185175917734318892
70 48767 193167023342307884
2 21648 247953090704786001
20 2986 268127160817221407
76 20277 290178827409195337
61 28692 305149163504092270
-74 65427 326871531363668398
-15 20256 351812901947846888
-39 65472 357371822264135234
79 38671 371605113770958364
-29 41706 394460710549666968
92 25026 412913269933311543

View File

@@ -0,0 +1,22 @@
#!/usr/bin/env bash
# Tags: no-ubsan, no-fasttest
CUR_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
# shellcheck source=../shell_config.sh
. "$CUR_DIR"/../shell_config.sh
USER_FILES_PATH=$($CLICKHOUSE_CLIENT_BINARY --query "select _path,_file from file('nonexist.txt', 'CSV', 'val1 char')" 2>&1 | grep Exception | awk '{gsub("/nonexist.txt","",$9); print $9}')
WORKING_DIR="${USER_FILES_PATH}/${CLICKHOUSE_TEST_UNIQUE_NAME}"
mkdir -p "${WORKING_DIR}"
DATA_FILE="${CUR_DIR}/data_parquet/multi_column_bf.gz.parquet"
DATA_FILE_USER_PATH="${WORKING_DIR}/multi_column_bf.gz.parquet"
cp ${DATA_FILE} ${DATA_FILE_USER_PATH}
${CLICKHOUSE_CLIENT} --query="select int8_logical, uint16_logical, uint64_logical from file('${DATA_FILE_USER_PATH}', Parquet) order by uint64_logical limit 20 SETTINGS input_format_parquet_use_native_reader=false;";
${CLICKHOUSE_CLIENT} --query="select int8_logical, uint16_logical, uint64_logical from file('${DATA_FILE_USER_PATH}', Parquet) order by uint64_logical limit 20 SETTINGS input_format_parquet_use_native_reader=true;";