avogar
f5f1db86d9
Remove commented code
2022-04-13 19:15:52 +00:00
avogar
8b60aeb7bc
Improve schema inference for json objects
2022-04-13 19:13:40 +00:00
avogar
1c065f8c7a
Some refactoring around schema inference with globs
2022-04-13 17:02:48 +00:00
mergify[bot]
ea3afd4c6c
Merge branch 'master' into musl-check
2022-04-13 12:28:34 +00:00
Nikolai Kochetov
362fcfd2b8
Merge pull request #36075 from ClickHouse/fix-limit-push-down-over-window
...
Disable LIMIT push down through WINDOW functions.
2022-04-13 11:57:37 +02:00
alesapin
2f496c7945
Merge branch 'master' into musl-check
2022-04-12 14:40:47 +02:00
Yakov Olkhovskiy
155a2a0d42
Merge pull request #35349 from yakov-olkhovskiy/interpolate-feature
...
Interpolate feature
2022-04-11 11:15:50 -04:00
Nikolai Kochetov
2deec53162
Disable LIMIT push down through WINDOW functions.
2022-04-08 13:39:54 +00:00
avogar
1c783ed88a
Resolve conflicts
2022-04-07 12:17:48 +00:00
avogar
d2017a63b1
Merge branch 'master' of github.com:ClickHouse/ClickHouse into improve-schema-inference
2022-04-07 11:36:40 +00:00
Kruglov Pavel
f3f8f27db5
Merge pull request #35735 from Avogar/allow-read-bools-as-numbers
...
Allow to infer and parse bools as numbers in JSON input formats
2022-04-07 13:20:49 +02:00
Kseniia Sumarokova
18a594a22e
Merge pull request #34631 from bigo-sg/use_minmax_index
...
Use minmax index for orc/parquet file in Hive Engine
2022-04-07 12:22:19 +02:00
Nikolai Kochetov
3e1b3f14c0
Merge pull request #34355 from azat/processors-profiling
...
Profiling on Processors level
2022-04-07 12:13:14 +02:00
taiyang-li
2ef316801c
Merge branch 'master' into use_minmax_index
2022-04-07 10:53:25 +08:00
Kruglov Pavel
ec2213493f
Merge branch 'master' into allow-read-bools-as-numbers
2022-04-06 14:53:02 +02:00
Kruglov Pavel
9141066de3
Merge branch 'master' into improve-schema-inference
2022-04-06 13:51:07 +02:00
taiyang-li
acb9f1632e
support skip splits in orc and parquet
2022-04-06 16:40:22 +08:00
Yakov Olkhovskiy
90c4cd3de7
Merge branch 'master' into interpolate-feature
2022-04-05 14:39:07 -04:00
Nickita Taranov
0f94a58f3a
use getName()
2022-04-04 14:59:38 +02:00
Nickita Taranov
440e57769a
more fixes
2022-04-04 14:33:58 +02:00
Nickita Taranov
ce40d84eef
more fixes
2022-04-04 14:33:58 +02:00
Nickita Taranov
a39427f00b
clean up
2022-04-04 14:33:57 +02:00
Nickita Taranov
eedcd61479
fix
2022-04-04 14:33:57 +02:00
Nickita Taranov
a08c035443
stash
2022-04-04 14:33:57 +02:00
Nickita Taranov
b095838444
stash
2022-04-04 14:33:57 +02:00
Nickita Taranov
4c51329ad6
stash
2022-04-04 14:33:57 +02:00
Nikita Taranov
bd89fcafdb
Make SortDescription::column_name always non-empty (#35805)
2022-04-04 14:17:15 +02:00
何李夫
09c04e4993
Improve the pipeline description for JOIN (#35612)
...
Improve the pipeline description for JOIN
2022-04-04 13:56:41 +02:00
Azat Khuzhin
58ee917e94
Measure processors profiles only if it was enabled
...
Since it may use a little extra CPU.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-04-04 13:19:43 +03:00
Azat Khuzhin
99528e296c
Rename need_data_elapsed_us/port_full_elapsed_us to input_wait_us/output_wait_us
...
$ gg -e need_data_ -e port_full_ | cut -d: -f1 | sort -u | xargs sed -i -e s/port_full_/output_wait_/g -e s/need_data_/input_wait_/g -e s/getPortFull/getOutputWait/g -e s/getNeedData/getInputWait/g
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-04-04 13:19:43 +03:00
Azat Khuzhin
5fd402eaba
Measure time that Process spent in work/NeedData/PortFull
...
Note that right now this is done not in IProcessor but in
ExecutingGraph/ExecutionThreadContext, to avoid lots of changes in the
IProcessor interface and to make review easier. I'm not against
changing the IProcessor interface to encapsulate it there.
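A minimal sketch of the bookkeeping described above, assuming the executor keeps per-processor counters outside the processor itself. The names `State`, `ProcessorTimes`, and `account` are hypothetical and do not come from ClickHouse; timestamps are passed in explicitly so the logic stays deterministic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical states mirroring work / NeedData / PortFull from the message above.
enum class State { Work, NeedData, PortFull };

// Hypothetical per-processor counters kept by the executor (illustrative only).
struct ProcessorTimes
{
    uint64_t work_us = 0;
    uint64_t need_data_us = 0;
    uint64_t port_full_us = 0;

    // Charge the interval [begin_us, end_us) to the counter of the given state.
    void account(State state, uint64_t begin_us, uint64_t end_us)
    {
        assert(end_us >= begin_us);
        const uint64_t elapsed = end_us - begin_us;
        switch (state)
        {
            case State::Work:     work_us += elapsed; break;
            case State::NeedData: need_data_us += elapsed; break;
            case State::PortFull: port_full_us += elapsed; break;
        }
    }
};
```

In a real executor the timestamps would presumably come from a monotonic clock such as `std::chrono::steady_clock`, sampled whenever a processor changes state.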
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-04-04 13:19:43 +03:00
Anton Popov
11e18a16f3
Merge pull request #35724 from Avogar/fix-order
...
Improve schema inference for JSONEachRow and TSKV formats
2022-04-04 11:00:21 +02:00
Alexey Milovidov
5a47958744
Merge pull request #35736 from CurtizJ/quota-written-bytes
...
Add quota for written bytes
2022-04-03 05:26:49 +03:00
mergify[bot]
1e43e26fa1
Merge branch 'master' into fix-order
2022-04-02 12:00:29 +00:00
Anton Popov
687942ce70
more strict quota for written bytes
2022-04-01 15:02:49 +00:00
Yakov Olkhovskiy
538373a79b
style fix
2022-03-31 12:13:49 -04:00
Yakov Olkhovskiy
a15996315e
bugfix - columns order tracking
2022-03-31 11:51:13 -04:00
avogar
ab2a963287
Merge branch 'master' of github.com:ClickHouse/ClickHouse into allow-read-bools-as-numbers
2022-03-31 14:09:43 +00:00
Kruglov Pavel
252d66e80d
Update src/Processors/Formats/ISchemaReader.cpp
...
Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com>
2022-03-31 16:08:37 +02:00
mergify[bot]
24ade25d61
Merge branch 'master' into improve-schema-inference
2022-03-31 13:42:47 +00:00
Yakov Olkhovskiy
b5682c1f02
minor refactoring
2022-03-31 08:33:50 -04:00
avogar
836e7dae67
Fix bug in indexes of not presented columns in -WithNames formats
2022-03-31 12:24:40 +00:00
avogar
d272356324
Minor code improvement
2022-03-31 10:55:09 +00:00
avogar
74275da7ee
Make better
2022-03-31 10:52:34 +00:00
Yakov Olkhovskiy
6a1e116c46
refactoring
2022-03-30 16:34:19 -04:00
Antonio Andelic
d85ed8f2a9
Merge pull request #35655 from ClickHouse/exception-compile-time-message-check
...
Use compile-time check for `Exception` messages
2022-03-30 08:11:32 +02:00
Anton Popov
caacc7d385
add quota for written bytes
2022-03-29 18:21:29 +00:00
avogar
000f3043e7
Make better
2022-03-29 17:40:07 +00:00
avogar
3fc36627b3
Allow to infer and parse bools as numbers in JSON input formats
2022-03-29 17:37:31 +00:00
avogar
ce97ccbfb9
Improve schema inference for JSONEachRow and TSKV formats
2022-03-29 14:47:51 +00:00
Dmitry Novik
8f935a72d6
Merge pull request #33230 from CurtizJ/read_in_order_max_rows_to_read
...
Proper handle of 'max_rows_to_read' in case of reading in order of sorting key
2022-03-29 15:16:34 +02:00
Antonio Andelic
9990abb76a
Use compile-time check for Exception messages, fix wrong messages
2022-03-29 13:16:11 +00:00
avogar
97f5033ea9
Fix tests
2022-03-29 13:07:37 +00:00
mergify[bot]
343588de2c
Merge branch 'master' into improve-schema-inference
2022-03-29 13:06:00 +00:00
Anton Popov
9610139477
Merge pull request #35629 from CurtizJ/dynamic-columns-5
...
Support schema inference for type `Object` in format `JSONEachRow`
2022-03-29 14:17:09 +02:00
Alexey Milovidov
5e262fba85
Merge pull request #35204 from azat/build-gcc
...
Add build with GCC
2022-03-29 04:55:15 +03:00
Yakov Olkhovskiy
615efa1381
aliases processing fixed
2022-03-28 19:15:53 -04:00
Azat Khuzhin
bf4df5c6bb
Fix SIGSEGV for build under gcc-11 (due to auto deduction)
...
When building with gcc-11, the compiler crashes with SIGSEGV on
InterpretersMySQLDDLQuery (and some other files), and this is due to endless
recursion:
(gdb) bt 5
0 0x00000000010978f2 in structural_comptypes (t1=0x7fde028c7dc8, t2=0x7fde028d1e70, strict=0) at ../../src/gcc/cp/typeck.c:1274
1 0x00000000011c3f9d in comp_template_parms (parms1=<optimized out>, parms2=<optimized out>) at ../../src/gcc/cp/pt.c:3369
2 0x0000000001097bd9 in structural_comptypes (t1=0x7fde028c7dc8, t2=0x7fde028d1e70, strict=<optimized out>) at ../../src/gcc/cp/typeck.c:1361
3 0x00000000011c3f9d in comp_template_parms (parms1=<optimized out>, parms2=<optimized out>) at ../../src/gcc/cp/pt.c:3369
4 0x0000000001097bd9 in structural_comptypes (t1=0x7fde028c7dc8, t2=0x7fde028d1e70, strict=<optimized out>) at ../../src/gcc/cp/typeck.c:1361
(gdb) bt -X
1397454 0x0000000001097bd9 in structural_comptypes (t1=0x7fde028d1540, t2=0x7fde028d27e0, strict=<optimized out>) at ../../src/gcc/cp/typeck.c:1361
1397455 0x0000000000f2d8b5 in cp_tree_equal (t1=<optimized out>, t2=<optimized out>) at ../../src/gcc/cp/tree.c:4144
1397456 0x00000000010909cb in template_args_equal (ot=0x7fde028cf578, nt=0x7fde028cfc58, partial_order=<optimized out>) at ../../src/gcc/cp/pt.c:9256
1397457 0x0000000001090422 in template_args_equal (partial_order=false, nt=0x7fde028cfc58, ot=0x7fde028cf578) at ../../src/gcc/cp/pt.c:9295
1397458 comp_template_args (oldargs=0x7fde028cf550, newargs=0x7fde028cfc30, oldarg_ptr=0x0, newarg_ptr=0x0, partial_order=false) at ../../src/gcc/cp/pt.c:9285
1397459 0x00000000010a08f4 in spec_hasher::equal (e1=0x7fde028c95d0, e2=0x7ffd1194e8c0) at ../../src/gcc/cp/pt.c:1726
1397460 0x0000000001085965 in hash_table<spec_hasher, false, xcallocator>::find_with_hash (this=0x7fde36b7f450, comparable=@0x7ffd1194e8b8: 0x7ffd1194e8c0, hash=<optimized out>) at ../../src/gcc/hash-table.h:936
1397461 0x0000000001079698 in lookup_template_class_1 (d1=<optimized out>, arglist=0x7fde028cfc30, in_decl=0x0, context=<optimized out>, entering_scope=<optimized out>, complain=3) at ../../src/gcc/cp/pt.c:9896
1397462 0x000000000109f8ef in lookup_template_class (complain=3, entering_scope=1, context=0x7fde27558e40, in_decl=0x0, arglist=0x7fde028cfc08, d1=0x7fde269bcd20) at ../../src/gcc/cp/pt.c:10251
1397463 tsubst_aggr_type (t=0x7fde269bcd20, args=<optimized out>, complain=3, in_decl=0x0, entering_scope=1) at ../../src/gcc/cp/pt.c:13646
1397464 0x000000000108f797 in tsubst (t=0x7fde269bcdc8, args=0x7fde028cf7a8, complain=3, in_decl=<optimized out>) at ../../src/gcc/cp/pt.c:16108
1397465 0x0000000000e61bf2 in rewrite_template_parm (level=1, complain=3, tsubst_args=0x7fde028cf7a8, index=5, olddecl=0x7fde269b5600) at ../../src/gcc/cp/pt.c:28556
1397466 rewrite_tparm_list(tree_node*, unsigned int, unsigned int, tree_node*, unsigned int, int) [clone .constprop.0] (oldelt=0x7fde269bd190, index=5, targs=0x7fde028cf7a8, targs_index=4, complain=3, level=1) at ../../src/gcc/cp/pt.c:28640
1397467 0x00000000009f3748 in build_deduction_guide (type=type@entry=0x7fde26e13dc8, ctor=0x7fde269ac300, outer_args=outer_args@entry=0x0, complain=complain@entry=3) at ../../src/gcc/cp/pt.c:28769
1397468 0x00000000009f444f in ctor_deduction_guides_for (complain=3, tmpl=<optimized out>) at ../../src/gcc/cp/cp-tree.h:842
1397469 deduction_guides_for (tmpl=<optimized out>, any_dguides_p=<optimized out>, complain=3) at ../../src/gcc/cp/pt.c:29282
1397470 0x00000000008507a8 in do_class_deduction (complain=3, flags=1, init=<optimized out>, tmpl=0x7fde26e0f980, ptype=0x7fde028c7b28) at ../../src/gcc/cp/pt.c:29402
1397471 do_auto_deduction (type=0x7fde028c7b28, init=<optimized out>, auto_node=<optimized out>, complain=3, context=<optimized out>, outer_targs=<optimized out>, flags=1) at ../../src/gcc/cp/pt.c:29572
1397472 0x00000000007c9569 in finish_compound_literal (type=<optimized out>, compound_literal=0x7fde028c95b8, complain=3, fcl_context=fcl_functional) at ../../src/gcc/cp/semantics.c:3060
1397473 0x0000000001123a79 in cp_parser_functional_cast (parser=0x7fde27558da8, type=0x7fde028c7b28) at ../../src/gcc/cp/parser.c:30670
1397474 0x0000000000fd7873 in cp_parser_postfix_expression (parser=0x7fde27558da8, address_p=<optimized out>, cast_p=<optimized out>, member_access_only_p=<optimized out>, decltype_p=false, pidk_return=0x0) at ../../src/gcc/cp/parser.c:7437
1397475 0x0000000000fd4ddf in cp_parser_binary_expression (parser=0x7fde27558da8, cast_p=<optimized out>, no_toplevel_fold_p=false, decltype_p=<optimized out>, prec=PREC_NOT_OPERATOR, pidk=<optimized out>) at ../../src/gcc/cp/parser.c:9842
1397476 0x0000000000fd4595 in cp_parser_assignment_expression (parser=0x7fde27558da8, pidk=<optimized out>, cast_p=<optimized out>, decltype_p=<optimized out>) at ../../src/gcc/cp/parser.c:10146
1397477 0x0000000000fd3b90 in cp_parser_constant_expression (parser=0x7fde27558da8, allow_non_constant_p=2, non_constant_p=0x7ffd1194f1d7, strict_p=<optimized out>) at ../../src/gcc/cp/parser.c:10449
1397478 0x0000000000fcfdd5 in cp_parser_initializer_clause (non_constant_p=<optimized out>, parser=0x7fde27558da8) at ../../src/gcc/cp/parser.c:24253
1397479 cp_parser_initializer (parser=0x7fde27558da8, is_direct_init=<optimized out>, non_constant_p=<optimized out>, subexpression_p=<optimized out>) at ../../src/gcc/cp/parser.c:24193
1397480 0x000000000062e5d8 in cp_parser_decomposition_declaration (init_loc=0x7ffd1194f1d8, maybe_range_for_decl=0x7ffd1194f498, decl_specifiers=0x7ffd1194f1f0, parser=0x7fde27558da8) at ../../src/gcc/cp/parser.c:14734
1397481 cp_parser_simple_declaration (parser=0x7fde27558da8, function_definition_allowed_p=<optimized out>, maybe_range_for_decl=0x7ffd1194f498) at ../../src/gcc/cp/parser.c:14393
1397482 0x000000000109b870 in cp_parser_init_statement (parser=0x7fde27558da8, decl=0x7ffd1194f498) at ../../src/gcc/cp/parser.c:13420
1397483 0x00000000010996f0 in cp_parser_for (unroll=0, ivdep=false, parser=0x7fde27558da8) at ../../src/gcc/cp/parser.c:12708
1397484 cp_parser_iteration_statement (parser=0x7fde27558da8, if_p=0x0, ivdep=<optimized out>, unroll=<optimized out>) at ../../src/gcc/cp/parser.c:13343
1397485 0x0000000000fe5c46 in cp_parser_statement (parser=0x7fde27558da8, in_statement_expr=0x0, in_compound=<optimized out>, if_p=0x0, chain=0x0, loc_after_labels=0x0) at ../../src/gcc/cp/parser.c:11718
1397486 0x0000000000fe15ac in cp_parser_statement_seq_opt (in_statement_expr=<optimized out>, parser=<optimized out>) at ../../src/gcc/cp/parser.c:12201
1397487 cp_parser_compound_statement (parser=0x7fde27558da8, in_statement_expr=0x0, bcs_flags=<optimized out>, function_body=<optimized out>) at ../../src/gcc/cp/parser.c:12150
The interesting frame is 1397471, from which we can extract the location:
(gdb) p line_table[0].info_ordinary.maps[line_table[0].info_ordinary.cache]
$54 = {
  <line_map> = {
    start_location = 1396581280
  },
  members of line_map_ordinary:
  reason = LC_RENAME,
  sysp = 0 '\000',
  m_column_and_range_bits = 8,
  m_range_bits = 0,
  to_file = 0x3eb4bb0 "/ch/src/Interpreters/MySQL/InterpretersMySQLDDLQuery.cpp",
  to_line = 46,
  included_from = 0
}
By replicating the SOURCE_LINE() macro from gcc-11 (libcpp/include/line-map.h):
/* Converts a map and a location_t to source line. */
inline linenum_type
SOURCE_LINE (const line_map_ordinary *ord_map, location_t loc)
{
  return ((loc - ord_map->start_location)
          >> ord_map->m_column_and_range_bits) + ord_map->to_line;
}
We get line 154:
(gdb) p ((input_location-1396581280) >> 8) + 46
$61 = 154
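The same arithmetic can be reproduced outside gdb. This is a sketch only: `OrdinaryMap` is a simplified stand-in for gcc's `line_map_ordinary`, and `example_location` is a hypothetical value, because the real `input_location` is not printed in the session above.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for gcc's line_map_ordinary; only the fields used here.
struct OrdinaryMap
{
    uint32_t start_location;
    uint32_t column_and_range_bits;
    uint32_t to_line;
};

// Same arithmetic as SOURCE_LINE(): drop the column/range bits,
// then add the first line covered by the map.
inline uint32_t sourceLine(const OrdinaryMap & map, uint32_t loc)
{
    assert(loc >= map.start_location);
    return ((loc - map.start_location) >> map.column_and_range_bits) + map.to_line;
}

// Values taken from the gdb dump above.
inline constexpr OrdinaryMap mysql_ddl_map{1396581280u, 8, 46};

// Hypothetical location: any value in
// [start_location + (108 << 8), start_location + (109 << 8)) maps to line 154.
inline constexpr uint32_t example_location = 1396581280u + (108u << 8);
```

With 8 column/range bits, each source line covers 256 consecutive location values, so 154 - 46 = 108 lines correspond to an offset of 108 * 256 = 27648 from `start_location`.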
And this is:
auto [column_name_and_type, declare_column_ast] = std::tuple{columns_name_and_type.begin(), columns_definition->children.begin()};
After rewriting it, everything works correctly.
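The rewritten line is not shown here. One plausible shape, given that the backtrace goes through class template argument deduction for `std::tuple`, is to declare the two iterators separately instead of binding them from a `std::tuple{...}`. The sketch below is an assumption, not the actual change, and has not been checked against gcc-11 with libc++; `zipColumns` and its element types are hypothetical.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: walks two containers in lockstep, as the original loop
// did, but without `auto [a, b] = std::tuple{...}` in the for-init statement.
inline std::vector<std::pair<std::string, int>> zipColumns(
    const std::vector<std::string> & names, const std::vector<int> & definitions)
{
    std::vector<std::pair<std::string, int>> result;
    auto name_it = names.begin();
    auto definition_it = definitions.begin();
    for (; name_it != names.end() && definition_it != definitions.end();
         ++name_it, ++definition_it)
        result.emplace_back(*name_it, *definition_it);
    return result;
}
```

Declaring the iterators before the loop keeps them in the enclosing scope, which is the price of avoiding both the structured binding and the deduction of `std::tuple` template arguments.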
Here is also a reproducer for gcc-11 (gcc-12 does not fail on it, but
gcc-12 has other issues, such as [1] and one more with the hash table):
# cat /tmp/test.cpp
#include <tuple>
auto multi()
{
    return std::tuple{1, 1};
}
double foo()
{
    auto [a, b] = multi();
    return a - b;
}
# g++-11 -std=gnu++20 -c -o /dev/null -isystem /ch/contrib/libcxx/include -nostdinc++ /tmp/test.cpp
g++-11: internal compiler error: Segmentation fault signal terminated program cc1plus
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-11/README.Bugs> for instructions.
# g++-12 -std=gnu++20 -c -o /dev/null -isystem /ch/contrib/libcxx/include -nostdinc++ /tmp/test.cpp
[1]: https://reviews.llvm.org/D122598
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-03-28 22:21:45 +03:00
Anton Popov
d677635cd8
Merge pull request #35592 from CurtizJ/dynamic-columns-4
...
Add parallel parsing and schema inference for format `JSONAsObject`
2022-03-28 19:29:55 +02:00
mergify[bot]
f09ebea2d8
Merge branch 'master' into read_in_order_max_rows_to_read
2022-03-28 13:30:34 +00:00
Yakov Olkhovskiy
5a4694f340
major refactoring, simplified, optimized, bugs fixed
2022-03-27 14:32:09 -04:00
Anton Popov
67195bfdd5
support schema inference for type Object in format JSONEachRow
2022-03-25 21:51:53 +00:00
Vladimir C
cfb12aff6f
Merge pull request #35460 from helifu/master
2022-03-25 15:55:17 +01:00
avogar
6fb3c3be04
Fix comments and build
2022-03-25 12:02:21 +00:00
Kruglov Pavel
d45143ffe0
Merge branch 'master' into improve-schema-inference
2022-03-25 12:05:40 +01:00
Vladimir C
ae92963b15
Fix build error in Formats/ISchemaReader.cpp
2022-03-25 11:30:25 +01:00
Yakov Olkhovskiy
adefcfd299
Merge branch 'master' into interpolate-feature
2022-03-24 15:33:09 -04:00
Yakov Olkhovskiy
83f406b722
optimization, INTERPOLATE without expr. list, any column is allowed except WITH FILL
2022-03-24 15:29:29 -04:00
Kruglov Pavel
287e1a6efc
Update src/Processors/Formats/ISchemaReader.cpp
...
Co-authored-by: Vladimir C <vdimir@clickhouse.com>
2022-03-24 19:16:52 +01:00
Kruglov Pavel
6a9df9d471
Update src/Processors/Formats/ISchemaReader.cpp
...
Co-authored-by: Vladimir C <vdimir@clickhouse.com>
2022-03-24 19:16:47 +01:00
Kruglov Pavel
3b801a4093
Update src/Processors/Formats/ISchemaReader.cpp
...
Co-authored-by: Vladimir C <vdimir@clickhouse.com>
2022-03-24 19:16:41 +01:00
Anton Popov
78100abc5f
add parallel parsing and schema inference for type Object
2022-03-24 17:51:35 +00:00
avogar
557edbd172
Add some improvements and fixes in schema inference
2022-03-24 12:54:12 +00:00
vdimir
f106e2dd49
fix style in QueryPlan.cpp
2022-03-24 11:53:58 +00:00
vdimir
d16ae46589
remove description for ReadFromMergeTree from pipeline, adjust tests for plan
2022-03-24 11:31:52 +00:00
helifu
8a5bd2defa
Add explicit table info to the scan node of query plan and pipeline
...
:) explain plan select * from table1 t1 left join table2 t2 on t1.name = t2.name;
┌─explain──────────────────────────────────────────────────────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY)) │
│ Join (JOIN) │
│ Expression (Before JOIN) │
│ SettingQuotaAndLimits (Set limits and quota after reading from storage) │
│ ReadFromMergeTree (default.table1) │
│ Expression ((Joined actions + (Rename joined columns + (Projection + Before ORDER BY)))) │
│ SettingQuotaAndLimits (Set limits and quota after reading from storage) │
│ ReadFromMergeTree (default.table2) │
└──────────────────────────────────────────────────────────────────────────────────────────────┘
:) explain pipeline select * from table1 t1 left join table2 t2 on t1.name = t2.name;
┌─explain──────────────────────────────────────────┐
│ (Expression) │
│ ExpressionTransform × 24 │
│ (Join) │
│ JoiningTransform × 24 2 → 1 │
│ Resize 1 → 24 │
│ FillingRightJoinSide │
│ Resize 24 → 1 │
│ (Expression) │
│ ExpressionTransform × 24 │
│ (SettingQuotaAndLimits) │
│ (ReadFromMergeTree default.table1) │
│ MergeTreeThread × 24 0 → 1 │
│ (Expression) │
│ ExpressionTransform × 24 │
│ (SettingQuotaAndLimits) │
│ (ReadFromMergeTree default.table2) │
│ MergeTreeThread × 24 0 → 1 │
└──────────────────────────────────────────────────┘
2022-03-24 10:49:12 +00:00
mergify[bot]
bf90edc362
Merge branch 'master' into case-insensitive-column-matching
2022-03-24 08:00:42 +00:00
Kruglov Pavel
826b933b08
Merge pull request #35332 from Avogar/fix-tskv-schema-inference
...
Fix schema inference for TSKV format while using small max_read_buffer_size
2022-03-23 18:37:07 +01:00
Antonio Andelic
052057f2ef
Address PR comments
2022-03-23 15:42:46 +00:00
Anton Popov
f693eba568
fix tests with approx rows
2022-03-22 14:30:40 +00:00
Antonio Andelic
6b6190554b
Fix conversion of arrow to CH column with hint header
2022-03-22 11:15:48 +00:00
Antonio Andelic
0c23cd7b94
Add support for case insensitive column matching in arrow
2022-03-22 10:55:10 +00:00
Antonio Andelic
ca7844e338
Fix tests
2022-03-22 09:27:20 +00:00
Antonio Andelic
6cebb6bc88
Merge branch 'master' into case-insensitive-column-matching
2022-03-22 07:36:35 +00:00
mergify[bot]
e11ef05c2b
Merge branch 'master' into issue_33147
2022-03-21 13:40:17 +00:00
Antonio Andelic
cb3703b46e
Style fix
2022-03-21 12:54:56 +00:00
Antonio Andelic
0457a3998a
remove old test
2022-03-21 11:58:55 +00:00
Kruglov Pavel
1645b7083f
Update src/Processors/Formats/Impl/TSKVRowInputFormat.cpp
...
Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com>
2022-03-21 12:44:12 +01:00
Kruglov Pavel
0b381ebd26
Update src/Processors/Formats/Impl/TSKVRowInputFormat.cpp
...
Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com>
2022-03-21 12:44:06 +01:00
Kruglov Pavel
f67b8c0bad
Update src/Processors/Formats/Impl/TSKVRowInputFormat.cpp
...
Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com>
2022-03-21 12:44:00 +01:00
Kruglov Pavel
ed7b40253c
Merge pull request #35039 from zzsmdfj/issue/#34890_fix_input_format_null_as_default_bug
...
to #34890_fix_input_format_null_as_default_bug
2022-03-21 12:42:17 +01:00
Antonio Andelic
0c74fa2c19
Remove unnecessary code
2022-03-21 08:38:15 +00:00
tavplubix
716c6f0ffa
Merge pull request #35406 from Avogar/fix-parquet
...
Fix working with unneeded columns in Arrow/Parquet/ORC formats
2022-03-21 11:36:54 +03:00
Antonio Andelic
29d2bf7d1a
Merge branch 'master' into case-insensitive-column-matching
2022-03-21 08:17:27 +00:00
Antonio Andelic
d73c906e68
Format code
2022-03-21 07:50:17 +00:00
Antonio Andelic
f75b054255
Allow case insensitive column matching
2022-03-21 07:47:37 +00:00
Yakov Olkhovskiy
481ee8aad5
Update FillingTransform.cpp
...
use range-based for loop
2022-03-19 12:17:56 -04:00
Yakov Olkhovskiy
eb7474e73a
Merge branch 'master' into interpolate-feature
2022-03-19 03:11:14 -04:00
Nikita Taranov
7d61fa5f05
impl
2022-03-18 22:58:35 +00:00
Yakov Olkhovskiy
a8e1671a76
type match check for INTERPOLATE expressions added, bugfix, printout fixed
2022-03-18 16:44:27 -04:00
avogar
58f2aca120
Fix tests
2022-03-18 19:04:16 +00:00
avogar
cffa2096de
Fix working with unneeded columns in Arrow/Parquet/ORC formats
2022-03-18 13:07:54 +00:00
Kruglov Pavel
aa3c05e9d4
Merge pull request #35152 from rschu1ze/protobuf-batch-write
...
ProtobufList
2022-03-18 13:24:34 +01:00
mergify[bot]
28734562bb
Merge branch 'master' into issue/#34890_fix_input_format_null_as_default_bug
2022-03-17 13:24:34 +00:00
Antonio Andelic
607f785e48
Revert "Merge pull request #35145 from bigo-sg/lower-column-name"
...
This reverts commit ebf72bf61d, reversing
changes made to f1b812bdc1.
2022-03-17 12:31:43 +00:00
Yakov Olkhovskiy
00700988ee
style fix
2022-03-17 02:31:01 -04:00
Yakov Olkhovskiy
7bb66e6702
added INTERPOLATE extension for ORDER BY WITH FILL
2022-03-17 01:51:35 -04:00
Anton Popov
2ced42ed41
add experimental settings for Object type
2022-03-16 16:51:23 +00:00
Anton Popov
0ba78c3c3a
Merge remote-tracking branch 'upstream/master' into HEAD
2022-03-16 15:28:09 +00:00
avogar
f7c5fe14e4
Fix schema inference for TSKV format while using small max_read_buffer_size
2022-03-16 13:53:50 +00:00
Robert Schulze
0d2ece6d91
Merge branch 'ClickHouse:master' into protobuf-batch-write
2022-03-16 09:43:33 +01:00
Robert Schulze
23122cb327
Fix review comments
...
ParquetBlockOutputFormat.cpp:
- undo unrelated formatting
ProtobufSerializer.cpp:
- undef debug tracing
- simplify logic in writeRow()
ProtobufSchemas.cpp:
- restore original search in cache by message type
2022-03-15 11:27:17 +01:00
Maksim Kita
2665724301
Fix clang-tidy warnings in Parsers, Processors, QueryPipeline folders
2022-03-14 18:17:35 +00:00
Anton Popov
36ec379aeb
Merge remote-tracking branch 'upstream/master' into HEAD
2022-03-14 16:28:35 +00:00
Antonio Andelic
ebf72bf61d
Merge pull request #35145 from bigo-sg/lower-column-name
...
add setting to lower column case when reading parquet/orc file
2022-03-14 11:25:03 +01:00
Maksim Kita
ad6b3693e1
Merge pull request #35123 from zhanghuajieHIT/fix_build_fail_with_gcc
...
fix build fail with gcc
2022-03-14 10:36:15 +01:00
Kseniia Sumarokova
58a2d2b458
Merge pull request #35118 from zzsmdfj/issue/#31469_MaterializedMysql_mysqlDate2CkDate32
...
to #31469_MaterializedMysql_mysqlDate2CkDate32
2022-03-14 10:32:33 +01:00
Robert Schulze
514d4d2187
Implement ProtobufList - fixes ClickHouse#16436
...
Introduce IO format "ProtobufList" with protobuf schema
// schemafile.proto
message Envelope {
    message MessageType {
        uint32 colA = 1;
        string colB = 2;
    }
    repeated MessageType mt = 1;
}
where "Envelope" is a hard-coded/expected top-level message and
"MessageType" is a message with user-provided name containing the table
fields to export/import, e.g.
SELECT * FROM db1.tab1 FORMAT ProtobufList SETTINGS format_schema =
'schemafile:MessageType'
As a result, the new format wraps a list of messages (one per row) into
a single, containing message. Compare that to the schema of the existing
IO formats "Protobuf" and "ProtobufSingle":
message MessageType {
    uint32 colA = 1;
    string colB = 2;
}
The new format does not save space compared to the existing formats, but
it is conceptually a bit more beautiful and also more convenient.
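To make the wrapping concrete, here is a hand-rolled sketch of the protobuf wire layout for the two schemas above. It is not ClickHouse code: `Row`, `writeVarint`, `encodeMessageType`, and `encodeEnvelope` are hypothetical names, and as a simplification every field is written even when it holds a default value.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One table row with the two columns from the example schema.
struct Row
{
    uint32_t colA;
    std::string colB;
};

// Append a base-128 varint (protobuf wire format).
inline void writeVarint(std::string & out, uint64_t value)
{
    while (value >= 0x80)
    {
        out.push_back(static_cast<char>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<char>(value));
}

// Encode MessageType { uint32 colA = 1; string colB = 2; }.
inline std::string encodeMessageType(const Row & row)
{
    std::string out;
    writeVarint(out, (1 << 3) | 0); // field 1, wire type 0 (varint)
    writeVarint(out, row.colA);
    writeVarint(out, (2 << 3) | 2); // field 2, wire type 2 (length-delimited)
    writeVarint(out, row.colB.size());
    out += row.colB;
    return out;
}

// Encode Envelope { repeated MessageType mt = 1; }: every row becomes one
// length-delimited occurrence of field 1 inside a single top-level message.
inline std::string encodeEnvelope(const std::vector<Row> & rows)
{
    std::string out;
    for (const auto & row : rows)
    {
        const std::string inner = encodeMessageType(row);
        writeVarint(out, (1 << 3) | 2); // field 1, wire type 2
        writeVarint(out, inner.size());
        out += inner;
    }
    return out;
}
```

Each row costs a tag byte and a length prefix on top of its own encoding, which is consistent with the remark that the envelope does not save space.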
Implementation details:
- Created new files ProtobufList(Input|Output)Format which use the
existing ProtobufSerializer mechanism. The goal was to reuse as much
code as possible and avoid copypasta.
- I was torn between inheriting from I(Input|Output)Format vs.
IRow(Input|Output)Format for ProtobufList(Input|Output)Format. The
former is chunk-based which can be better for performance. Since the
ProtobufSerializer mechanism is row-based but data is generally passed
around in chunks, I decided for the latter to leverage the existing
chunk <--> row mapping code in IRow(Input|Output)Format.
- A new ProtobufSerializer called ProtobufSerializerEnvelope was
introduced (--> ProtobufSerializer.cpp). It represents the top-level
message which encloses the list of inner nested messages, i.e. the
rows.
- With the new format, parsing the schema file and matching the fields in
the schema file to table columns works like for the old formats. The only
difference is that parsing starts one level below the "Envelope" (-->
ProtobufSchema.cpp). This is more natural than forcing customers to
have table columns start with "Envelope".
- Creation of the ProtobufSerializer tree also works like before. What
is different is that we finally add a ProtobufSerializerEnvelope as
new root of the tree. Its only purpose is to write/read the top-level
message for the first/last row to write/read.
Caveats:
- The low-level serialization code in ProtobufWriter uses an internal
buffer which is flushed to the output file only in endMessage().
In the existing "Protobuf" format, this happens once per row, in the
new format this happens only at the end of the serialization
since row-level messages now call start/endNestedMessage(). As a
future TODO, the buffer should also be flushed in
start/endNestedMessage() to reduce memory consumption.
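The memory effect of this caveat can be illustrated with a toy model. `SketchWriter` and `peakBufferedBytes` are hypothetical and only model when the buffer is handed to the output; they do not reproduce the real ProtobufWriter.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical writer: buffers everything and flushes only in endMessage().
class SketchWriter
{
public:
    void write(const std::string & bytes)
    {
        buffer += bytes;
        peak_buffered = std::max(peak_buffered, buffer.size());
    }

    // Top-level message end: the only place where the buffer is flushed.
    void endMessage()
    {
        output += buffer;
        buffer.clear();
    }

    // Nested message end: data stays buffered until the enclosing message ends.
    void endNestedMessage() {}

    std::size_t peakBuffered() const { return peak_buffered; }

private:
    std::string buffer;
    std::string output;
    std::size_t peak_buffered = 0;
};

// Write `rows` rows of `row_size` bytes, either as one top-level message per row
// or as nested messages inside a single top-level message.
inline std::size_t peakBufferedBytes(std::size_t rows, std::size_t row_size, bool nested)
{
    SketchWriter writer;
    for (std::size_t i = 0; i < rows; ++i)
    {
        writer.write(std::string(row_size, 'x'));
        if (nested)
            writer.endNestedMessage();
        else
            writer.endMessage();
    }
    if (nested)
        writer.endMessage();
    return writer.peakBuffered();
}
```

In this model the peak stays at one row when every row ends a top-level message, and grows linearly with the row count when rows are nested, which is the behaviour the TODO aims to avoid.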
2022-03-14 08:04:58 +01:00
Maksim Kita
ce0c8e5597
Update JSONRowOutputFormat.cpp
2022-03-14 00:58:36 +01:00
Robert Schulze
f0ba39b071
Clean up some header includes and make formatting more consistent
2022-03-13 20:24:12 +01:00
zhanghuajie
53a8987b3b
fix build fail with gcc --fix warnings without disabling some parameters
2022-03-11 21:59:19 +08:00
shuchaome
7a3623d216
fix bug
2022-03-11 17:26:13 +08:00
Nikolai Kochetov
47f4bd30cd
Merge pull request #35186 from amosbird/fixwithtotalemptychunk
...
Fix empty chunk in with total transform
2022-03-11 10:24:19 +01:00
metahys
ff934cf0c2
Fix unexpected result when use -state type aggregate function in window frame (#34999)
...
* Fix unexpected result when use -state type aggregate function in window frame
* fix style
* fix style
* fix test
* fix flaky test
* fix flaky test
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-03-11 11:54:17 +03:00
shuchaome
46cb4483a6
Optimise by lowering schema on the beginning. Add a functional test.
2022-03-11 14:34:46 +08:00
Amos Bird
a1b61dabfd
Fix empty chunk in with total transform.
2022-03-10 23:27:36 +08:00
mergify[bot]
c326ebd67f
Merge branch 'master' into issue/#34890_fix_input_format_null_as_default_bug
2022-03-09 15:59:04 +00:00
shuchaome
b7cd85df6b
remove unused column_names in ORCBlockInputFormat
2022-03-09 18:16:22 +08:00
shuchaome
bb50133424
Apply suggestions from code review
...
Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com>
2022-03-09 17:32:27 +08:00
shuchaome
9647818adc
add unlikely for performance
2022-03-09 17:02:07 +08:00
shuchaome
8027bb1e32
modify code style
2022-03-09 16:32:18 +08:00
shuchaome
56795b831d
add setting to lower column case when reading parquet/orc file
2022-03-09 16:07:02 +08:00
zhanghuajie
11dde7c127
fix build fail with gcc
2022-03-08 22:34:51 +08:00
zzsmdfj
67b9f81104
to #31469_MaterializedMysql_mysqlDate2CkDate32
2022-03-08 18:17:22 +08:00
zzsmdfj
7252c18ff0
to #34890_fix_input_format_null_as_default_bug
2022-03-04 15:04:43 +08:00
Anton Popov
df3b07fe7c
Merge remote-tracking branch 'upstream/master' into HEAD
2022-03-03 22:25:28 +00:00
Nikolai Kochetov
32120d5dec
Merge pull request #34993 from ClickHouse/try-fix-delayed-source
...
Avoid pushing to port with data inside DelayedSource
2022-03-03 13:44:45 +01:00
Maksim Kita
b1a956c5f1
clang-tidy check performance-move-const-arg fix
2022-03-02 18:15:27 +00:00
Maksim Kita
1f5837359e
clang-tidy check performance-noexcept-move-constructor fix
2022-03-02 18:15:27 +00:00
Nikolai Kochetov
ed8dfc14d4
Avoid pushing to port with data inside DelayedSource
2022-03-02 14:21:58 +01:00
Anton Popov
2758db5341
add more comments
2022-03-01 19:32:55 +03:00
Anton Popov
fcdebea925
Merge remote-tracking branch 'upstream/master' into HEAD
2022-02-25 13:41:30 +03:00
zvonand
90c857c5e3
merge
2022-02-17 18:23:37 +03:00
Kruglov Pavel
5e8b2228e0
Merge pull request #34561 from bigo-sg/arrow_type_timestamp
...
Implement transformation between CH DateTime64 and arrow timestamp column
2022-02-17 16:55:17 +03:00
zvonand
cf244689a2
fixed filling transform
2022-02-16 15:14:25 +03:00
Anton Popov
72e75fdaf5
Merge pull request #34601 from CurtizJ/filtering-by-sparse-columns
...
Support filtering by sparse columns without conversion to full
2022-02-15 23:26:13 +03:00
Anton Popov
7cddae1351
return back result_size_hint
2022-02-15 15:12:25 +03:00
Anton Popov
5c316ffabe
support filtering by sparse columns without conversion to full
2022-02-15 14:30:54 +03:00
Kruglov Pavel
cf454a6539
Merge pull request #34532 from CurtizJ/fix-aggregation-in-order-3
...
Fix aggregation in order with distributed_aggregation_memory_efficient=0
2022-02-15 14:26:15 +03:00
zvonand
888542e29b
add[interval] no longer loses decimal components
...
Not only support for better subsecond logic, but also fewer conversions
-> faster operation
2022-02-14 02:52:56 +03:00