Commit Graph

2650 Commits

Author SHA1 Message Date
avogar
f5f1db86d9 Remove commented code 2022-04-13 19:15:52 +00:00
avogar
8b60aeb7bc Improve schema inference for json objects 2022-04-13 19:13:40 +00:00
avogar
1c065f8c7a Some refactoring around schema inference with globs 2022-04-13 17:02:48 +00:00
mergify[bot]
ea3afd4c6c
Merge branch 'master' into musl-check 2022-04-13 12:28:34 +00:00
Nikolai Kochetov
362fcfd2b8
Merge pull request #36075 from ClickHouse/fix-limit-push-down-over-window
Disable LIMIT push down through WINDOW functions.
2022-04-13 11:57:37 +02:00
alesapin
2f496c7945 Merge branch 'master' into musl-check 2022-04-12 14:40:47 +02:00
Yakov Olkhovskiy
155a2a0d42
Merge pull request #35349 from yakov-olkhovskiy/interpolate-feature
Interpolate feature
2022-04-11 11:15:50 -04:00
Nikolai Kochetov
2deec53162 Disable LIMIT push down through WINDOW functions. 2022-04-08 13:39:54 +00:00
avogar
1c783ed88a Resolve conflicts 2022-04-07 12:17:48 +00:00
avogar
d2017a63b1 Merge branch 'master' of github.com:ClickHouse/ClickHouse into improve-schema-inference 2022-04-07 11:36:40 +00:00
Kruglov Pavel
f3f8f27db5
Merge pull request #35735 from Avogar/allow-read-bools-as-numbers
Allow to infer and parse bools as numbers in JSON input formats
2022-04-07 13:20:49 +02:00
Kseniia Sumarokova
18a594a22e
Merge pull request #34631 from bigo-sg/use_minmax_index
Use minmax index for orc/parquet file in Hive Engine
2022-04-07 12:22:19 +02:00
Nikolai Kochetov
3e1b3f14c0
Merge pull request #34355 from azat/processors-profiling
Profiling on Processors level
2022-04-07 12:13:14 +02:00
taiyang-li
2ef316801c Merge branch 'master' into use_minmax_index 2022-04-07 10:53:25 +08:00
Kruglov Pavel
ec2213493f
Merge branch 'master' into allow-read-bools-as-numbers 2022-04-06 14:53:02 +02:00
Kruglov Pavel
9141066de3
Merge branch 'master' into improve-schema-inference 2022-04-06 13:51:07 +02:00
taiyang-li
acb9f1632e suppoort skip splits in orc and parquet 2022-04-06 16:40:22 +08:00
Yakov Olkhovskiy
90c4cd3de7
Merge branch 'master' into interpolate-feature 2022-04-05 14:39:07 -04:00
Nickita Taranov
0f94a58f3a use getName() 2022-04-04 14:59:38 +02:00
Nickita Taranov
440e57769a more fizes 2022-04-04 14:33:58 +02:00
Nickita Taranov
ce40d84eef more fixes 2022-04-04 14:33:58 +02:00
Nickita Taranov
a39427f00b clean up 2022-04-04 14:33:57 +02:00
Nickita Taranov
eedcd61479 fix 2022-04-04 14:33:57 +02:00
Nickita Taranov
a08c035443 stash 2022-04-04 14:33:57 +02:00
Nickita Taranov
b095838444 stash 2022-04-04 14:33:57 +02:00
Nickita Taranov
4c51329ad6 stash 2022-04-04 14:33:57 +02:00
Nikita Taranov
bd89fcafdb
Make SortDescription::column_name always non-empty (#35805) 2022-04-04 14:17:15 +02:00
何李夫
09c04e4993
Improve the pipeline description for JOIN (#35612)
Improve the pipeline description for JOIN
2022-04-04 13:56:41 +02:00
Azat Khuzhin
58ee917e94 Mesure processors profiles only if it was enabled
Since it may use little extra CPU.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-04-04 13:19:43 +03:00
Azat Khuzhin
99528e296c Rename need_data_elapsed_us/port_full_elapsed_us to input_wait_us/output_wait_us
$ gg -e need_data_ -e port_full_  | cut -d: -f1 | sort -u | xargs sed -i -e s/port_full_/output_wait_/g -e s/need_data_/input_wait_/g -e s/getPortFull/getOutputWait/g -e s/getNeedData/getInputWait/g

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-04-04 13:19:43 +03:00
Azat Khuzhin
5fd402eaba Measure time that Process spent in work/NeedData/PortFull
Note, that right now it is done not in IProcessor, but in
ExecutingGraph/ExecutionThreadContext, to avoid lots of changes in the
IProcessor interface, to make review easier, but I'm not against of
change the IProcessor interface to incapsulate it there.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-04-04 13:19:43 +03:00
Anton Popov
11e18a16f3
Merge pull request #35724 from Avogar/fix-order
Improve schema inference for JSONEachRow and TSKV formats
2022-04-04 11:00:21 +02:00
Alexey Milovidov
5a47958744
Merge pull request #35736 from CurtizJ/quota-written-bytes
Add quota for written bytes
2022-04-03 05:26:49 +03:00
mergify[bot]
1e43e26fa1
Merge branch 'master' into fix-order 2022-04-02 12:00:29 +00:00
Anton Popov
687942ce70 more strict quota for written bytes 2022-04-01 15:02:49 +00:00
Yakov Olkhovskiy
538373a79b style fix 2022-03-31 12:13:49 -04:00
Yakov Olkhovskiy
a15996315e bugfix - columns order tracking 2022-03-31 11:51:13 -04:00
avogar
ab2a963287 Merge branch 'master' of github.com:ClickHouse/ClickHouse into allow-read-bools-as-numbers 2022-03-31 14:09:43 +00:00
Kruglov Pavel
252d66e80d
Update src/Processors/Formats/ISchemaReader.cpp
Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com>
2022-03-31 16:08:37 +02:00
mergify[bot]
24ade25d61
Merge branch 'master' into improve-schema-inference 2022-03-31 13:42:47 +00:00
Yakov Olkhovskiy
b5682c1f02 minor refactoring 2022-03-31 08:33:50 -04:00
avogar
836e7dae67 Fix bug in indexes of not presented columns in -WithNames formats 2022-03-31 12:24:40 +00:00
avogar
d272356324 Minor code improvement 2022-03-31 10:55:09 +00:00
avogar
74275da7ee Make better 2022-03-31 10:52:34 +00:00
Yakov Olkhovskiy
6a1e116c46 refactoring 2022-03-30 16:34:19 -04:00
Antonio Andelic
d85ed8f2a9
Merge pull request #35655 from ClickHouse/exception-compile-time-message-check
Use compile-time check for `Exception` messages
2022-03-30 08:11:32 +02:00
Anton Popov
caacc7d385 add quota for written bytes 2022-03-29 18:21:29 +00:00
avogar
000f3043e7 Make better 2022-03-29 17:40:07 +00:00
avogar
3fc36627b3 Allow to infer and parse bools as numbers in JSON input formats 2022-03-29 17:37:31 +00:00
avogar
ce97ccbfb9 Improve schema inference for JSONEachRow and TSKV formats 2022-03-29 14:47:51 +00:00
Dmitry Novik
8f935a72d6
Merge pull request #33230 from CurtizJ/read_in_order_max_rows_to_read
Proper handle of 'max_rows_to_read' in case of reading in order of sorting key
2022-03-29 15:16:34 +02:00
Antonio Andelic
9990abb76a Use compile-time check for Exception messages, fix wrong messages 2022-03-29 13:16:11 +00:00
avogar
97f5033ea9 Fix tests 2022-03-29 13:07:37 +00:00
mergify[bot]
343588de2c
Merge branch 'master' into improve-schema-inference 2022-03-29 13:06:00 +00:00
Anton Popov
9610139477
Merge pull request #35629 from CurtizJ/dynamic-columns-5
Support schema inference for type `Object` in format `JSONEachRow`
2022-03-29 14:17:09 +02:00
Alexey Milovidov
5e262fba85
Merge pull request #35204 from azat/build-gcc
Add build with GCC
2022-03-29 04:55:15 +03:00
Yakov Olkhovskiy
615efa1381 aliases processing fixed 2022-03-28 19:15:53 -04:00
Azat Khuzhin
bf4df5c6bb Fix SIGSEGV for build under gcc-11 (due to auto deduction)
During building with gcc-11 you will got SIGSEGV for building
InterpretersMySQLDDLQuery (and some others), and it is due to endless
recursion:

    (gdb) bt 5
    0  0x00000000010978f2 in structural_comptypes (t1=0x7fde028c7dc8, t2=0x7fde028d1e70, strict=0) at ../../src/gcc/cp/typeck.c:1274
    1  0x00000000011c3f9d in comp_template_parms (parms1=<optimized out>, parms2=<optimized out>) at ../../src/gcc/cp/pt.c:3369
    2  0x0000000001097bd9 in structural_comptypes (t1=0x7fde028c7dc8, t2=0x7fde028d1e70, strict=<optimized out>) at ../../src/gcc/cp/typeck.c:1361
    3  0x00000000011c3f9d in comp_template_parms (parms1=<optimized out>, parms2=<optimized out>) at ../../src/gcc/cp/pt.c:3369
    4  0x0000000001097bd9 in structural_comptypes (t1=0x7fde028c7dc8, t2=0x7fde028d1e70, strict=<optimized out>) at ../../src/gcc/cp/typeck.c:1361
    (gdb) bt -X
    1397454 0x0000000001097bd9 in structural_comptypes (t1=0x7fde028d1540, t2=0x7fde028d27e0, strict=<optimized out>) at ../../src/gcc/cp/typeck.c:1361
    1397455 0x0000000000f2d8b5 in cp_tree_equal (t1=<optimized out>, t2=<optimized out>) at ../../src/gcc/cp/tree.c:4144
    1397456 0x00000000010909cb in template_args_equal (ot=0x7fde028cf578, nt=0x7fde028cfc58, partial_order=<optimized out>) at ../../src/gcc/cp/pt.c:9256
    1397457 0x0000000001090422 in template_args_equal (partial_order=false, nt=0x7fde028cfc58, ot=0x7fde028cf578) at ../../src/gcc/cp/pt.c:9295
    1397458 comp_template_args (oldargs=0x7fde028cf550, newargs=0x7fde028cfc30, oldarg_ptr=0x0, newarg_ptr=0x0, partial_order=false) at ../../src/gcc/cp/pt.c:9285
    1397459 0x00000000010a08f4 in spec_hasher::equal (e1=0x7fde028c95d0, e2=0x7ffd1194e8c0) at ../../src/gcc/cp/pt.c:1726
    1397460 0x0000000001085965 in hash_table<spec_hasher, false, xcallocator>::find_with_hash (this=0x7fde36b7f450, comparable=@0x7ffd1194e8b8: 0x7ffd1194e8c0, hash=<optimized out>) at ../../src/gcc/hash-table.h:936
    1397461 0x0000000001079698 in lookup_template_class_1 (d1=<optimized out>, arglist=0x7fde028cfc30, in_decl=0x0, context=<optimized out>, entering_scope=<optimized out>, complain=3) at ../../src/gcc/cp/pt.c:9896
    1397462 0x000000000109f8ef in lookup_template_class (complain=3, entering_scope=1, context=0x7fde27558e40, in_decl=0x0, arglist=0x7fde028cfc08, d1=0x7fde269bcd20) at ../../src/gcc/cp/pt.c:10251
    1397463 tsubst_aggr_type (t=0x7fde269bcd20, args=<optimized out>, complain=3, in_decl=0x0, entering_scope=1) at ../../src/gcc/cp/pt.c:13646
    1397464 0x000000000108f797 in tsubst (t=0x7fde269bcdc8, args=0x7fde028cf7a8, complain=3, in_decl=<optimized out>) at ../../src/gcc/cp/pt.c:16108
    1397465 0x0000000000e61bf2 in rewrite_template_parm (level=1, complain=3, tsubst_args=0x7fde028cf7a8, index=5, olddecl=0x7fde269b5600) at ../../src/gcc/cp/pt.c:28556
    1397466 rewrite_tparm_list(tree_node*, unsigned int, unsigned int, tree_node*, unsigned int, int) [clone .constprop.0] (oldelt=0x7fde269bd190, index=5, targs=0x7fde028cf7a8, targs_index=4, complain=3, level=1) at ../../src/gcc/cp/pt.c:28640
    1397467 0x00000000009f3748 in build_deduction_guide (type=type@entry=0x7fde26e13dc8, ctor=0x7fde269ac300, outer_args=outer_args@entry=0x0, complain=complain@entry=3) at ../../src/gcc/cp/pt.c:28769
    1397468 0x00000000009f444f in ctor_deduction_guides_for (complain=3, tmpl=<optimized out>) at ../../src/gcc/cp/cp-tree.h:842
    1397469 deduction_guides_for (tmpl=<optimized out>, any_dguides_p=<optimized out>, complain=3) at ../../src/gcc/cp/pt.c:29282
    1397470 0x00000000008507a8 in do_class_deduction (complain=3, flags=1, init=<optimized out>, tmpl=0x7fde26e0f980, ptype=0x7fde028c7b28) at ../../src/gcc/cp/pt.c:29402
    1397471 do_auto_deduction (type=0x7fde028c7b28, init=<optimized out>, auto_node=<optimized out>, complain=3, context=<optimized out>, outer_targs=<optimized out>, flags=1) at ../../src/gcc/cp/pt.c:29572
    1397472 0x00000000007c9569 in finish_compound_literal (type=<optimized out>, compound_literal=0x7fde028c95b8, complain=3, fcl_context=fcl_functional) at ../../src/gcc/cp/semantics.c:3060
    1397473 0x0000000001123a79 in cp_parser_functional_cast (parser=0x7fde27558da8, type=0x7fde028c7b28) at ../../src/gcc/cp/parser.c:30670
    1397474 0x0000000000fd7873 in cp_parser_postfix_expression (parser=0x7fde27558da8, address_p=<optimized out>, cast_p=<optimized out>, member_access_only_p=<optimized out>, decltype_p=false, pidk_return=0x0) at ../../src/gcc/cp/parser.c:7437
    1397475 0x0000000000fd4ddf in cp_parser_binary_expression (parser=0x7fde27558da8, cast_p=<optimized out>, no_toplevel_fold_p=false, decltype_p=<optimized out>, prec=PREC_NOT_OPERATOR, pidk=<optimized out>) at ../../src/gcc/cp/parser.c:9842
    1397476 0x0000000000fd4595 in cp_parser_assignment_expression (parser=0x7fde27558da8, pidk=<optimized out>, cast_p=<optimized out>, decltype_p=<optimized out>) at ../../src/gcc/cp/parser.c:10146
    1397477 0x0000000000fd3b90 in cp_parser_constant_expression (parser=0x7fde27558da8, allow_non_constant_p=2, non_constant_p=0x7ffd1194f1d7, strict_p=<optimized out>) at ../../src/gcc/cp/parser.c:10449
    1397478 0x0000000000fcfdd5 in cp_parser_initializer_clause (non_constant_p=<optimized out>, parser=0x7fde27558da8) at ../../src/gcc/cp/parser.c:24253
    1397479 cp_parser_initializer (parser=0x7fde27558da8, is_direct_init=<optimized out>, non_constant_p=<optimized out>, subexpression_p=<optimized out>) at ../../src/gcc/cp/parser.c:24193
    1397480 0x000000000062e5d8 in cp_parser_decomposition_declaration (init_loc=0x7ffd1194f1d8, maybe_range_for_decl=0x7ffd1194f498, decl_specifiers=0x7ffd1194f1f0, parser=0x7fde27558da8) at ../../src/gcc/cp/parser.c:14734
    1397481 cp_parser_simple_declaration (parser=0x7fde27558da8, function_definition_allowed_p=<optimized out>, maybe_range_for_decl=0x7ffd1194f498) at ../../src/gcc/cp/parser.c:14393
    1397482 0x000000000109b870 in cp_parser_init_statement (parser=0x7fde27558da8, decl=0x7ffd1194f498) at ../../src/gcc/cp/parser.c:13420
    1397483 0x00000000010996f0 in cp_parser_for (unroll=0, ivdep=false, parser=0x7fde27558da8) at ../../src/gcc/cp/parser.c:12708
    1397484 cp_parser_iteration_statement (parser=0x7fde27558da8, if_p=0x0, ivdep=<optimized out>, unroll=<optimized out>) at ../../src/gcc/cp/parser.c:13343
    1397485 0x0000000000fe5c46 in cp_parser_statement (parser=0x7fde27558da8, in_statement_expr=0x0, in_compound=<optimized out>, if_p=0x0, chain=0x0, loc_after_labels=0x0) at ../../src/gcc/cp/parser.c:11718
    1397486 0x0000000000fe15ac in cp_parser_statement_seq_opt (in_statement_expr=<optimized out>, parser=<optimized out>) at ../../src/gcc/cp/parser.c:12201
    1397487 cp_parser_compound_statement (parser=0x7fde27558da8, in_statement_expr=0x0, bcs_flags=<optimized out>, function_body=<optimized out>) at ../../src/gcc/cp/parser.c:12150

Interesting frame is 1397471, from which we can extract location:

    (gdb) p line_table[0].info_ordinary.maps[line_table[0].info_ordinary.cache]
    $54 = {
      <line_map> = {
        start_location = 1396581280
      },
      members of line_map_ordinary:
      reason = LC_RENAME,
      sysp = 0 '\000',
      m_column_and_range_bits = 8,
      m_range_bits = 0,
      to_file = 0x3eb4bb0 "/ch/src/Interpreters/MySQL/InterpretersMySQLDDLQuery.cpp",
      to_line = 46,
      included_from = 0
    }

By replicating SOURCE_LINE() macro from gcc-11 (libcpp/include/line-map.h):

    /* Converts a map and a location_t to source line.  */
    inline linenum_type
    SOURCE_LINE (const line_map_ordinary *ord_map, location_t loc)
    {
      return ((loc - ord_map->start_location)
          >> ord_map->m_column_and_range_bits) + ord_map->to_line;
    }

We got line 154:

    (gdb) p ((input_location-1396581280) >> 8) + 46
    $61 = 154

And this is:

    auto [column_name_and_type, declare_column_ast] = std::tuple{columns_name_and_type.begin(), columns_definition->children.begin()};

After rewriting it, everything works correctly.

Also here is a reproducer for gcc-11 (and no failures for gcc-12, but
gcc-12 has other issues, like [1] and one more for hash table):

    # cat /tmp/test.cpp
    #include <tuple>

    auto multi()
    {
            return std::tuple{1, 1};
    }
    double foo()
    {
            auto [a, b] = multi();
            return a - b;
    }

    # g++-11 -std=gnu++20 -c -o /dev/null -isystem /ch/contrib/libcxx/include -nostdinc++ /tmp/test.cpp
    g++-11: internal compiler error: Segmentation fault signal terminated program cc1plus
    Please submit a full bug report,
    with preprocessed source if appropriate.
    See <file:///usr/share/doc/gcc-11/README.Bugs> for instructions.
    # g++-12 -std=gnu++20 -c -o /dev/null -isystem /ch/contrib/libcxx/include -nostdinc++ /tmp/test.cpp

  [1]: https://reviews.llvm.org/D122598

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-03-28 22:21:45 +03:00
Anton Popov
d677635cd8
Merge pull request #35592 from CurtizJ/dynamic-columns-4
Add parallel parsing and schema inference for format `JSONAsObject`
2022-03-28 19:29:55 +02:00
mergify[bot]
f09ebea2d8
Merge branch 'master' into read_in_order_max_rows_to_read 2022-03-28 13:30:34 +00:00
Yakov Olkhovskiy
5a4694f340 major refactoring, simplified, optimized, bugs fixed 2022-03-27 14:32:09 -04:00
Anton Popov
67195bfdd5 support schema inference for type Object in format JSONEachRow 2022-03-25 21:51:53 +00:00
Vladimir C
cfb12aff6f
Merge pull request #35460 from helifu/master 2022-03-25 15:55:17 +01:00
avogar
6fb3c3be04 Fix comments and build 2022-03-25 12:02:21 +00:00
Kruglov Pavel
d45143ffe0
Merge branch 'master' into improve-schema-inference 2022-03-25 12:05:40 +01:00
Vladimir C
ae92963b15
Fix build error in Formats/ISchemaReader.cpp 2022-03-25 11:30:25 +01:00
Yakov Olkhovskiy
adefcfd299
Merge branch 'master' into interpolate-feature 2022-03-24 15:33:09 -04:00
Yakov Olkhovskiy
83f406b722 optimization, INTERPOLATE without expr. list, any column is allowed except WITH FILL 2022-03-24 15:29:29 -04:00
Kruglov Pavel
287e1a6efc
Update src/Processors/Formats/ISchemaReader.cpp
Co-authored-by: Vladimir C <vdimir@clickhouse.com>
2022-03-24 19:16:52 +01:00
Kruglov Pavel
6a9df9d471
Update src/Processors/Formats/ISchemaReader.cpp
Co-authored-by: Vladimir C <vdimir@clickhouse.com>
2022-03-24 19:16:47 +01:00
Kruglov Pavel
3b801a4093
Update src/Processors/Formats/ISchemaReader.cpp
Co-authored-by: Vladimir C <vdimir@clickhouse.com>
2022-03-24 19:16:41 +01:00
Anton Popov
78100abc5f add parallel parsing and schema inference for type Object 2022-03-24 17:51:35 +00:00
avogar
557edbd172 Add some improvements and fixes in schema inference 2022-03-24 12:54:12 +00:00
vdimir
f106e2dd49
fix style in QueryPlan.cpp 2022-03-24 11:53:58 +00:00
vdimir
d16ae46589
remove description for ReadFromMergeTree from pipeline, adjust tests for plan 2022-03-24 11:31:52 +00:00
helifu
8a5bd2defa
Add explicit table info to the scan node of query plan and pipeline
:) explain plan select * from table1 t1 left join table2 t2 on t1.name = t2.name;
┌─explain──────────────────────────────────────────────────────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY))                                                  │
│   Join (JOIN)                                                                                │
│     Expression (Before JOIN)                                                                 │
│       SettingQuotaAndLimits (Set limits and quota after reading from storage)                │
│         ReadFromMergeTree (default.table1)                                                   │
│     Expression ((Joined actions + (Rename joined columns + (Projection + Before ORDER BY)))) │
│       SettingQuotaAndLimits (Set limits and quota after reading from storage)                │
│         ReadFromMergeTree (default.table2)                                                   │
└──────────────────────────────────────────────────────────────────────────────────────────────┘

:) explain pipeline select * from table1 t1 left join table2 t2 on t1.name = t2.name;
┌─explain──────────────────────────────────────────┐
│ (Expression)                                     │
│ ExpressionTransform × 24                         │
│   (Join)                                         │
│   JoiningTransform × 24 2 → 1                    │
│     Resize 1 → 24                                │
│       FillingRightJoinSide                       │
│         Resize 24 → 1                            │
│           (Expression)                           │
│           ExpressionTransform × 24               │
│             (SettingQuotaAndLimits)              │
│               (ReadFromMergeTree default.table1) │
│               MergeTreeThread × 24 0 → 1         │
│           (Expression)                           │
│           ExpressionTransform × 24               │
│             (SettingQuotaAndLimits)              │
│               (ReadFromMergeTree default.table2) │
│               MergeTreeThread × 24 0 → 1         │
└──────────────────────────────────────────────────┘
2022-03-24 10:49:12 +00:00
mergify[bot]
bf90edc362
Merge branch 'master' into case-insensitive-column-matching 2022-03-24 08:00:42 +00:00
Kruglov Pavel
826b933b08
Merge pull request #35332 from Avogar/fix-tskv-schema-inference
Fix schema inference for TSKV format while using small max_read_buffer_size
2022-03-23 18:37:07 +01:00
Antonio Andelic
052057f2ef Address PR comments 2022-03-23 15:42:46 +00:00
Anton Popov
f693eba568 fix tests with approx rows 2022-03-22 14:30:40 +00:00
Antonio Andelic
6b6190554b Fix conversion of arrow to CH column with hint header 2022-03-22 11:15:48 +00:00
Antonio Andelic
0c23cd7b94 Add support for case insensitive column matching in arrow 2022-03-22 10:55:10 +00:00
Antonio Andelic
ca7844e338 Fix tests 2022-03-22 09:27:20 +00:00
Antonio Andelic
6cebb6bc88 Merge branch 'master' into case-insensitive-column-matching 2022-03-22 07:36:35 +00:00
mergify[bot]
e11ef05c2b
Merge branch 'master' into issue_33147 2022-03-21 13:40:17 +00:00
Antonio Andelic
cb3703b46e Style fix 2022-03-21 12:54:56 +00:00
Antonio Andelic
0457a3998a remove old test 2022-03-21 11:58:55 +00:00
Kruglov Pavel
1645b7083f
Update src/Processors/Formats/Impl/TSKVRowInputFormat.cpp
Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com>
2022-03-21 12:44:12 +01:00
Kruglov Pavel
0b381ebd26
Update src/Processors/Formats/Impl/TSKVRowInputFormat.cpp
Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com>
2022-03-21 12:44:06 +01:00
Kruglov Pavel
f67b8c0bad
Update src/Processors/Formats/Impl/TSKVRowInputFormat.cpp
Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com>
2022-03-21 12:44:00 +01:00
Kruglov Pavel
ed7b40253c
Merge pull request #35039 from zzsmdfj/issue/#34890_fix_input_format_null_as_default_bug
to #34890_fix_input_format_null_as_default_bug
2022-03-21 12:42:17 +01:00
Antonio Andelic
0c74fa2c19 Remove unecessary code 2022-03-21 08:38:15 +00:00
tavplubix
716c6f0ffa
Merge pull request #35406 from Avogar/fix-parquet
Fix working with unneeded columns in Arrow/Parquet/ORC formats
2022-03-21 11:36:54 +03:00
Antonio Andelic
29d2bf7d1a Merge branch 'master' into case-insensitive-column-matching 2022-03-21 08:17:27 +00:00
Antonio Andelic
d73c906e68 Format code 2022-03-21 07:50:17 +00:00
Antonio Andelic
f75b054255 Allow case insensitive column matching 2022-03-21 07:47:37 +00:00
Yakov Olkhovskiy
481ee8aad5
Update FillingTransform.cpp
use range-based for loop
2022-03-19 12:17:56 -04:00
Yakov Olkhovskiy
eb7474e73a
Merge branch 'master' into interpolate-feature 2022-03-19 03:11:14 -04:00
Nikita Taranov
7d61fa5f05 impl 2022-03-18 22:58:35 +00:00
Yakov Olkhovskiy
a8e1671a76 type match check for INTERPOLATE expressions added, bugfix, printout fixed 2022-03-18 16:44:27 -04:00
avogar
58f2aca120 Fix tests 2022-03-18 19:04:16 +00:00
avogar
cffa2096de Fix working with unneeded columns in Arrow/Parquet/ORC formats 2022-03-18 13:07:54 +00:00
Kruglov Pavel
aa3c05e9d4
Merge pull request #35152 from rschu1ze/protobuf-batch-write
ProtobufList
2022-03-18 13:24:34 +01:00
mergify[bot]
28734562bb
Merge branch 'master' into issue/#34890_fix_input_format_null_as_default_bug 2022-03-17 13:24:34 +00:00
Antonio Andelic
607f785e48 Revert "Merge pull request #35145 from bigo-sg/lower-column-name"
This reverts commit ebf72bf61d, reversing
changes made to f1b812bdc1.
2022-03-17 12:31:43 +00:00
Yakov Olkhovskiy
00700988ee style fix 2022-03-17 02:31:01 -04:00
Yakov Olkhovskiy
7bb66e6702 added INTERPOLATE extension for ORDER BY WITH FILL 2022-03-17 01:51:35 -04:00
Anton Popov
2ced42ed41 add experimental settings for Object type 2022-03-16 16:51:23 +00:00
Anton Popov
0ba78c3c3a Merge remote-tracking branch 'upstream/master' into HEAD 2022-03-16 15:28:09 +00:00
avogar
f7c5fe14e4 Fix schema inference for TSKV format while using small max_read_buffer_size 2022-03-16 13:53:50 +00:00
Robert Schulze
0d2ece6d91
Merge branch 'ClickHouse:master' into protobuf-batch-write 2022-03-16 09:43:33 +01:00
Robert Schulze
23122cb327
Fix review comments
ParquetBlockOutputFormat.cpp:
- undo unrelated formatting

ProtobufSerializer.cpp:
- undef debug tracing
- simplify logic in writeRow()

ProtobufSchemas.cpp:
- restore original search in cache by message type
2022-03-15 11:27:17 +01:00
Maksim Kita
2665724301 Fix clang-tidy warnings in Parsers, Processors, QueryPipeline folders 2022-03-14 18:17:35 +00:00
Anton Popov
36ec379aeb Merge remote-tracking branch 'upstream/master' into HEAD 2022-03-14 16:28:35 +00:00
Antonio Andelic
ebf72bf61d
Merge pull request #35145 from bigo-sg/lower-column-name
add setting to lower column case when reading parquet/orc file
2022-03-14 11:25:03 +01:00
Maksim Kita
ad6b3693e1
Merge pull request #35123 from zhanghuajieHIT/fix_build_fail_with_gcc
fix build fail with gcc
2022-03-14 10:36:15 +01:00
Kseniia Sumarokova
58a2d2b458
Merge pull request #35118 from zzsmdfj/issue/#31469_MaterializedMysql_mysqlDate2CkDate32
to #31469_MaterializedMysql_mysqlDate2CkDate32
2022-03-14 10:32:33 +01:00
Robert Schulze
514d4d2187
Implement ProtobufList - fixes ClickHouse#16436
Introduce IO format "ProtobufList" with protobuf schema

    // schemafile.proto
    message Envelope {
      message MessageType {
        uint32 colA = 1;
        string colB = 2;
      }
      repeated MessageType mt = 1;
    }

where "Envelope" is a hard-coded/expected top-level message and
"MessageType" is a message with user-provided name containing the table
fields to export/import, e.g.

    SELECT * FROM db1.tab1 FORMAT ProtobufList SETTINGS format_schema =
    'schemafile:MessageType'

As a result, the new format wraps a list of messages (one per row) into
a single, containing message. Compare that to the schema of the existing
IO formats "Protobuf" and "ProtobufSingle":

    message MessageType {
      uint32 colA = 1;
      string colB = 2;
    }

The new format does not save space compared to the existing formats, but
it is conceptually a bit more beautiful and also more convenenient.

Implementation details:

- Created new files ProtobufList(Input|Output)Format which use the
  existing ProtobufSerializer mechanism. The goal was to reuse as much
  code as possible and avoid copypasta.

- I was torn between inheriting from I(Input|Output)Format vs.
  IRow(Input|Output)Format for ProtobufList(Input|Output)Format. The
  former is chunk-based which can be better for performance. Since the
  ProtobufSerializer mechanism is row-based but data is generally passed
  around in chunks, I decided for the latter to leverage the existing
  chunk <--> row mapping code in IRow(InputOutput)Format.

- A new ProtobufSerializer called ProtobufSerializerEnvelope was
  introduced (--> ProtobufSerializer.cpp). It represents the top-level
  message which encloses the list of inner nested messages, i.e. the
  rows.

- With the new format, parsing the schema file and matching the fields in
  the schema file to table column works like for the old formats. The only
  difference is that parsing starts one level below the "Envelope" (-->
  ProtobufSchema.cpp). This is more natural than forcing customers to
  have table columns start with "Envelope".

- Creation of the ProtobufSerializer tree also works like before. What
  is different is that we finally add a ProtobufSerializerEnvelope as
  new root of the tree. It's only purpose is to write/read the top-level
  message for the first/last row to write/read.

Caveats:

- The low-level serialization code in ProtobufWriter uses an internal
  buffer which is flushed to the output file only in endMessage().
  In the existing "Protobuf" format, this happens once per row, in the
  new format this happens only at the end of the serialization
  since row-level messages now call start/endNestedMessage(). As a
  future TODO to, the buffer should be flushed also in
  start/endNestedMessage() to reduce memory consumption.
2022-03-14 08:04:58 +01:00
Maksim Kita
ce0c8e5597
Update JSONRowOutputFormat.cpp 2022-03-14 00:58:36 +01:00
Robert Schulze
f0ba39b071
Clean up some header includes and make formatting more consistent 2022-03-13 20:24:12 +01:00
zhanghuajie
53a8987b3b fix build fail with gcc --fix warnings without disabling some parameters 2022-03-11 21:59:19 +08:00
shuchaome
7a3623d216 fix bug 2022-03-11 17:26:13 +08:00
Nikolai Kochetov
47f4bd30cd
Merge pull request #35186 from amosbird/fixwithtotalemptychunk
Fix empty chunk in with total transform
2022-03-11 10:24:19 +01:00
metahys
ff934cf0c2
Fix unexpected result when use -state type aggregate function in window frame (#34999)
* Fix unexpected result when use -state type aggregate function in window frame

* fix style

* fix style

* fix test

* fix flaky test

* fix flaky test

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-03-11 11:54:17 +03:00
shuchaome
46cb4483a6 Optimise by lowering schema on the beginning. Add a functional test. 2022-03-11 14:34:46 +08:00
Amos Bird
a1b61dabfd
Fix empty chunk in with total transform. 2022-03-10 23:27:36 +08:00
mergify[bot]
c326ebd67f
Merge branch 'master' into issue/#34890_fix_input_format_null_as_default_bug 2022-03-09 15:59:04 +00:00
shuchaome
b7cd85df6b remove unused column_names in ORCBlockInputFormat 2022-03-09 18:16:22 +08:00
shuchaome
bb50133424
Apply suggestions from code review
Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com>
2022-03-09 17:32:27 +08:00
shuchaome
9647818adc add unlikely for performance 2022-03-09 17:02:07 +08:00
shuchaome
8027bb1e32 modify code style 2022-03-09 16:32:18 +08:00
shuchaome
56795b831d add setting to lower column case when reading parquet/orc file 2022-03-09 16:07:02 +08:00
zhanghuajie
11dde7c127 fix build fail with gcc 2022-03-08 22:34:51 +08:00
zzsmdfj
67b9f81104 to #31469_MaterializedMysql_mysqlDate2CkDate32 2022-03-08 18:17:22 +08:00
zzsmdfj
7252c18ff0 to #34890_fix_input_format_null_as_default_bug 2022-03-04 15:04:43 +08:00
Anton Popov
df3b07fe7c Merge remote-tracking branch 'upstream/master' into HEAD 2022-03-03 22:25:28 +00:00
Nikolai Kochetov
32120d5dec
Merge pull request #34993 from ClickHouse/try-fix-delayed-source
Avoid pushing to port with data inside DelayedSource
2022-03-03 13:44:45 +01:00
Maksim Kita
b1a956c5f1 clang-tidy check performance-move-const-arg fix 2022-03-02 18:15:27 +00:00
Maksim Kita
1f5837359e clang-tidy check performance-noexcept-move-constructor fix 2022-03-02 18:15:27 +00:00
Nikolai Kochetov
ed8dfc14d4 Avoid pushing to port with data inside DelayedSource 2022-03-02 14:21:58 +01:00
Anton Popov
2758db5341 add more comments 2022-03-01 19:32:55 +03:00
Anton Popov
fcdebea925 Merge remote-tracking branch 'upstream/master' into HEAD 2022-02-25 13:41:30 +03:00
zvonand
90c857c5e3 merge 2022-02-17 18:23:37 +03:00
Kruglov Pavel
5e8b2228e0
Merge pull request #34561 from bigo-sg/arrow_type_timestamp
Implement transformation between CH DateTime64 and arrow timestamp column
2022-02-17 16:55:17 +03:00
zvonand
cf244689a2 fixed filling transform 2022-02-16 15:14:25 +03:00
Anton Popov
72e75fdaf5
Merge pull request #34601 from CurtizJ/filtering-by-sparse-columns
Support filtering by sparse columns without conversion to full
2022-02-15 23:26:13 +03:00
Anton Popov
7cddae1351 return back result_size_hint 2022-02-15 15:12:25 +03:00
Anton Popov
5c316ffabe support filtering by sparse columns without convertion to full 2022-02-15 14:30:54 +03:00
Kruglov Pavel
cf454a6539
Merge pull request #34532 from CurtizJ/fix-aggregation-in-order-3
Fix aggregation in order with distributed_aggregation_memory_efficient=0
2022-02-15 14:26:15 +03:00
zvonand
888542e29b add[interval] no longer oses decimal components
Not only support for better subsecond logic, but also fewer conversions
-> faster operation
2022-02-14 02:52:56 +03:00