Commit Graph

2514 Commits

Author SHA1 Message Date
Kruglov Pavel
f3f8f27db5
Merge pull request #35735 from Avogar/allow-read-bools-as-numbers
Allow to infer and parse bools as numbers in JSON input formats
2022-04-07 13:20:49 +02:00
Kseniia Sumarokova
18a594a22e
Merge pull request #34631 from bigo-sg/use_minmax_index
Use minmax index for orc/parquet file in Hive Engine
2022-04-07 12:22:19 +02:00
Nikolai Kochetov
3e1b3f14c0
Merge pull request #34355 from azat/processors-profiling
Profiling on Processors level
2022-04-07 12:13:14 +02:00
taiyang-li
2ef316801c Merge branch 'master' into use_minmax_index 2022-04-07 10:53:25 +08:00
Kruglov Pavel
ec2213493f
Merge branch 'master' into allow-read-bools-as-numbers 2022-04-06 14:53:02 +02:00
taiyang-li
acb9f1632e support skip splits in orc and parquet 2022-04-06 16:40:22 +08:00
Nickita Taranov
0f94a58f3a use getName() 2022-04-04 14:59:38 +02:00
Nickita Taranov
440e57769a more fixes 2022-04-04 14:33:58 +02:00
Nickita Taranov
ce40d84eef more fixes 2022-04-04 14:33:58 +02:00
Nickita Taranov
a39427f00b clean up 2022-04-04 14:33:57 +02:00
Nickita Taranov
eedcd61479 fix 2022-04-04 14:33:57 +02:00
Nickita Taranov
a08c035443 stash 2022-04-04 14:33:57 +02:00
Nickita Taranov
b095838444 stash 2022-04-04 14:33:57 +02:00
Nickita Taranov
4c51329ad6 stash 2022-04-04 14:33:57 +02:00
Nikita Taranov
bd89fcafdb
Make SortDescription::column_name always non-empty (#35805) 2022-04-04 14:17:15 +02:00
何李夫
09c04e4993
Improve the pipeline description for JOIN (#35612)
2022-04-04 13:56:41 +02:00
Azat Khuzhin
58ee917e94 Measure processors profiles only if it was enabled
Since it may use a little extra CPU.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-04-04 13:19:43 +03:00
Azat Khuzhin
99528e296c Rename need_data_elapsed_us/port_full_elapsed_us to input_wait_us/output_wait_us
$ gg -e need_data_ -e port_full_  | cut -d: -f1 | sort -u | xargs sed -i -e s/port_full_/output_wait_/g -e s/need_data_/input_wait_/g -e s/getPortFull/getOutputWait/g -e s/getNeedData/getInputWait/g

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-04-04 13:19:43 +03:00
Azat Khuzhin
5fd402eaba Measure time that Process spent in work/NeedData/PortFull
Note that right now this is done not in IProcessor but in
ExecutingGraph/ExecutionThreadContext, to avoid lots of changes in the
IProcessor interface and to make review easier, but I'm not against
changing the IProcessor interface to encapsulate it there.
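
As an illustration only, the idea of accounting time outside the processor itself could be sketched as below. This is not the code from the commit: `State`, `Timings` and `StateTimer` are hypothetical names, and the sketch assumes a caller (standing in for the executor) reports each state change together with a timestamp.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-ins; these are NOT the ClickHouse classes.
enum class State { Working, NeedData, PortFull };

struct Timings
{
    uint64_t work_us = 0;       // time spent doing work
    uint64_t need_data_us = 0;  // time spent waiting for input
    uint64_t port_full_us = 0;  // time spent waiting for the output port to drain
};

// Lives next to the processor (e.g. in an executor-side context), so the
// processor interface itself does not have to change.
class StateTimer
{
public:
    using Clock = std::chrono::steady_clock;

    explicit StateTimer(Clock::time_point start) : last_change(start) {}

    // Charge the time since the previous transition to the state that is
    // ending, then switch to the next state.
    void transition(State next, Clock::time_point now)
    {
        assert(now >= last_change);
        const auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(now - last_change).count();
        const auto elapsed_us = static_cast<uint64_t>(elapsed);
        switch (current)
        {
            case State::Working: totals.work_us += elapsed_us; break;
            case State::NeedData: totals.need_data_us += elapsed_us; break;
            case State::PortFull: totals.port_full_us += elapsed_us; break;
        }
        current = next;
        last_change = now;
    }

    const Timings & timings() const { return totals; }

private:
    State current = State::Working;
    Clock::time_point last_change;
    Timings totals;
};

// Small usage example with explicit timestamps instead of Clock::now().
inline Timings demo()
{
    const StateTimer::Clock::time_point t0{};
    StateTimer timer(t0);
    timer.transition(State::NeedData, t0 + std::chrono::microseconds(100));
    timer.transition(State::PortFull, t0 + std::chrono::microseconds(350));
    timer.transition(State::Working, t0 + std::chrono::microseconds(400));
    return timer.timings();
}
```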

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-04-04 13:19:43 +03:00
Anton Popov
11e18a16f3
Merge pull request #35724 from Avogar/fix-order
Improve schema inference for JSONEachRow and TSKV formats
2022-04-04 11:00:21 +02:00
Alexey Milovidov
5a47958744
Merge pull request #35736 from CurtizJ/quota-written-bytes
Add quota for written bytes
2022-04-03 05:26:49 +03:00
mergify[bot]
1e43e26fa1
Merge branch 'master' into fix-order 2022-04-02 12:00:29 +00:00
Anton Popov
687942ce70 more strict quota for written bytes 2022-04-01 15:02:49 +00:00
avogar
ab2a963287 Merge branch 'master' of github.com:ClickHouse/ClickHouse into allow-read-bools-as-numbers 2022-03-31 14:09:43 +00:00
Kruglov Pavel
252d66e80d
Update src/Processors/Formats/ISchemaReader.cpp
Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com>
2022-03-31 16:08:37 +02:00
avogar
836e7dae67 Fix bug in indexes of columns not present in -WithNames formats 2022-03-31 12:24:40 +00:00
avogar
d272356324 Minor code improvement 2022-03-31 10:55:09 +00:00
avogar
74275da7ee Make better 2022-03-31 10:52:34 +00:00
Antonio Andelic
d85ed8f2a9
Merge pull request #35655 from ClickHouse/exception-compile-time-message-check
Use compile-time check for `Exception` messages
2022-03-30 08:11:32 +02:00
Anton Popov
caacc7d385 add quota for written bytes 2022-03-29 18:21:29 +00:00
avogar
000f3043e7 Make better 2022-03-29 17:40:07 +00:00
avogar
3fc36627b3 Allow to infer and parse bools as numbers in JSON input formats 2022-03-29 17:37:31 +00:00
avogar
ce97ccbfb9 Improve schema inference for JSONEachRow and TSKV formats 2022-03-29 14:47:51 +00:00
Dmitry Novik
8f935a72d6
Merge pull request #33230 from CurtizJ/read_in_order_max_rows_to_read
Proper handle of 'max_rows_to_read' in case of reading in order of sorting key
2022-03-29 15:16:34 +02:00
Antonio Andelic
9990abb76a Use compile-time check for Exception messages, fix wrong messages 2022-03-29 13:16:11 +00:00
Anton Popov
9610139477
Merge pull request #35629 from CurtizJ/dynamic-columns-5
Support schema inference for type `Object` in format `JSONEachRow`
2022-03-29 14:17:09 +02:00
Alexey Milovidov
5e262fba85
Merge pull request #35204 from azat/build-gcc
Add build with GCC
2022-03-29 04:55:15 +03:00
Azat Khuzhin
bf4df5c6bb Fix SIGSEGV for build under gcc-11 (due to auto deduction)
When building with gcc-11, the compiler crashes with SIGSEGV while
compiling InterpretersMySQLDDLQuery (and some other files), and this is
due to endless recursion:

    (gdb) bt 5
    0  0x00000000010978f2 in structural_comptypes (t1=0x7fde028c7dc8, t2=0x7fde028d1e70, strict=0) at ../../src/gcc/cp/typeck.c:1274
    1  0x00000000011c3f9d in comp_template_parms (parms1=<optimized out>, parms2=<optimized out>) at ../../src/gcc/cp/pt.c:3369
    2  0x0000000001097bd9 in structural_comptypes (t1=0x7fde028c7dc8, t2=0x7fde028d1e70, strict=<optimized out>) at ../../src/gcc/cp/typeck.c:1361
    3  0x00000000011c3f9d in comp_template_parms (parms1=<optimized out>, parms2=<optimized out>) at ../../src/gcc/cp/pt.c:3369
    4  0x0000000001097bd9 in structural_comptypes (t1=0x7fde028c7dc8, t2=0x7fde028d1e70, strict=<optimized out>) at ../../src/gcc/cp/typeck.c:1361
    (gdb) bt -X
    1397454 0x0000000001097bd9 in structural_comptypes (t1=0x7fde028d1540, t2=0x7fde028d27e0, strict=<optimized out>) at ../../src/gcc/cp/typeck.c:1361
    1397455 0x0000000000f2d8b5 in cp_tree_equal (t1=<optimized out>, t2=<optimized out>) at ../../src/gcc/cp/tree.c:4144
    1397456 0x00000000010909cb in template_args_equal (ot=0x7fde028cf578, nt=0x7fde028cfc58, partial_order=<optimized out>) at ../../src/gcc/cp/pt.c:9256
    1397457 0x0000000001090422 in template_args_equal (partial_order=false, nt=0x7fde028cfc58, ot=0x7fde028cf578) at ../../src/gcc/cp/pt.c:9295
    1397458 comp_template_args (oldargs=0x7fde028cf550, newargs=0x7fde028cfc30, oldarg_ptr=0x0, newarg_ptr=0x0, partial_order=false) at ../../src/gcc/cp/pt.c:9285
    1397459 0x00000000010a08f4 in spec_hasher::equal (e1=0x7fde028c95d0, e2=0x7ffd1194e8c0) at ../../src/gcc/cp/pt.c:1726
    1397460 0x0000000001085965 in hash_table<spec_hasher, false, xcallocator>::find_with_hash (this=0x7fde36b7f450, comparable=@0x7ffd1194e8b8: 0x7ffd1194e8c0, hash=<optimized out>) at ../../src/gcc/hash-table.h:936
    1397461 0x0000000001079698 in lookup_template_class_1 (d1=<optimized out>, arglist=0x7fde028cfc30, in_decl=0x0, context=<optimized out>, entering_scope=<optimized out>, complain=3) at ../../src/gcc/cp/pt.c:9896
    1397462 0x000000000109f8ef in lookup_template_class (complain=3, entering_scope=1, context=0x7fde27558e40, in_decl=0x0, arglist=0x7fde028cfc08, d1=0x7fde269bcd20) at ../../src/gcc/cp/pt.c:10251
    1397463 tsubst_aggr_type (t=0x7fde269bcd20, args=<optimized out>, complain=3, in_decl=0x0, entering_scope=1) at ../../src/gcc/cp/pt.c:13646
    1397464 0x000000000108f797 in tsubst (t=0x7fde269bcdc8, args=0x7fde028cf7a8, complain=3, in_decl=<optimized out>) at ../../src/gcc/cp/pt.c:16108
    1397465 0x0000000000e61bf2 in rewrite_template_parm (level=1, complain=3, tsubst_args=0x7fde028cf7a8, index=5, olddecl=0x7fde269b5600) at ../../src/gcc/cp/pt.c:28556
    1397466 rewrite_tparm_list(tree_node*, unsigned int, unsigned int, tree_node*, unsigned int, int) [clone .constprop.0] (oldelt=0x7fde269bd190, index=5, targs=0x7fde028cf7a8, targs_index=4, complain=3, level=1) at ../../src/gcc/cp/pt.c:28640
    1397467 0x00000000009f3748 in build_deduction_guide (type=type@entry=0x7fde26e13dc8, ctor=0x7fde269ac300, outer_args=outer_args@entry=0x0, complain=complain@entry=3) at ../../src/gcc/cp/pt.c:28769
    1397468 0x00000000009f444f in ctor_deduction_guides_for (complain=3, tmpl=<optimized out>) at ../../src/gcc/cp/cp-tree.h:842
    1397469 deduction_guides_for (tmpl=<optimized out>, any_dguides_p=<optimized out>, complain=3) at ../../src/gcc/cp/pt.c:29282
    1397470 0x00000000008507a8 in do_class_deduction (complain=3, flags=1, init=<optimized out>, tmpl=0x7fde26e0f980, ptype=0x7fde028c7b28) at ../../src/gcc/cp/pt.c:29402
    1397471 do_auto_deduction (type=0x7fde028c7b28, init=<optimized out>, auto_node=<optimized out>, complain=3, context=<optimized out>, outer_targs=<optimized out>, flags=1) at ../../src/gcc/cp/pt.c:29572
    1397472 0x00000000007c9569 in finish_compound_literal (type=<optimized out>, compound_literal=0x7fde028c95b8, complain=3, fcl_context=fcl_functional) at ../../src/gcc/cp/semantics.c:3060
    1397473 0x0000000001123a79 in cp_parser_functional_cast (parser=0x7fde27558da8, type=0x7fde028c7b28) at ../../src/gcc/cp/parser.c:30670
    1397474 0x0000000000fd7873 in cp_parser_postfix_expression (parser=0x7fde27558da8, address_p=<optimized out>, cast_p=<optimized out>, member_access_only_p=<optimized out>, decltype_p=false, pidk_return=0x0) at ../../src/gcc/cp/parser.c:7437
    1397475 0x0000000000fd4ddf in cp_parser_binary_expression (parser=0x7fde27558da8, cast_p=<optimized out>, no_toplevel_fold_p=false, decltype_p=<optimized out>, prec=PREC_NOT_OPERATOR, pidk=<optimized out>) at ../../src/gcc/cp/parser.c:9842
    1397476 0x0000000000fd4595 in cp_parser_assignment_expression (parser=0x7fde27558da8, pidk=<optimized out>, cast_p=<optimized out>, decltype_p=<optimized out>) at ../../src/gcc/cp/parser.c:10146
    1397477 0x0000000000fd3b90 in cp_parser_constant_expression (parser=0x7fde27558da8, allow_non_constant_p=2, non_constant_p=0x7ffd1194f1d7, strict_p=<optimized out>) at ../../src/gcc/cp/parser.c:10449
    1397478 0x0000000000fcfdd5 in cp_parser_initializer_clause (non_constant_p=<optimized out>, parser=0x7fde27558da8) at ../../src/gcc/cp/parser.c:24253
    1397479 cp_parser_initializer (parser=0x7fde27558da8, is_direct_init=<optimized out>, non_constant_p=<optimized out>, subexpression_p=<optimized out>) at ../../src/gcc/cp/parser.c:24193
    1397480 0x000000000062e5d8 in cp_parser_decomposition_declaration (init_loc=0x7ffd1194f1d8, maybe_range_for_decl=0x7ffd1194f498, decl_specifiers=0x7ffd1194f1f0, parser=0x7fde27558da8) at ../../src/gcc/cp/parser.c:14734
    1397481 cp_parser_simple_declaration (parser=0x7fde27558da8, function_definition_allowed_p=<optimized out>, maybe_range_for_decl=0x7ffd1194f498) at ../../src/gcc/cp/parser.c:14393
    1397482 0x000000000109b870 in cp_parser_init_statement (parser=0x7fde27558da8, decl=0x7ffd1194f498) at ../../src/gcc/cp/parser.c:13420
    1397483 0x00000000010996f0 in cp_parser_for (unroll=0, ivdep=false, parser=0x7fde27558da8) at ../../src/gcc/cp/parser.c:12708
    1397484 cp_parser_iteration_statement (parser=0x7fde27558da8, if_p=0x0, ivdep=<optimized out>, unroll=<optimized out>) at ../../src/gcc/cp/parser.c:13343
    1397485 0x0000000000fe5c46 in cp_parser_statement (parser=0x7fde27558da8, in_statement_expr=0x0, in_compound=<optimized out>, if_p=0x0, chain=0x0, loc_after_labels=0x0) at ../../src/gcc/cp/parser.c:11718
    1397486 0x0000000000fe15ac in cp_parser_statement_seq_opt (in_statement_expr=<optimized out>, parser=<optimized out>) at ../../src/gcc/cp/parser.c:12201
    1397487 cp_parser_compound_statement (parser=0x7fde27558da8, in_statement_expr=0x0, bcs_flags=<optimized out>, function_body=<optimized out>) at ../../src/gcc/cp/parser.c:12150

The interesting frame is 1397471, from which we can extract the location:

    (gdb) p line_table[0].info_ordinary.maps[line_table[0].info_ordinary.cache]
    $54 = {
      <line_map> = {
        start_location = 1396581280
      },
      members of line_map_ordinary:
      reason = LC_RENAME,
      sysp = 0 '\000',
      m_column_and_range_bits = 8,
      m_range_bits = 0,
      to_file = 0x3eb4bb0 "/ch/src/Interpreters/MySQL/InterpretersMySQLDDLQuery.cpp",
      to_line = 46,
      included_from = 0
    }

Replicate the SOURCE_LINE() macro from gcc-11 (libcpp/include/line-map.h):

    /* Converts a map and a location_t to source line.  */
    inline linenum_type
    SOURCE_LINE (const line_map_ordinary *ord_map, location_t loc)
    {
      return ((loc - ord_map->start_location)
          >> ord_map->m_column_and_range_bits) + ord_map->to_line;
    }

We get line 154:

    (gdb) p ((input_location-1396581280) >> 8) + 46
    $61 = 154
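
The same arithmetic can be checked outside gdb with a small sketch. The struct below is a simplified stand-in for gcc's `line_map_ordinary` (only the three fields used here), and `example_location` is a hypothetical value: the actual `input_location` is not shown above, so the example picks the smallest location that maps to line 154.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-ins for the gcc types; only what SOURCE_LINE() needs.
using location_t = uint32_t;
using linenum_type = uint32_t;

struct LineMapOrdinary
{
    location_t start_location;
    unsigned m_column_and_range_bits;
    linenum_type to_line;
};

// Same computation as SOURCE_LINE(): drop the column/range bits of the
// offset into the map and add the first line of the map.
constexpr linenum_type sourceLine(const LineMapOrdinary & map, location_t loc)
{
    assert(loc >= map.start_location);
    return ((loc - map.start_location) >> map.m_column_and_range_bits) + map.to_line;
}

// Values taken from the gdb dump above.
constexpr LineMapOrdinary dumped_map{1396581280, 8, 46};

// Hypothetical location (assumption): 108 lines past the start of the map.
constexpr location_t example_location = 1396581280 + (108u << 8);

static_assert(sourceLine(dumped_map, example_location) == 154);
```

With 8 column/range bits, each source line covers 256 consecutive location values, so any location in that window gives the same line.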

And that line is:

    auto [column_name_and_type, declare_column_ast] = std::tuple{columns_name_and_type.begin(), columns_definition->children.begin()};

After rewriting it, everything works correctly.
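
One way such a rewrite might look is sketched below; this is an assumption about the shape of the change, not the exact diff. The containers are hypothetical stand-ins for columns_name_and_type and columns_definition->children, and `pairUp` is a made-up helper. The point is only that two plain `auto` declarations avoid the `std::tuple{...}` deduction that shows up in the backtrace (do_class_deduction).

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

// Hypothetical helper: walks two containers in lockstep.
inline std::vector<std::string> pairUp(
    const std::list<std::string> & names,
    const std::vector<std::string> & definitions)
{
    assert(names.size() == definitions.size());

    std::vector<std::string> result;

    // Instead of
    //     auto [name_it, def_it] = std::tuple{names.begin(), definitions.begin()};
    // declare the two iterators separately, so no class template argument
    // deduction for std::tuple is needed.
    auto name_it = names.begin();
    auto def_it = definitions.begin();

    for (; name_it != names.end(); ++name_it, ++def_it)
        result.push_back(*name_it + " " + *def_it);

    return result;
}
```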

Here is also a reproducer for gcc-11 (gcc-12 does not fail on it, but
gcc-12 has other issues, like [1] and one more with the hash table):

    # cat /tmp/test.cpp
    #include <tuple>

    auto multi()
    {
            return std::tuple{1, 1};
    }
    double foo()
    {
            auto [a, b] = multi();
            return a - b;
    }

    # g++-11 -std=gnu++20 -c -o /dev/null -isystem /ch/contrib/libcxx/include -nostdinc++ /tmp/test.cpp
    g++-11: internal compiler error: Segmentation fault signal terminated program cc1plus
    Please submit a full bug report,
    with preprocessed source if appropriate.
    See <file:///usr/share/doc/gcc-11/README.Bugs> for instructions.
    # g++-12 -std=gnu++20 -c -o /dev/null -isystem /ch/contrib/libcxx/include -nostdinc++ /tmp/test.cpp

  [1]: https://reviews.llvm.org/D122598

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-03-28 22:21:45 +03:00
Anton Popov
d677635cd8
Merge pull request #35592 from CurtizJ/dynamic-columns-4
Add parallel parsing and schema inference for format `JSONAsObject`
2022-03-28 19:29:55 +02:00
mergify[bot]
f09ebea2d8
Merge branch 'master' into read_in_order_max_rows_to_read 2022-03-28 13:30:34 +00:00
Anton Popov
67195bfdd5 support schema inference for type Object in format JSONEachRow 2022-03-25 21:51:53 +00:00
Vladimir C
cfb12aff6f
Merge pull request #35460 from helifu/master 2022-03-25 15:55:17 +01:00
Anton Popov
78100abc5f add parallel parsing and schema inference for type Object 2022-03-24 17:51:35 +00:00
vdimir
f106e2dd49
fix style in QueryPlan.cpp 2022-03-24 11:53:58 +00:00
vdimir
d16ae46589
remove description for ReadFromMergeTree from pipeline, adjust tests for plan 2022-03-24 11:31:52 +00:00
helifu
8a5bd2defa
Add explicit table info to the scan node of query plan and pipeline
:) explain plan select * from table1 t1 left join table2 t2 on t1.name = t2.name;
┌─explain──────────────────────────────────────────────────────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY))                                                  │
│   Join (JOIN)                                                                                │
│     Expression (Before JOIN)                                                                 │
│       SettingQuotaAndLimits (Set limits and quota after reading from storage)                │
│         ReadFromMergeTree (default.table1)                                                   │
│     Expression ((Joined actions + (Rename joined columns + (Projection + Before ORDER BY)))) │
│       SettingQuotaAndLimits (Set limits and quota after reading from storage)                │
│         ReadFromMergeTree (default.table2)                                                   │
└──────────────────────────────────────────────────────────────────────────────────────────────┘

:) explain pipeline select * from table1 t1 left join table2 t2 on t1.name = t2.name;
┌─explain──────────────────────────────────────────┐
│ (Expression)                                     │
│ ExpressionTransform × 24                         │
│   (Join)                                         │
│   JoiningTransform × 24 2 → 1                    │
│     Resize 1 → 24                                │
│       FillingRightJoinSide                       │
│         Resize 24 → 1                            │
│           (Expression)                           │
│           ExpressionTransform × 24               │
│             (SettingQuotaAndLimits)              │
│               (ReadFromMergeTree default.table1) │
│               MergeTreeThread × 24 0 → 1         │
│           (Expression)                           │
│           ExpressionTransform × 24               │
│             (SettingQuotaAndLimits)              │
│               (ReadFromMergeTree default.table2) │
│               MergeTreeThread × 24 0 → 1         │
└──────────────────────────────────────────────────┘
2022-03-24 10:49:12 +00:00
mergify[bot]
bf90edc362
Merge branch 'master' into case-insensitive-column-matching 2022-03-24 08:00:42 +00:00
Kruglov Pavel
826b933b08
Merge pull request #35332 from Avogar/fix-tskv-schema-inference
Fix schema inference for TSKV format while using small max_read_buffer_size
2022-03-23 18:37:07 +01:00
Antonio Andelic
052057f2ef Address PR comments 2022-03-23 15:42:46 +00:00
Anton Popov
f693eba568 fix tests with approx rows 2022-03-22 14:30:40 +00:00