Commit Graph

3460 Commits

Author SHA1 Message Date
zhenjial
0f788d98f5 new implementation 2022-09-06 20:39:54 +08:00
zhenjial
18db90dcfc Record errors while reading text formats (CSV, TSV). 2022-09-06 17:19:15 +08:00
Duc Canh Le
6950016b8a fix grouping set with group_by_use_nulls 2022-09-06 09:39:27 +08:00
zhongyuankai
0bf76fe642 Merge branch 'compress-marks' into compress_marks_and_primary_key 2022-09-06 08:01:43 +08:00
Igor Nikonov
7a8b8e7a39 Optimizer setting: read in window order
optimization's setting is checked before applying it, not inside the optimization code
2022-09-05 20:47:11 +00:00
Igor Nikonov
30860290de Continue 2022-09-05 20:12:00 +00:00
kssenii
83514fa2ef Refactor 2022-09-05 20:08:22 +02:00
Alexey Milovidov
d7127e4b2d Make it slightly more sane 2022-09-05 07:26:58 +02:00
Igor Nikonov
8fece1e2d2
Merge branch 'master' into sort_mode_rename 2022-09-04 21:44:33 +02:00
Alexey Milovidov
193cd1b3b2
Merge pull request #39138 from nickitat/control_block_size_in_aggregator
Control block size in aggregator
2022-09-04 04:51:00 +03:00
Igor Nikonov
70f779b81d Just to save 2022-09-02 21:24:33 +00:00
Igor Nikonov
5d7fa55f36
Merge branch 'master' into sort_mode_rename 2022-09-02 23:19:04 +02:00
Kruglov Pavel
77071381e4
fix build 2022-09-02 16:37:33 +02:00
Vladimir C
963c0111bf
Merge pull request #39418 from vdimir/join_and_sets
Filter joined streams for `full_sorting_join` by each other before sorting
2022-09-02 13:57:06 +02:00
Antonio Andelic
e64436fef3 Fix typos with new codespell 2022-09-02 08:54:48 +00:00
Robert Schulze
319d8b00a7
Merge pull request #39010 from FrankChen021/tracing_context_propagation
Improve the opentelemetry tracing context propagation across threads
2022-09-02 07:56:43 +02:00
Robert Schulze
c7c00f9002
Merge pull request #40739 from ClickHouse/clang-tidy-for-headers
Enable clang-tidy for headers
2022-09-02 07:54:50 +02:00
avogar
afc34dca41 Add new JSON formats, add improvements and refactoring 2022-09-01 19:00:24 +00:00
Kruglov Pavel
7a4a65bc36
Make better exception message in schema inference 2022-09-01 20:36:08 +02:00
Kruglov Pavel
f53aa86a20
Merge pull request #40485 from arthurpassos/fix-parquet-chunked-array-deserialization
Add support for extended (chunked) arrays for Parquet format
2022-09-01 19:40:40 +02:00
Dmitry Novik
ddadb362cf
Merge pull request #39762 from quickhouse/betterorderbyoptimization
Fixed `Unknown identifier (aggregate-function)` exception which appears when a user tries to calculate WINDOW ORDER BY/PARTITION BY expressions over aggregate functions
2022-09-01 18:08:06 +02:00
Frank Chen
9d63cbe811 Merge 'origin/master' into tracing_context_propagation to resolve conflicts 2022-09-01 23:18:59 +08:00
Vladimir C
12e6fc4182
Merge branch 'master' into join_and_sets 2022-09-01 14:56:14 +02:00
Kseniia Sumarokova
c6c67a248d
Merge pull request #40792 from canhld94/ch_canh_intersect_distinct
Implement intersect + except distinct
2022-09-01 14:35:26 +02:00
Anton Popov
f7bdf07adc
Merge pull request #38715 from CurtizJ/fix-read-in-order-fixed-prefix
Better support of `optimize_read_in_order` in case of fixed prefix of sorting key
2022-09-01 12:59:18 +02:00
Robert Schulze
de64c6b103
Merge branch 'master' into clang-tidy-for-headers 2022-09-01 10:24:56 +02:00
Kruglov Pavel
86516d3bb4
Merge pull request #40740 from amosbird/row-policy-index-fix-1
Use index when row_policy_filter is always false
2022-08-31 18:46:14 +02:00
Robert Schulze
cedf75ed5e
Enable clang-tidy for headers
clang-tidy now also checks code in header files. Because the analyzer
finds tons of issues, activate the check only for directory "base/" (see
file ".clang-tidy"). All other directories, in particular "src/" are
left to future work.

While many findings were fixed, some were not (and suppressed instead).
Reasons for this include: a) the file is 1:1 copypaste of a 3rd-party
lib (e.g. pcg_extras.h) and fixing stuff would make upgrades/fixes more
difficult b) a fix would have broken lots of using code
2022-08-31 10:48:15 +00:00
Anton Popov
3504781529
Merge branch 'master' into fix-read-in-order-fixed-prefix 2022-08-30 23:32:43 +02:00
Dmitry Novik
0a8378d9cd
Merge branch 'master' into betterorderbyoptimization 2022-08-30 14:23:22 +02:00
vdimir
0f6f3c73b0
Minor fix 2022-08-30 11:57:28 +00:00
Duc Canh Le
8590cc46c4 implement intersect + except distinct 2022-08-30 18:09:01 +08:00
Frank Chen
f17d56b528 Merge branch 'master' into tracing_context_propagation 2022-08-30 14:24:36 +08:00
vdimir
24f62e8486
Throw an error in CreatingSetsOnTheFlyTransform in case of input for finished 2022-08-29 11:27:08 +00:00
vdimir
b0e2616aa9
Style fixes in CreateSetAndFilterOnTheFlyTransform and related 2022-08-29 11:26:21 +00:00
Anton Popov
2a3e012931
Merge branch 'master' into fix-read-in-order-fixed-prefix 2022-08-29 13:17:26 +02:00
vdimir
7915b6948f
Fix build after rebase 2022-08-29 09:49:16 +00:00
vdimir
afb6b7d9cf
Test plan and pipeline for filtering step for join 2022-08-29 09:49:15 +00:00
vdimir
afeff512b5
Aux port for ReadHeadBalancedProcessor is empty Block 2022-08-29 09:49:14 +00:00
vdimir
95f87dc34e
fix sanitizer assert in CreateSetAndFilterOnTheFlyStep 2022-08-29 09:49:12 +00:00
vdimir
c67ab33d90
small fix CreateSetAndFilterOnTheFlyStep 2022-08-29 09:49:11 +00:00
vdimir
51e02d09f6
set preserves_sorting = true for CreateSetAndFilterOnTheFlyStep 2022-08-29 09:49:10 +00:00
vdimir
714c53ab24
fix typos 2022-08-29 09:49:09 +00:00
vdimir
8e1632f824
Create sets for joins: better code 2022-08-29 09:49:08 +00:00
vdimir
7228091ff1
rename CreateSetAndFilterOnTheFlyTransform 2022-08-29 09:49:07 +00:00
vdimir
67a9acc8db
rename CreatingSetOnTheFlyStep -> CreateSetAndFilterOnTheFlyStep 2022-08-29 09:49:07 +00:00
vdimir
d82a75ae75
cleanup PingPongProcessor 2022-08-29 09:49:06 +00:00
vdimir
e472e13c70
move PingPongProcessor/ReadHeadBalancedProcessor into separate file 2022-08-29 09:49:05 +00:00
vdimir
51a51694d6
Create sets for joins: better code 2022-08-29 09:49:01 +00:00
vdimir
c778bba13f
Create sets for joins: wip 2022-08-29 09:47:00 +00:00
vdimir
31a167848d
Fix set finish condition in CreatingSetsOnTheFlyTransform 2022-08-29 09:46:59 +00:00
vdimir
71708d595f
Create sets for joins: wip 2022-08-29 09:46:59 +00:00
vdimir
8f06430ebd
Create sets for joins: upd 2022-08-29 09:46:58 +00:00
vdimir
3292566603
Format bytes in CreatingSetsOnTheFlyTransform logs 2022-08-29 09:46:57 +00:00
vdimir
031aaf3a45
Add Creating/FilterBySetsOnTheFlyStep for full sorting join 2022-08-29 09:46:57 +00:00
vdimir
c5bc7b0a0c
Resize pipeline after full sort join 2022-08-29 09:46:56 +00:00
Azat Khuzhin
f9812d9917 Fix memory leak while pushing to MVs w/o query context (from Kafka/...)
While pushing to MVs, there is low-level code that creates
ThreadGroupStatus/ThreadStatus; it is required to gather some metrics
for system.query_views_log.

But one should not use the ThreadGroupStatus of the MainThreadStatus, since
this structure can hold some state that may not be cleaned, and this
may be racy. Instead, it is better to create a new ThreadGroupStatus and
attach it.

Also, this place misses detachQuery(), and because of this it leaks
ThreadGroupStatus::finished_threads_counters_memory. But this is only a
problem when pushing to MVs is done w/o query context (i.e. from Kafka/...),
since when there is a query context, detachQuery() will be called eventually.

Before this patch series, when I tried the reproducer with
500 MVs attached to Kafka engine (that @den-crane suggested), the jemalloc
report looked like this:

    $ ../jeprof --text ~/ch/tmp/upstream/clickhouse-binary --base jeprof.44384.0.i0.heap jeprof.44384.167.i167.heap
    Using local file /home/azat/ch/tmp/upstream/clickhouse-binary.
    Using local file jeprof.44384.167.i167.heap.
    Total: 915.6 MB
       910.7  99.5%  99.5%    910.7  99.5% Snapshot (inline)
         9.5   1.0% 100.5%      9.5   1.0% std::__1::__libcpp_operator_new (inline)
         0.5   0.1% 100.6%      0.5   0.1% DB::TasksStatsCounters::create

And with focus on this place:

    $ ../jeprof --focus Snapshot --text ~/ch/tmp/upstream/clickhouse-binary --base jeprof.44384.0.i0.heap jeprof.44384.167.i167.heap
    Using local file /home/azat/ch/tmp/upstream/clickhouse-binary.
    Using local file jeprof.44384.167.i167.heap.
    Total: 915.6 MB
       910.7 100.0% 100.0%    910.7 100.0% Snapshot (inline)
         0.0   0.0% 100.0%    910.7 100.0% DB::QueryPipeline::reset
         0.0   0.0% 100.0%    910.7 100.0% DB::StorageKafka::streamToViews
         0.0   0.0% 100.0%    910.7 100.0% DB::StorageKafka::threadFunc
         0.0   0.0% 100.0%    910.7 100.0% ProfileEvents::Counters::getPartiallyAtomicSnapshot
         0.0   0.0% 100.0%    910.7 100.0% ~ThreadStatus
         0.0   0.0% 100.0%    910.7 100.0% ~ViewRuntimeData
         0.0   0.0% 100.0%    910.7 100.0% ~ViewRuntimeStats (inline)

Actually this report does not look great (you understand it because I
stripped it), because --text is not that smart, but if you use
--pdf for the report you will see the stacktrace (will attach pdf to the
pull request).

But after this patch series the process RSS does not go beyond
~700MiB.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-08-29 11:36:33 +02:00
Amos Bird
15a69bce84
Use index when row_policy_filter is always false 2022-08-29 16:44:32 +08:00
Alexey Milovidov
365a600fdb Merge branch 'force-documentation-3' of github.com:ClickHouse/ClickHouse into force-documentation-3 2022-08-27 22:28:54 +02:00
Alexey Milovidov
6b2e227c8b Fix integration test 2022-08-27 22:28:38 +02:00
Vladimir C
e067629e0d
Merge pull request #40239 from vdimir/vdimir/tmp-file-metrics
More metrics for on-disk temporary data
2022-08-26 11:28:01 +02:00
Alexander Gololobov
6a69e08799
Merge pull request #40559 from ClickHouse/lwd_vertical_merge_fix
Fix vertical merge of parts with lightweight deleted rows
2022-08-25 20:47:44 +02:00
Frank Chen
bb00dcc19b Remove using namespace from header
Signed-off-by: Frank Chen <frank.chen021@outlook.com>
2022-08-25 20:20:13 +08:00
Frank Chen
99c37ce6c6
Merge branch 'master' into tracing_context_propagation 2022-08-25 10:07:16 +08:00
Nikita Taranov
ac34a17551
Merge branch 'master' into control_block_size_in_aggregator 2022-08-24 20:25:28 +02:00
vdimir
91788f29e8
Upd TemporaryFileOnDisk 2022-08-24 16:15:54 +00:00
vdimir
7194df1184
Move back TemporaryFile -> TemporaryFileOnDisk 2022-08-24 16:14:11 +00:00
vdimir
0349c85017
Use getCompressedBytes in BufferingToFileTransform and TemporaryFileStream 2022-08-24 16:14:10 +00:00
vdimir
51c44424cc
More metrics for temp files 2022-08-24 16:14:09 +00:00
vdimir
1321ac87b5
Minor fixes 2022-08-24 16:14:07 +00:00
vdimir
7e0c9062c7
Add ProfileEvents::ExternalSort(Un)CompressedBytes 2022-08-24 16:14:07 +00:00
Kruglov Pavel
e6e7f5db93
Merge pull request #40491 from mini4/fix-settings-input_format_tsv_skip_first_lines
Fix bug in settings input_format_tsv_skip_first_lines of format TSV
2022-08-24 15:57:45 +02:00
Alexander Gololobov
1c2dd50ca5 Fix vertical merge of parts with lightweight deleted rows 2022-08-24 15:18:33 +02:00
Kruglov Pavel
0781e8b4f7
Merge pull request #40534 from Avogar/nested-in-avro
Support reading Array(Record) into flatten nested table in Avro
2022-08-24 13:33:12 +02:00
Frank Chen
cd19366b44 Move classes into DB::OpenTelemetry namespace 2022-08-24 16:41:40 +08:00
kgurjev
f62c2c3221 Fix bug in settings input_format_tsv_skip_first_lines of format TSV 2022-08-24 10:02:57 +03:00
Kruglov Pavel
72f02bd6eb
Merge pull request #40414 from Avogar/improve-schema-inference-cache
Improve schema inference cache, respect format settings that can change the schema
2022-08-23 17:04:58 +02:00
avogar
29a887578b Fix 2022-08-23 11:42:57 +00:00
avogar
581e569d04 Support reading Array(Record) into flatten nested table in Avro 2022-08-23 11:05:02 +00:00
Arthur Passos
f8e2ab0a20 Use FileReader::GetRecordBatchReader instead of FileReader::ReadRowGroup to parse Parquet 2022-08-22 08:21:32 -03:00
Alexey Milovidov
ab91c99495
Merge branch 'master' into control_block_size_in_aggregator 2022-08-20 21:28:27 +03:00
Alexey Milovidov
74e1f4dc61 Fix clang-tidy 2022-08-20 17:09:20 +02:00
avogar
612ffaffde Make schema inference cache better, respect format settings that can change the schema 2022-08-19 16:39:13 +00:00
Nikita Taranov
1b6e7b9ca2
Merge branch 'master' into sort_mode_rename 2022-08-19 12:31:59 +02:00
Kruglov Pavel
b67cb9e378
Merge pull request #40173 from Avogar/arrow-dict
Improve and fix dictionaries in Arrow format
2022-08-18 20:54:55 +02:00
Kruglov Pavel
09a2ff8843
Merge pull request #40293 from joshuataylor/feature/arrow-large-binary-string
Add support for LARGE_BINARY/LARGE_STRING with Arrow
2022-08-18 14:01:58 +02:00
avogar
a6318cecd5 Fix hive test 2022-08-18 11:32:42 +00:00
Nikolai Kochetov
5a85531ef7
Merge pull request #38286 from Avogar/schema-inference-cache
Add schema inference cache for s3/hdfs/file/url
2022-08-18 13:07:50 +02:00
Kruglov Pavel
d7056376eb
Merge pull request #40068 from Avogar/schema-inference-hints
Allow to specify structure hints in schema inference
2022-08-18 12:19:45 +02:00
Igor Nikonov
6fe8b61345
Merge branch 'master' into sort_mode_rename 2022-08-17 19:19:29 +02:00
Yakov Olkhovskiy
40fd6e189a
call readColumnWithStringData 2022-08-17 09:54:01 -04:00
Kruglov Pavel
19af748737
Fix typo 2022-08-17 14:29:09 +02:00
Kruglov Pavel
00d04456ff
Try reduce code duplication 2022-08-17 14:28:15 +02:00
Vladimir C
b876cc17c9
Merge pull request #39593 from quickhouse/fixexponentialdecaywindowfunctions
Fixed point of origin for exponential decay window functions to the last value in window
2022-08-17 14:19:59 +02:00
Igor Nikonov
5ceaeb9e12 Sorting mode renaming
+ sort mode -> sort scope
+ Stream -> Global
+ Port -> Stream
2022-08-17 12:19:36 +00:00
avogar
8dd54c043d Merge branch 'master' of github.com:ClickHouse/ClickHouse into schema-inference-cache 2022-08-17 11:47:40 +00:00
Igor Nikonov
46ed4f6cdf
Merge pull request #38719 from ClickHouse/skipping_sorting_step
SortingStep: deduce way to sort based on input stream sort description
2022-08-17 12:58:11 +02:00
Josh Taylor
628d2bbff5 Add support for LARGE_BINARY/LARGE_STRING with Arrow 2022-08-17 10:25:06 +08:00
Nikita Taranov
6bdbaccc37 use max_block_size from settings 2022-08-16 18:56:22 +02:00
Nikita Taranov
63bc894a42 more parallelism 2022-08-16 18:56:22 +02:00
Nikita Taranov
f650b23ee3 generate many blocks 2022-08-16 18:56:22 +02:00
Nikita Taranov
db0110fd7a more accurate crutch 2022-08-16 18:56:22 +02:00
Nikita Taranov
e5e0a24ab3 return chunks from prepareBlockAndFillWithoutKey 2022-08-16 18:56:22 +02:00
Igor Nikonov
d4367de7bb Rename setting to optimize_sorting_by_input_stream_properties 2022-08-16 16:27:41 +00:00
Vladimir Chebotaryov
3cc03b141e Fixed tests on Debug build type. 2022-08-16 15:43:37 +02:00
Vladimir Chebotaryov
66f9bfca61 Fixed point of origin for exponential decay window functions to the last value in window. 2022-08-16 15:43:37 +02:00
avogar
99d8727335 Fix tests 2022-08-16 12:56:51 +00:00
avogar
936c457734 Remove unneeded field 2022-08-16 09:51:52 +00:00
avogar
e1ff996ec3 Allow to specify structure hints in schema inference 2022-08-16 09:46:57 +00:00
Maksim Kita
110470809b
Merge pull request #40121 from amosbird/profile-processor-1
Extend processors_profile_log with more info
2022-08-16 09:49:12 +02:00
Igor Nikonov
aba00952f5 Fix: don't set sort mode in ReadFromMergeTree if sort description empty 2022-08-15 20:58:20 +00:00
Kruglov Pavel
2c5c0d6d47
Fix typo 2022-08-15 19:55:28 +02:00
avogar
ca0d883c0f Fix possible segfault in CapnProto input format 2022-08-15 15:36:18 +00:00
Igor Nikonov
ea10fd65b8 Sorting properties in EXPLAIN PLAN
~ change formatting for sorting
~ rename sortmode option -> sorting
2022-08-15 15:14:59 +00:00
avogar
c160033837 Fix 2022-08-15 11:38:28 +00:00
Igor Nikonov
d83bea626c Merge remote-tracking branch 'origin/master' into skipping_sorting_step 2022-08-13 21:46:34 +00:00
Igor Nikonov
f33a0d8c85 More simple way to check if sorting order is preserved
- there is a case where it's done wrong
2022-08-12 23:42:37 +00:00
avogar
78e197063c Better example 2022-08-12 19:08:36 +00:00
avogar
763f84b623 Remove bad comment 2022-08-12 19:05:57 +00:00
avogar
9addded80e Remove logging 2022-08-12 19:01:02 +00:00
avogar
000336622a Remove logging 2022-08-12 18:59:52 +00:00
avogar
398576e9c9 Improve and fix dictionaries in Arrow format 2022-08-12 18:56:21 +00:00
Kseniia Sumarokova
a6cfc7bc3b
Merge pull request #34651 from alexX512/master
New caching strategies
2022-08-12 17:23:37 +02:00
Anton Popov
4bd50bb06c
Merge branch 'master' into distinct_sorted_simplify 2022-08-12 17:11:18 +02:00
Kruglov Pavel
4c7222d938
Merge pull request #40020 from canhld94/ch_canh_fix_hash
fix HashMethodOneNumber with const column
2022-08-12 14:40:24 +02:00
Amos Bird
99a38e41aa
processor profile 2022-08-11 21:03:34 +08:00
Igor Nikonov
75f6fcfa70 Merge remote-tracking branch 'origin/master' into skipping_sorting_step 2022-08-11 12:35:55 +00:00
Amos Bird
fa8fab2e8f
Fix KeyCondition with other filters 2022-08-11 19:20:44 +08:00
Maksim Kita
6bec0f5854
Merge pull request #38956 from vdimir/dict-join-refactoring
Join with dictionary refactoring
2022-08-11 11:54:11 +02:00
Vladimir C
2d44e6c458
Merge pull request #39343 from vdimir/refactor-prepared-sets
Refactor PreparedSets/SubqueryForSet
2022-08-11 11:19:18 +02:00
Vladimir Chebotaryov
748979a9c0
Merge branch 'master' into betterorderbyoptimization 2022-08-11 11:09:52 +03:00
Duc Canh Le
84cd867aa8 materialize column instead of handling column in hash method 2022-08-11 10:46:06 +08:00
Anton Popov
3fdf428834
Merge pull request #39186 from Avogar/numbers-schema-inference
Add new features in schema inference
2022-08-11 00:53:54 +02:00
vdimir
ad91c16ba0
Rename join_common -> JoinUtils 2022-08-10 14:20:28 +00:00
vdimir
b7c5c54181
Fix build 2022-08-10 13:43:55 +00:00
vdimir
5eb4cd39e0
Merge branch 'master' into refactor-prepared-sets 2022-08-10 11:47:49 +00:00
Maksim Kita
aff8149f5c
Merge pull request #39998 from kitaisreal/actions-dag-refactoring
ActionsDAG rename index to outputs
2022-08-10 11:44:18 +02:00
Igor Nikonov
754a9fb096 Merge remote-tracking branch 'origin/master' into skipping_sorting_step 2022-08-09 22:20:17 +00:00
Arthur Passos
c4d8ad2222 Add docs 2022-08-09 15:58:46 -03:00
Arthur Passos
e724e7bef6 Update arrow dict to lc comment 2022-08-09 15:52:37 -03:00
Arthur Passos
6eb89fd780 Fix both arrow dict de-serialization and dict of nullable de-serialization 2022-08-09 15:06:22 -03:00
Arthur Passos
be1e32c3f1
Merge branch 'ClickHouse:master' into fix_arrow_column_dictionary_to_ch_lc 2022-08-09 15:04:06 -03:00
Maksim Kita
acbfcf440b
Merge branch 'master' into actions-dag-refactoring 2022-08-09 18:52:08 +02:00
Igor Nikonov
70b52f7cb9 Fix test, review comments 2022-08-09 16:29:56 +00:00
Maksim Kita
a576a55375 Fixed build 2022-08-09 15:03:59 +02:00
Kruglov Pavel
088e8cf9bd
Merge branch 'master' into numbers-schema-inference 2022-08-09 14:00:36 +02:00
Kruglov Pavel
99b9e85a8f
Merge pull request #39646 from Avogar/more-formats
Add more Pretty formats
2022-08-09 13:59:47 +02:00
Igor Nikonov
366ead3828 Consider aliases when checking if sorting order is preserved by
expression
2022-08-09 11:27:17 +00:00
Igor Nikonov
1439664df6 EXPLAIN tests 2022-08-08 20:46:43 +00:00
Maksim Kita
c030fd05e7 ActionsDAG rename index to outputs 2022-08-08 18:01:32 +02:00