Commit Graph

1066 Commits

Author SHA1 Message Date
alexey-milovidov
fde8c87a1f
Merge pull request #12426 from ClickHouse/log-engine-rollback-on-insert-error
Rollback insertion error in Log engines
2020-07-16 22:50:48 +03:00
Anton Popov
97e8a88b30
Merge pull request #12277 from bobrik/ivan/exact-range-speedup
WIP: Optimize PK lookup for queries that match exact PK range
2020-07-16 19:17:50 +03:00
Vitaly Baranov
000b197ad1
Merge pull request #11234 from traceon/ldap-per-user-authentication
Add LDAP authentication support
2020-07-16 13:17:21 +03:00
Alexey Milovidov
6df282e813 Fixup 2020-07-16 11:33:51 +03:00
alexey-milovidov
8966c09ed6
Merge pull request #12519 from vzakaznikov/fix_data_duplication_and_tests_for_live_view
Fixing race condition in live view tables which could cause data duplication and live view tests
2020-07-16 11:03:28 +03:00
Alexey Milovidov
68f9fd3767 Debug tests 2020-07-16 06:02:20 +03:00
Alexey Milovidov
82ea884d01 Fix incorrect unit test 2020-07-16 05:37:12 +03:00
Alexey Milovidov
3408b7e259 Merge branch 'master' into log-engine-rollback-on-insert-error 2020-07-16 05:34:02 +03:00
Denis Glazachev
59cb758cf7 Merge branch 'master' into ldap-per-user-authentication 2020-07-16 02:29:24 +04:00
Alexey Milovidov
e1e2204279 Whitespace 2020-07-15 19:37:52 +03:00
Vitaliy Zakaznikov
370dd3396b Fixing clang build. 2020-07-15 16:18:53 +02:00
Vitaliy Zakaznikov
560151f6cd * Fix bug in StorageLiveView.cpp
* Fixing synchronization of the first insert in live view tests
2020-07-15 13:24:33 +02:00
alesapin
614540eddf
Merge pull request #12382 from ClickHouse/clear-all-columns
Better errors for CLEAR/DROP columns (possibly in partitions)
2020-07-15 12:52:06 +03:00
alexey-milovidov
9c68124110
Merge pull request #12302 from azat/kafka-error-in-the-batch-SIGSEGV
kafka: fix SIGSEGV if there is a message with error in the middle of the batch
2020-07-15 05:20:26 +03:00
alesapin
9e41fbca55 Remove check for drop detached partition 2020-07-14 16:56:30 +03:00
Alexander Kuzmenkov
b515dd5b83 Merge remote-tracking branch 'origin/master' into HEAD 2020-07-14 15:40:27 +03:00
Alexander Kuzmenkov
b24f727aea typo 2020-07-14 15:40:18 +03:00
alesapin
014bb070ec Fix tests 2020-07-14 11:19:39 +03:00
alexey-milovidov
fd4adf27d6
Merge pull request #12456 from CurtizJ/fix-12437
Fix #12437
2020-07-14 09:28:31 +03:00
alexey-milovidov
1893d89ce3
Merge pull request #12448 from ClickHouse/fix-trash-rabbitmq
Fix trash from RabbitMQ
2020-07-14 01:11:37 +03:00
alesapin
1f576ee039 Some intermediate solution 2020-07-13 20:27:52 +03:00
Alexey Milovidov
cb46bca157 Merge branch 'master' into fix-trash-rabbitmq 2020-07-13 19:51:17 +03:00
alesapin
4a53264a86 Remove redundant and duplicated code 2020-07-13 19:19:08 +03:00
robot-clickhouse
0f23642a3d Auto version update to [20.7.1.1] [54437] 2020-07-13 18:26:03 +03:00
Alexander Kuzmenkov
d6e7ab5988 Fuzzing-related fixes 2020-07-13 16:58:48 +03:00
Denis Glazachev
f787702922 Merge branch 'master' into ldap-per-user-authentication
* master: (27 commits)
  Whitespaces
  Fix typo
  Fix UBSan report in base64
  Correct default secure port for clickhouse-benchmark #11044
  Remove test with bug #10697
  Update in-functions.md (#12430)
  Allow nullable key in MergeTree
  Update arithmetic-functions.md
  [docs] add rabbitmq docs (#12326)
  Lower block sizes and look what will happen #9248
  Fix lifetime_bytes/lifetime_rows for Buffer direct block write
  Retrigger CI
  Fix up  test_mysql_protocol failed
  Implement lifetime_rows/lifetime_bytes for Buffer engine
  Add comment regarding proxy tunnel usage in PocoHTTPClient.cpp
  Add lifetime_rows/lifetime_bytes interface (exported via system.tables)
  Tiny IStorage refactoring
  Trigger integration-test-runner image rebuild.
  Delete log.txt
  Fix test_mysql_client/test_python_client error
  ...
2020-07-13 15:46:27 +04:00
Anton Popov
a9530d2883 in-memory parts: fix reading from nested 2020-07-13 12:10:55 +03:00
alexey-milovidov
ae7eff98ed
Merge pull request #12433 from amosbird/np
Allow nullable key in MergeTree
2020-07-13 04:36:00 +03:00
Alexey Milovidov
8f2055b0a0 Fix trash from RabbitMQ 2020-07-13 04:11:48 +03:00
Amos Bird
cac5a89169
Allow nullable key in MergeTree 2020-07-12 22:21:51 +08:00
Alexey Milovidov
49f60ef3a4 Fix build 2020-07-12 08:26:33 +03:00
Alexey Milovidov
204a4af394 Rollback insertion error in Log engines #12402 2020-07-12 05:32:18 +03:00
Ivan Babrou
8784994d65 Allow conditions outside of PK with exact range
Conditions that are outside of PK are marked as `unknown` in `KeyCondition`,
so it's safe to allow them, as long as they are always combined by `AND`.
2020-07-11 18:59:26 -07:00
Azat Khuzhin
3bee98c6f0 Fix lifetime_bytes/lifetime_rows for Buffer direct block write 2020-07-12 01:16:05 +03:00
Ivan Babrou
d9d8d0242e Optimize PK lookup for queries that match exact PK range
Existing code that looks up marks that match the query has a pathological
case, when most of the part does in fact match the query.

The code works by recursively splitting a part into ranges and then discarding
the ranges that definitely do not match the query, based on primary key.

The problem is that it requires visiting every mark that matches the query,
making the complexity of this sort of look up O(n).

For queries that match exact range on the primary key, we can find
both left and right parts of the range with O(log 2) complexity.

This change implements exactly that.

To engage this optimization, the query must:

* Have a prefix list of the primary key.
* Have only range or single set element constraints for columns.
* Have only AND as a boolean operator.

Consider a table with `(service, timestamp)` as the primary key.

The following conditions will be optimized:

* `service = 'foo'`
* `service = 'foo' and timestamp >= now() - 3600`
* `service in ('foo')`
* `service in ('foo') and timestamp >= now() - 3600 and timestamp <= now`

The following will fall back to previous lookup algorithm:

* `timestamp >= now() - 3600`
* `service in ('foo', 'bar') and timestamp >= now() - 3600`
* `service = 'foo'`

Note that the optimization won't engage when PK has a range expression
followed by a point expression, since in that case the range is not continuous.

Trace query logging provides the following messages types of messages,
each representing a different kind of PK usage for a part:

```
Used optimized inclusion search over index for part 20200711_5710108_5710108_0 with 9 steps
Used generic exclusion search over index for part 20200711_5710118_5710228_5 with 1495 steps
Not using index on part 20200710_5710473_5710473_0
```

Number of steps translates to computational complexity.

Here's a comparison for before and after for a query over 24h of data:

```
Read 4562944 rows, 148.05 MiB in 45.19249672 sec.,   100966 rows/sec.,   3.28 MiB/sec.
Read 4183040 rows, 135.78 MiB in 0.196279627 sec., 21311636 rows/sec., 691.75 MiB/sec.
```

This is especially useful for queries that read data in order
and terminate early to return "last X things" matching a query.

See #11564 for more thoughts on this.
2020-07-11 12:26:54 -07:00
Denis Glazachev
edb6ef8c09 Merge commit 'ceac649c01b0158090cd271776f3219f5e7ff57c' into ldap-per-user-authentication
* commit 'ceac649c01b0158090cd271776f3219f5e7ff57c': (75 commits)
  [docs] split misc statements (#12403)
  Update 00405_pretty_formats.reference
  Update PrettyCompactBlockOutputFormat.cpp
  Update PrettyBlockOutputFormat.cpp
  Update DataTypeNullable.cpp
  Update 01383_remote_ambiguous_column_shard.sql
  add output_format_pretty_grid_charset setting in docs
  add setting output_format_pretty_grid_charset
  Added a test for #11135
  Update index.md
  RIGHT and FULL JOIN for MergeJoin (#12118)
  Update MergeTreeIndexFullText.cpp
  restart the tests
  [docs] add syntax highlight (#12398)
  query fuzzer
  Fix std::bad_typeid when JSON functions called with argument of wrong type.
  Allow typeid_cast() to cast nullptr to nullptr.
  fix another context-related segfault
  [security docs] actually, only admins can create advisories
  query fuzzer
  ...
2020-07-11 21:32:36 +04:00
Azat Khuzhin
32a45d0dee Implement lifetime_rows/lifetime_bytes for Buffer engine
Buffer engine is usually used on INSERTs, but right now there is no way
to track number of INSERTed rows per-table, since only summary metrics
exists:
- StorageBufferRows
- StorageBufferBytes

But it can be pretty useful to track INSERTed rows rate (and it can be
exposed via http_handlers for i.e. prometheus)
2020-07-11 16:06:11 +03:00
Azat Khuzhin
433fdffc19 Add lifetime_rows/lifetime_bytes interface (exported via system.tables) 2020-07-11 15:33:11 +03:00
Azat Khuzhin
84c93a6b02 Tiny IStorage refactoring 2020-07-11 15:17:06 +03:00
alexey-milovidov
e22547c29d
Merge pull request #12388 from ClickHouse/bloom-filter-arg-check
Check arguments of bloom filter index
2020-07-10 20:54:16 +03:00
alexey-milovidov
caef1d8e24
Update MergeTreeIndexFullText.cpp 2020-07-10 20:53:58 +03:00
alexey-milovidov
d819624d7c
Merge pull request #12378 from ClickHouse/allow-clear-column-with-dependencies
Allow to CLEAR column even if there are depending DEFAULT expressions
2020-07-10 20:18:14 +03:00
alexey-milovidov
031c773260
Merge pull request #12384 from ClickHouse/support-negative-float-constants-in-key-condition
Avoid exception when negative or floating point constant is used in WHERE condition for indexed tables
2020-07-10 20:16:35 +03:00
Azat Khuzhin
610382b693 kafka: fix SIGSEGV if there is an message with error in the middle of the batch
ReadBufferFromKafkaConsumer does not handle the case when there is
message with an error on non first position in the current batch, since
it goes through messages in the batch after poll and stop on first valid
message.

But later it can try to use message as valid:
- while storing offset
- get topic name
- ...

And besides the message itself is also invalid (you can find this in the
gdb traces below).

So just filter out messages win an error error after poll.

SIGSEGV was with the following stacktrace:
    (gdb) bt
    3  0x0000000010f05b4d in rd_kafka_offset_store (app_rkt=0x0, partition=0, offset=0) at ../contrib/librdkafka/src/rdkafka_offset.c:656
    4  0x0000000010e69657 in cppkafka::Consumer::store_offset (this=0x7f2015210820, msg=...) at ../contrib/cppkafka/include/cppkafka/message.h:225
    5  0x000000000e68f208 in DB::ReadBufferFromKafkaConsumer::storeLastReadMessageOffset (this=0x7f206a136618) at ../contrib/libcxx/include/iterator:1508
    6  0x000000000e68b207 in DB::KafkaBlockInputStream::readImpl (this=0x7f202c689020) at ../src/Storages/Kafka/KafkaBlockInputStream.cpp:150
    7  0x000000000dd1178d in DB::IBlockInputStream::read (this=this@entry=0x7f202c689020) at ../src/DataStreams/IBlockInputStream.cpp:60
    8  0x000000000dd34c0a in DB::copyDataImpl<> () at ../src/DataStreams/copyData.cpp:21
    9  DB::copyData () at ../src/DataStreams/copyData.cpp:62
    10 0x000000000e67c8f2 in DB::StorageKafka::streamToViews () at ../contrib/libcxx/include/memory:3823
    11 0x000000000e67d218 in DB::StorageKafka::threadFunc () at ../src/Storages/Kafka/StorageKafka.cpp:488

And some information from it:

    (gdb) p this.current.__i
    $14 = (std::__1::__wrap_iter<cppkafka::Message const*>::iterator_type) 0x7f1ca8f58660

    # current-1
    (gdb) p $14-1
    $15 = (const cppkafka::Message *) 0x7f1ca8f58600
    (gdb) p $16.handle_
    $17 = {__ptr_ = {<std::__1::__compressed_pair_elem<rd_kafka_message_s*, 0, false>> = { __value_ = 0x7f203577f938}, ...}
    (gdb) p *(rd_kafka_message_s*)0x7f203577f938
    $24 = {err = RD_KAFKA_RESP_ERR__TRANSPORT, rkt = 0x0, partition = 0, payload = 0x7f202f0339c0, len = 63, key = 0x0, key_len = 0, offset = 0, _private = 0x7f203577f8c0}

    # current
    (gdb) p $14-0
    $28 = (const cppkafka::Message *) 0x7f1ca8f58660
    (gdb) p $28.handle_.__ptr_
    $29 = {<std::__1::__compressed_pair_elem<rd_kafka_message_s*, 0, false>> = { __value_ = 0x7f184f129bf0}, ...}
    (gdb) p *(rd_kafka_message_s*)0x7f184f129bf0
    $30 = {err = RD_KAFKA_RESP_ERR_NO_ERROR, rkt = 0x7f1ed44fe000, partition = 1, payload = 0x7f1fc9bc6036, len = 242, key = 0x0, key_len = 0, offset = 2394853582209,

    # current+1
    (gdb) p (*($14+1)).handle_.__ptr_
    $44 = {<std::__1::__compressed_pair_elem<rd_kafka_message_s*, 0, false>> = { __value_ = 0x7f184f129d30}, ...}
    (gdb) p *(rd_kafka_message_s*)0x7f184f129d30
    $45 = {err = RD_KAFKA_RESP_ERR_NO_ERROR, rkt = 0x7f1ed44fe000, partition = 1, payload = 0x7f1fc9bc612f, len = 31, key = 0x0, key_len = 0, offset = 2394853582210,
      _private = 0x7f184f129cc0}

    # distance from the beginning
    (gdb) p messages.__end_-messages.__begin_
    $34 = 65536
    (gdb) p ($14-0)-messages.__begin_
    $37 = 8965
    (gdb) p ($14-1)-messages.__begin_
    $38 = 8964

    # parsing info
    (gdb) p allowed
    $39 = false
    (gdb) p new_rows
    $40 = 1
    (gdb) p total_rows
    $41 = 8964

    # current buffer is invalid
    (gdb) p *buffer.__ptr_
    $50 = {<DB::ReadBuffer> = {<DB::BufferBase> = {pos = 0x7f202f0339c0 "FindCoordinator response error: Local: Broker transport failure", bytes = 47904863385, working_buffer = {
            begin_pos = 0x7f202f0339c0 "FindCoordinator response error: Local: Broker transport failure",
            end_pos = 0x7f202f0339c0 "FindCoordinator response error: Local: Broker transport failure"}, internal_buffer = {

v0: check message errors in ReadBufferFromKafkaConsumer::nextImpl() (but
this may lead to using of that messages after and SIGSEGV again, doh).
v2: skip messages with an error after poll.
2020-07-10 11:41:44 +03:00
Alexey Milovidov
47eaffbe63 Additional checks 2020-07-10 11:21:40 +03:00
Alexey Milovidov
4b86f36d37 Check arguments of bloom filter index 2020-07-10 11:13:21 +03:00
alesapin
5cae87e664
Merge pull request #12335 from ClickHouse/fix_alter_exit_codes
Fix alter rename error messages
2020-07-10 11:05:20 +03:00
Alexey Milovidov
276b3a0215 Avoid exception when negative or floating point constant is used in WHERE condition for indexed tables #11905 2020-07-10 09:30:49 +03:00
Alexey Milovidov
a4b35a8a6f Allow to CLEAR column even if there are depending DEFAULT expressions #12333 2020-07-10 08:54:35 +03:00
alexey-milovidov
c16d8e094b
Merge pull request #12308 from ClickHouse/fix-codec-bad-exception-code
Fix wrong exception code in codecs Delta, DoubleDelta #12110
2020-07-10 08:40:46 +03:00