454 Commits

Author SHA1 Message Date
Alexander Tokmakov
1ca9a92b21 Merge branch 'master' into write_structure_of_table_functions 2020-09-18 21:09:23 +03:00
Nikolai Kochetov
b26f11c00c Support StorageDistributed::read for QueryPlan. 2020-09-18 17:16:53 +03:00
Alexey Milovidov
7fb4dfea2c Small improvements for IStorage::rename 2020-09-17 22:50:43 +03:00
alexey-milovidov
84eece69ba
Merge pull request #14876 from amosbird/ns
Get rid of query settings after initialization.
2020-09-17 17:49:25 +03:00
Amos Bird
96a202c0fb
Get rid of query settings after initialization. 2020-09-16 22:35:39 +08:00
Pavel Kovalenko
01ab28a182 Don't throw exception if Distributed storage has multi-volume storage policy configuration. 2020-09-15 12:26:56 +03:00
Alexander Tokmakov
b840d741d0 Merge branch 'master' into write_structure_of_table_functions 2020-09-04 13:00:07 +03:00
alexey-milovidov
edea940e17
Update StorageDistributed.cpp 2020-09-03 04:39:36 +03:00
Azat Khuzhin
fffeeeba06 Force WithMergeableStateAfterAggregation via distributed_group_by_no_merge (convert to UInt64)
Possible values:
- 1 - Do not merge aggregation states from different servers for distributed query processing - in case it is for certain that there are different keys on different shards.
- 2 - same as 1 but also apply ORDER BY and LIMIT stages
2020-09-03 00:52:51 +03:00
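
The values of distributed_group_by_no_merge described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not ClickHouse code: shards are plain Python lists of keys, the aggregate is count(), the query is assumed to be GROUP BY key ORDER BY key LIMIT n, and the names local_group_by and distributed_count are hypothetical.

```python
from collections import Counter

def local_group_by(rows):
    """Per-shard GROUP BY key ORDER BY key, with count() as the aggregate."""
    return sorted(Counter(rows).items())

def distributed_count(shards, no_merge, limit=None):
    """Toy initiator for the three modes (0, 1, 2) of the setting."""
    if no_merge == 0:
        # Default: aggregation states from all shards are merged.
        merged = Counter()
        for rows in shards:
            merged.update(Counter(rows))
        return sorted(merged.items())[:limit]
    # 1 and 2: every shard finishes the query on its own, LIMIT included.
    parts = [local_group_by(rows)[:limit] for rows in shards]
    combined = [pair for part in parts for pair in part]
    if no_merge == 2:
        # 2: ORDER BY and LIMIT are applied once more to the combined rows.
        combined = sorted(combined)[:limit]
    return combined

disjoint = [["a", "a", "b"], ["c", "d"]]   # each key lives on one shard
overlapping = [["a", "b"], ["a"]]          # key "a" lives on both shards
```

With disjoint shards, modes 0 and 1 return the same rows. With overlapping shards, mode 1 returns key "a" twice, which is why the setting is only safe when different shards are known to hold different keys.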
Azat Khuzhin
10b4f3b41f Optimize queries with LIMIT/LIMIT BY/ORDER BY for distributed with GROUP BY sharding_key
Previous set of QueryProcessingStage does not allow to do this.
But after WithMergeableStateAfterAggregation had been introduced the
following queries can be optimized too under
optimize_distributed_group_by_sharding_key:
- GROUP BY sharding_key LIMIT
- GROUP BY sharding_key LIMIT BY
- GROUP BY sharding_key ORDER BY

Right now it still does not support:
- WITH TOTALS (looks like it can be supported)
- WITH ROLLUP (looks like it can be supported)
- WITH CUBE
- SETTINGS extremes=1 (looks like it can be supported)
These will be implemented separately.

vX: fixes
v2: fix WITH *
v3: fix extremes
v4: fix LIMIT OFFSET (and make a little bit cleaner)
v5: fix HAVING
v6: fix ORDER BY
v7: rebase against 20.7
v8: move out WithMergeableStateAfterAggregation
v9: add optimize_distributed_group_by_sharding_key into test names
2020-09-03 00:52:51 +03:00
Alexander Tokmakov
969940b4c9 write table structure for table function remote(...) 2020-08-26 23:55:40 +03:00
Nikolai Kochetov
2cca4d5fcf Refactor Pipe [part 2]. 2020-08-03 16:54:14 +03:00
Vladimir Chebotarev
faedb04722 Minor fixes. 2020-07-28 19:45:46 +03:00
Vladimir Chebotarev
1b3f5c99f5 Real fix of test. 2020-07-26 21:27:36 +03:00
Vladimir Chebotarev
f5af64514f Test fix (removed redundant code). 2020-07-26 21:27:36 +03:00
Vladimir Chebotarev
8039d45910 Minor fix in StorageDistributed. 2020-07-26 21:27:36 +03:00
Gleb Novikov
7f5b6fba78 Generic volume is coming...
1. SingleDiskVolume for temporary volumes
2. Generic VolumePtr in StoragePolicies
3. Removed max_data_part_size in system.storage_policies, added volume_type
2020-07-26 21:27:36 +03:00
Artem Zuikov
2afd123eda
Refactoring: extract TreeOptimizer from SyntaxAnalyzer (#12645) 2020-07-22 20:13:05 +03:00
Azat Khuzhin
6ea1b19476 Remove data for Distributed tables (blocks from async INSERTs) on DROP TABLE 2020-07-17 08:59:57 +03:00
Azat Khuzhin
9c94993f14 Fix "Sharding key is not deterministic" message 2020-06-29 23:00:14 +03:00
alexey-milovidov
18eb141ea1
Merge pull request #11715 from azat/dist-optimize_skip_unused_shards-fixes
Control nesting level for shards skipping and disallow non-deterministic functions
2020-06-24 12:54:58 +03:00
alesapin
c9fa5d2ec3 Better naming 2020-06-19 18:39:41 +03:00
Azat Khuzhin
d34e6217bc Add logging of adjusting conditional settings for distributed queries 2020-06-18 21:49:29 +03:00
Azat Khuzhin
041533eae2 Disable optimize_skip_unused_shards if sharding_key has non-deterministic func
Example of such functions is rand()

And this patch disables only optimize_skip_unused_shards, i.e. the INSERT
code path is not changed, so it will work as before.
2020-06-18 21:49:29 +03:00
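
Why a non-deterministic sharding key rules out shard skipping can be sketched with a toy model. Everything here is an assumption made for illustration: the cluster size, the user_id column, and the functions insert and select are hypothetical and do not mirror the StorageDistributed code.

```python
import random

N_SHARDS = 4  # hypothetical cluster size

def deterministic_key(row):
    return row["user_id"]

def random_key(row):
    # Stand-in for a sharding key such as rand(): the row content is ignored.
    return random.randrange(1 << 32)

def shard_for(sharding_key, row):
    return sharding_key(row) % N_SHARDS

def insert(rows, sharding_key):
    """INSERT path: route every row by the sharding key (unchanged by the patch)."""
    shards = [[] for _ in range(N_SHARDS)]
    for row in rows:
        shards[shard_for(sharding_key, row)].append(row)
    return shards

def select(shards, sharding_key, user_id, key_is_deterministic):
    """SELECT ... WHERE user_id = X, skipping shards only when it is safe."""
    if key_is_deterministic:
        # The key can be re-evaluated for the filter value to find the one shard.
        candidates = [shards[shard_for(sharding_key, {"user_id": user_id})]]
    else:
        # Re-evaluating rand() says nothing about where the rows went: read all.
        candidates = shards
    return [row for shard in candidates for row in shard if row["user_id"] == user_id]
```

For example, after insert(rows, deterministic_key) a lookup touches one shard, while with random_key the same lookup has to scan every shard to find all matching rows.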
alesapin
d79982f497 Better locks in Storages 2020-06-18 19:10:47 +03:00
alesapin
aab4ce6394 Truncate with metadata 2020-06-18 13:29:13 +03:00
alesapin
760e9a8488 Fix crash 2020-06-18 12:08:24 +03:00
alesapin
ebb36bec8a Merge branch 'master' into atomic_metadata5 2020-06-18 11:57:16 +03:00
alesapin
dffdece350 getColumns in StorageInMemoryMetadata (only compilable) 2020-06-17 19:39:58 +03:00
alesapin
ef8781cce7 Better getVirtuals method 2020-06-17 17:37:21 +03:00
Nikita Mikhaylov
ff0262626a
Merge pull request #11645 from azat/load-balancing-round-robin
Add round_robin load_balancing
2020-06-17 14:34:59 +04:00
alesapin
1ddeb3d149 Buildable getSampleBlock in StorageInMemoryMetadata 2020-06-16 18:51:29 +03:00
alesapin
53cb5210de Move getSampleBlockNonMaterialized to StorageInMemoryMetadata 2020-06-16 15:48:10 +03:00
alesapin
36ba0192df Metadata in read and write methods of IStorage 2020-06-15 22:08:58 +03:00
alesapin
af2fe2ba55 Compilable setColumns, setConstraints, setIndices 2020-06-15 19:55:33 +03:00
Azat Khuzhin
c139a05370 Forward declaration in StorageDistributed 2020-06-14 01:09:21 +03:00
alesapin
8be957ecb5 Better checks around metadata 2020-06-10 14:16:31 +03:00
alesapin
d2fcf5aea5 Fixes for gcc 2020-06-09 20:28:29 +03:00
alexey-milovidov
82e849e6a1
Update StorageDistributed.cpp 2020-06-06 18:57:52 +03:00
Azat Khuzhin
ff85125326 Fix readability-qualified-auto 2020-06-04 20:23:46 +03:00
Azat Khuzhin
86c5465bf8 Rewrite StorageSystemDistributionQueue interfaces 2020-06-04 03:04:32 +03:00
Azat Khuzhin
389f78ceee Add system.distribution_queue
system.distribution_queue contains the following columns:
- database
- table
- data_path
- is_blocked
- error_count
- data_files
- data_compressed_bytes
2020-06-04 02:36:16 +03:00
Azat Khuzhin
60d10f1bac Fix typo in StorageDistributed 2020-06-04 02:36:16 +03:00
alesapin
3847ea892d Merge branch 'master' into consistent_metadata3 2020-06-01 13:17:59 +03:00
Alexey Milovidov
25f941020b Remove namespace pollution 2020-05-31 00:57:37 +03:00
alesapin
52ca6b2051 I'm able to build it 2020-05-28 15:37:05 +03:00
Alexey Milovidov
7e1813825b Return old names of macros 2020-05-24 01:24:01 +03:00
Alexey Milovidov
85f84550ba Progress on task 2020-05-23 23:37:37 +03:00
Alexey Milovidov
9d2a0d2dd7 Apply all transformations again 2020-05-23 21:59:49 +03:00
Alexey Milovidov
a2ad11897f Remove duplicate whitespaces (preparation) 2020-05-23 21:53:58 +03:00
Alexey Milovidov
1f13515a65 Make all LOG in single line (preparation) 2020-05-23 21:31:37 +03:00
Alexey Milovidov
e391b77d81 find {base,src,programs} -name '*.h' -or -name '*.cpp' | xargs grep -l -P 'LOG_\w+\([^,]+, "[^"]+" << [^<]+ << "[^"]+"\);' | xargs sed -i -r -e 's/(LOG_\w+)\(([^,]+), "([^"]+)" << ([^<]+) << "([^"]+)"\);/\1_FORMATTED(\2, "\3{}\5", \4);/' 2020-05-23 19:56:05 +03:00
Alexey Milovidov
ee4ffbc332 find {base,src,programs} -name '*.h' -or -name '*.cpp' | xargs grep -l -P 'LOG_\w+\([^,]+, "[^"]+" << [^<]+\);' | xargs sed -i -r -e 's/(LOG_\w+)\(([^,]+), "([^"]+)" << ([^<]+)\);/\1_FORMATTED(\2, "\3{}", \4);/' 2020-05-23 19:47:56 +03:00
Alexey Milovidov
8d2e80a5e2 find {base,src,programs} -name '*.h' -or -name '*.cpp' | xargs grep -l -P 'LOG_\w+\([^,]+, "[^"]+"\)' | xargs sed -i -r -e 's/(LOG_\w+)\(([^,]+, "[^"]+")\)/\1_FORMATTED(\2)/' 2020-05-23 19:42:39 +03:00
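
The effect of the sed rewrite in e391b77d81 (a literal, a streamed value, and a second literal) can be reproduced on a single line with Python's re module. This is a sketch: the pattern and replacement are copied from that commit message, while the sample log line and the rewrite helper are made up for illustration.

```python
import re

# Pattern from commit e391b77d81: LOG_X(logger, "text" << value << "text");
PATTERN = re.compile(r'(LOG_\w+)\(([^,]+), "([^"]+)" << ([^<]+) << "([^"]+)"\);')

# Same replacement as in the sed command; \g<1> is Python's explicit form of \1.
REPLACEMENT = r'\g<1>_FORMATTED(\2, "\3{}\5", \4);'

def rewrite(line):
    """Turn a stream-style LOG_* call into its *_FORMATTED equivalent."""
    return PATTERN.sub(REPLACEMENT, line)

before = 'LOG_DEBUG(log, "Removing " << path << " done");'
after = rewrite(before)
print(after)
```

Lines that do not match the pattern, such as calls with two streamed values, pass through unchanged.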
Azat Khuzhin
bc4b75dead Add table name into logs for StorageDistributed 2020-05-23 11:57:14 +03:00
alesapin
59b3bc0c05
Merge branch 'master' into fix_deadlock_system_logs_startup 2020-05-21 22:57:52 +03:00
alesapin
c036af0261 Fix deadlock after clickhouse-server update (with changes in one of system log tables structure) during startup between concurrent merge and table rename. 2020-05-21 17:11:56 +03:00
Azat Khuzhin
d93b9a57f6 Forward declaration for Context as much as possible.
Now after changing Context.h 488 modules will be recompiled instead of 582.
2020-05-21 01:53:18 +03:00
Gleb Novikov
1a25ac6e1f Merge branch 'master' into refactor-reservations 2020-05-16 23:34:45 +03:00
alesapin
352c402cf2 Merge branch 'add_alter_rename_column_to_distributed' of https://github.com/vzakaznikov/ClickHouse into vzakaznikov-add_alter_rename_column_to_distributed 2020-05-13 14:14:23 +03:00
alexey-milovidov
33d491edf3
Merge pull request #10516 from azat/dist-GROUP_BY-sharding_key-fixes
Disable GROUP BY sharding_key optimization by default (and fix for WITH ROLLUP/CUBE/TOTALS)
2020-05-11 12:03:27 +03:00
Gleb Novikov
390b39b272 VolumePtr instead of DiskPtr in MergeTreeData* 2020-05-10 00:24:15 +03:00
Vitaliy Zakaznikov
90e52e7fea Adding support for ALTER RENAME COLUMN query to Distributed table engine. 2020-05-07 14:54:35 +02:00
Gleb Novikov
c637d99e07 Volumes and storages refactoring:
  1. Moved Volume to separate file
  2. Created IVolume interface and implemented current behaviour in implementation of new interface — VolumeJBOD
  3. Replaced all old volume usages with new VolumeJBOD. Where it is unnecessary to have JBOD — left just IVolume.
  4. Removed old Volume completely
  5. Moved StoragePolicy to separated files
  6. Moved DiskSelector to separated files
  7. Removed DiskSpaceMonitor file
2020-05-04 23:15:38 +03:00
alexey-milovidov
229f666dea
Merge pull request #10611 from azat/optimize_skip_unused_shards-LowCardinality
Fix optimize_skip_unused_shards with LowCardinality
2020-05-02 22:14:33 +03:00
Azat Khuzhin
63d8ab8f03 Make createSelector() static (in storage) and const (in stream) 2020-05-01 11:31:05 +03:00
Azat Khuzhin
f22ba15b4a Reduce copy-paste of DistributedBlockOutputStream::createSelector
This will make it less error prone.
2020-05-01 02:59:40 +03:00
Azat Khuzhin
cdd7013438 Drop superfluous "Skipping irrelevant shards" messages
Before this patch it printed 3 times:
- from StorageDistributed::getProcessingStageImpl()
- from StorageDistributed::read()
- from StorageDistributed::getProcessingStageImpl() (from StorageDistributed::read() -> getSampleBlock())
(But this should be optimized)
2020-05-01 02:56:13 +03:00
Azat Khuzhin
c648c300bf Fix optimize_skip_unused_shards with LowCardinality 2020-05-01 02:39:58 +03:00
Azat Khuzhin
4cbe625567 Fix shard numbers output in logs (full cluster had been printed over optimized) 2020-05-01 02:13:07 +03:00
Azat Khuzhin
038235684d Add optimize_distributed_group_by_sharding_key and disable it by default
I know at least one way to fool that optimization, by using as sharding
key something like `if(col1>0, col1, col2)` (although this is not common
sharding key I would say, but can be useful if this will work
correctly), so let's disable it by default.
2020-04-29 00:09:25 +03:00
alesapin
f981649213 Fix pushing to views stream and refactor virtuals 2020-04-28 13:38:57 +03:00
alesapin
4badd0fd28 Better code 2020-04-27 20:46:51 +03:00
alesapin
01db4877f6 Fix style check 2020-04-27 18:44:33 +03:00
alesapin
18c550df15 Better virtuals logic 2020-04-27 16:55:30 +03:00
alesapin
2829774105 Merge branch 'master' into refactor_istorage 2020-04-27 15:34:21 +03:00
Azat Khuzhin
20b4eed9a1 Disable GROUP BY sharding_key optimization for WITH ROLLUP/CUBE/TOTALS 2020-04-27 01:30:54 +03:00
alexey-milovidov
c9334d3fde
Merge pull request #10491 from azat/dist-shutdown
Proper Distributed shutdown (fixes UAF, avoid waiting for sending all batches)
2020-04-25 23:47:59 +03:00
Azat Khuzhin
747a74215f Avoid processing all batches before Distributed shutdown 2020-04-25 02:03:27 +03:00
Azat Khuzhin
8ad6b37913 Proper StorageDistributed shutdown to avoid UAF in DistributedMonitor
StorageDistributed::shutdown() does not acquire the lock that controls
access to the cluster_nodes_data, thus it is not synced with the
requireDirectoryMonitor(), hence some monitors can be left untracked, which
will trigger UAF (use-after-free) after DROP TABLE dist:

This is for the SIGSEGV from the DirectoryMonitor (with already destroyed storage):
    0  0x0000000008e9f760 in std::__1::__cxx_atomic_load<int> (__order=std::__1::memory_order::seq_cst, __a=0x0)
    1  std::__1::__atomic_base<int, false>::load (__m=std::__1::memory_order::seq_cst, this=0x0) <-- this is nullptr
    2  std::__1::__atomic_base<int, false>::operator int (this=0x0)
    3  DB::ActionBlocker::isCancelled (this=0x7f85e31c9bb8) at ../src/Common/ActionBlocker.h:18
    4  DB::StorageDistributedDirectoryMonitor::run (this=0x7f85f93b2a00) at ../src/Storages/Distributed/DirectoryMonitor.cpp:140
2020-04-25 02:03:26 +03:00
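
The race described above can be sketched in a few lines. This is a simplified model, not the actual fix: Python has no use-after-free, so the sketch only shows the idea that shutdown and monitor registration take the same lock. The class names, the _shut_down flag, and the decision to reject late registrations are assumptions made for illustration.

```python
import threading

class Monitor:
    """Stand-in for a directory monitor owned by the storage."""
    def __init__(self, name):
        self.name = name
        self.stopped = False

    def shutdown(self):
        self.stopped = True

class Storage:
    def __init__(self):
        self._mutex = threading.Lock()
        self._monitors = {}       # plays the role of cluster_nodes_data
        self._shut_down = False   # hypothetical flag, not taken from the commit

    def require_monitor(self, name):
        with self._mutex:
            if self._shut_down:
                # A monitor created now would never be stopped by shutdown().
                raise RuntimeError("storage is shutting down")
            if name not in self._monitors:
                self._monitors[name] = Monitor(name)
            return self._monitors[name]

    def shutdown(self):
        # Same lock as require_monitor(), so no monitor is left untracked.
        with self._mutex:
            self._shut_down = True
            monitors = list(self._monitors.values())
            self._monitors.clear()
        for monitor in monitors:
            monitor.shutdown()
```

Without the shared lock, a monitor registered concurrently with shutdown could keep running against a storage that has already been destroyed, which matches the stack trace above.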
alesapin
793f4b734a Remove obsolete comment 2020-04-24 13:31:03 +03:00
alesapin
dc2dd77d2e Remove redundant overrides from IStorage 2020-04-24 12:20:09 +03:00
Alexander Tokmakov
04d6b59ac0 Merge branch 'master' into database_atomic 2020-04-23 17:31:37 +03:00
Alexey Milovidov
1e325a9fd9 Checkpoint 2020-04-22 09:22:14 +03:00
Alexander Tokmakov
b29bddac12 Merge branch 'master' into database_atomic 2020-04-20 14:09:09 +03:00
Alexey Milovidov
d7264b292d Merge branch 'master' into sorting-processors 2020-04-20 09:29:41 +03:00
alexey-milovidov
1577d771df
Merge pull request #10341 from azat/auto_distributed_group_by_no_merge
Auto distributed_group_by_no_merge on GROUP BY sharding key
2020-04-20 08:30:27 +03:00
Azat Khuzhin
e44d5c5749 Fix clang readability-container-size-empty warning in StorageDistributed::canForceGroupByNoMerge() 2020-04-20 01:12:22 +03:00
Azat Khuzhin
be1dec9239 Fix distributed_group_by_no_merge optimization for Distributed-over-Distributed 2020-04-19 21:11:14 +03:00
Azat Khuzhin
93d049fe64 Allow auto distributed_group_by_no_merge for DISTINCT of sharding key 2020-04-19 18:53:37 +03:00
Azat Khuzhin
de4a723264 Auto distributed_group_by_no_merge on GROUP BY injective function of sharding key 2020-04-19 18:33:49 +03:00
Azat Khuzhin
5d11118cc9 Use thread pool (background_distributed_schedule_pool_size) for distributed sends
After #8756 the problem with 1 thread for each (distributed table, disk)
for distributed sends became even worse (since there can be multiple
disks), so use a predefined thread pool for these tasks, which can be
controlled with the background_distributed_schedule_pool_size knob.
2020-04-19 12:01:56 +03:00
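
The difference between one thread per (distributed table, disk) pair and a bounded pool can be sketched with Python's standard executor. This is only an illustration: POOL_SIZE stands in for background_distributed_schedule_pool_size, and send_batches and run are hypothetical names that do not exist in ClickHouse.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 2  # stands in for background_distributed_schedule_pool_size

def send_batches(table, disk, seen_threads, lock):
    """Pretend to send pending batches; record which worker thread ran the task."""
    with lock:
        seen_threads.add(threading.get_ident())
    return f"{table}@{disk}: sent"

def run(pairs, pool_size=POOL_SIZE):
    """Schedule one send task per (table, disk) pair on a fixed-size pool."""
    seen, lock = set(), threading.Lock()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        results = list(pool.map(lambda pair: send_batches(*pair, seen, lock), pairs))
    return results, len(seen)

pairs = [(f"dist{t}", f"disk{d}") for t in range(3) for d in range(2)]
results, threads_used = run(pairs)
```

Six (table, disk) pairs are served here by at most two worker threads, whereas the earlier scheme would have needed six.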
Nikolai Kochetov
153f795ebe Merge branch 'master' into sorting-processors 2020-04-15 12:07:05 +03:00
Alexander Tokmakov
5e6d4b9449 Merge branch 'master' into database_atomic 2020-04-12 16:35:44 +03:00
Nikolai Kochetov
71c72a75d7 Move Graphite params to separate file. 2020-04-10 12:24:16 +03:00
Alexander Kazakov
497df3086f Merge branch 'master' into timed_rwlock
Change-Id: I620bfde2121ff013773b001d514b40b1e796a58b
2020-04-10 11:38:20 +03:00
Alexander Kazakov
26dd6140b2 Added new config settings to control timeouts
* "lock_acquire_timeout" controls for how long a query will continue to
  acquire each lock on its argument tables
* "lock_acquire_timeout_for_background_operations" is a per-table
  setting for storages of *MergeTree family
2020-04-09 21:10:27 +03:00
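
The idea behind a lock acquisition timeout can be sketched as follows. Assumptions: a plain mutex is used instead of the read-write table locks that the branch name timed_rwlock suggests, and LockTimeout and acquire_or_fail are hypothetical names.

```python
import threading

class LockTimeout(Exception):
    """Raised when a lock cannot be taken within the configured timeout."""

def acquire_or_fail(lock, timeout_seconds):
    """Wait for the lock for at most timeout_seconds, then give up with an error."""
    if not lock.acquire(timeout=timeout_seconds):
        raise LockTimeout(f"could not acquire lock within {timeout_seconds} s")
    return lock

table_lock = threading.Lock()
table_lock.acquire()  # simulate another query holding the table lock

try:
    acquire_or_fail(table_lock, 0.05)
    timed_out = False
except LockTimeout:
    timed_out = True

table_lock.release()
acquire_or_fail(table_lock, 0.05)  # succeeds once the lock is free
table_lock.release()
```

Failing with an error after a bounded wait lets a query report the contention instead of blocking indefinitely.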
Alexander Tokmakov
dd1590830b Merge branch 'master' into database_atomic 2020-04-08 22:00:46 +03:00
alexey-milovidov
6d80ab1eed
Merge pull request #9811 from vitlibar/RBAC-8
RBAC-8
2020-04-08 05:47:55 +03:00
Vitaly Baranov
e573549945 Rework access rights for table functions. 2020-04-07 23:31:59 +03:00
Alexander Tokmakov
4c48b7dd80 better rename 2020-04-07 18:31:33 +03:00
Alexander Tokmakov
08bae4668d Merge branch 'master' into database_atomic 2020-04-06 16:18:07 +03:00
Azat Khuzhin
1232760f78 Fix Distributed-over-Distributed when nested table has only one shard 2020-04-04 13:47:35 +03:00
Ivan Lezhankin
06446b4f08 dbms/ → src/ 2020-04-03 18:14:31 +03:00