Merge branch 'master' into vdimir/join-step2

This commit is contained in:
vdimir 2024-12-17 14:29:21 +00:00
commit 29433ef1ad
No known key found for this signature in database
GPG Key ID: 6EE4CE2BEDC51862
2047 changed files with 44647 additions and 204328 deletions


@ -17,34 +17,23 @@ Checks: [
'-android-*',
'-boost-use-ranges',
'-modernize-use-ranges',
'-bugprone-assignment-in-if-condition',
'-bugprone-branch-clone',
'-bugprone-easily-swappable-parameters',
'-bugprone-exception-escape',
'-bugprone-forward-declaration-namespace',
'-bugprone-implicit-widening-of-multiplication-result',
'-bugprone-multi-level-implicit-pointer-conversion',
'-bugprone-narrowing-conversions',
'-bugprone-not-null-terminated-result',
'-bugprone-unchecked-optional-access',
'-bugprone-crtp-constructor-accessibility',
'-bugprone-not-null-terminated-result',
'-cert-dcl16-c',
'-cert-err58-cpp',
'-cert-msc32-c',
'-cert-msc51-cpp',
'-cert-oop54-cpp',
'-cert-oop57-cpp',
'-cert-err33-c', # Misreports on clang-19: it warns about all functions containing 'remove' in the name, not only about the standard library.
'-clang-analyzer-optin.performance.Padding',
'-clang-analyzer-cplusplus.PlacementNew',
'-clang-analyzer-unix.Malloc',
'-cppcoreguidelines-*', # impractical in a codebase as large as ClickHouse, also slow
'-darwin-*',
@ -77,39 +66,33 @@ Checks: [
'-hicpp-use-emplace',
'-hicpp-vararg',
'-linuxkernel-*',
'-llvm-*',
'-llvmlibc-*',
'-openmp-*',
'-misc-const-correctness',
'-misc-include-cleaner', # useful but far too many occurrences
'-misc-no-recursion',
'-misc-non-private-member-variables-in-classes',
'-misc-confusable-identifiers', # useful but slow
'-misc-use-anonymous-namespace',
'-misc-use-internal-linkage',
'-modernize-avoid-c-arrays',
'-modernize-concat-nested-namespaces',
'-modernize-macro-to-enum',
'-modernize-pass-by-value',
'-modernize-return-braced-init-list',
'-modernize-use-auto',
'-modernize-use-constraints', # This is a good check, but clang-tidy crashes, see https://github.com/llvm/llvm-project/issues/91872
'-modernize-use-default-member-init',
'-modernize-use-emplace',
'-modernize-use-nodiscard',
'-modernize-use-ranges',
'-modernize-use-trailing-return-type',
'-modernize-use-designated-initializers',
'-performance-avoid-endl',
'-performance-enum-size',
'-performance-inefficient-string-concatenation',
'-performance-no-int-to-ptr',
'-performance-avoid-endl',
'-performance-unnecessary-value-param',
'-portability-simd-intrinsics',
@ -124,7 +107,6 @@ Checks: [
'-readability-identifier-length',
'-readability-identifier-naming', # useful but too slow
'-readability-implicit-bool-conversion',
'-readability-isolate-declaration',
'-readability-magic-numbers',
'-readability-named-parameter',
'-readability-redundant-declaration',


@ -1,4 +1,5 @@
### Table of Contents
**[ClickHouse release v24.12, 2024-12-19](#2412)**<br/>
**[ClickHouse release v24.11, 2024-11-26](#2411)**<br/>
**[ClickHouse release v24.10, 2024-10-31](#2410)**<br/>
**[ClickHouse release v24.9, 2024-09-26](#249)**<br/>
@ -14,6 +15,124 @@
# 2024 Changelog
### <a id="2412"></a> ClickHouse release 24.12, 2024-12-19
#### Backward Incompatible Change
* Functions `greatest` and `least` now ignore NULL input values, whereas they previously returned NULL if one of the arguments was NULL. For example, `SELECT greatest(1, 2, NULL)` now returns 2. This makes the behavior compatible with PostgreSQL, but at the same time it breaks the compatibility with MySQL which returns NULL. To retain the previous behavior, set setting `least_greatest_legacy_null_behavior` (default: `false`) to `true`. [#65519](https://github.com/ClickHouse/ClickHouse/pull/65519) ([kevinyhzou](https://github.com/KevinyhZou)).
* The legacy MongoDB integration based on the Poco driver has been removed. Server setting `use_legacy_mongodb_integration` is obsolete and has no effect anymore. [#71997](https://github.com/ClickHouse/ClickHouse/pull/71997) ([Kirill Nikiforov](https://github.com/allmazz)). The new integration is strictly more capable and powerful.
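A minimal sketch of the new `greatest`/`least` NULL handling described in the first item above, including the compatibility setting; this is an illustration, not part of the release notes:

```sql
-- New default behavior: NULL arguments are ignored (PostgreSQL-compatible).
SELECT greatest(1, 2, NULL);   -- returns 2

-- Restore the previous MySQL-compatible behavior, where any NULL argument yields NULL.
SET least_greatest_legacy_null_behavior = true;
SELECT greatest(1, 2, NULL);   -- returns NULL
```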
#### New Feature
* Move `JSON`/`Dynamic`/`Variant` types from experimental features to beta. [#72294](https://github.com/ClickHouse/ClickHouse/pull/72294) ([Pavel Kruglov](https://github.com/Avogar)). This change, together with all related fixes, was also backported to 24.11.
* Schema evolution for the [Iceberg data storage](https://iceberg.apache.org/spec/#file-system-operations) format provides the user with extensive options for modifying the schema of their table. The order of columns, column names, and simple type extensions can be changed under the hood. [#69445](https://github.com/ClickHouse/ClickHouse/pull/69445) ([Daniil Ivanik](https://github.com/divanik)).
* Integrate with Iceberg REST Catalog: a new database engine, named Iceberg, which plugs the whole catalog into ClickHouse. [#71542](https://github.com/ClickHouse/ClickHouse/pull/71542) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Added cache for primary index of `MergeTree` tables (can be enabled by table setting `use_primary_key_cache`). If lazy load and cache are enabled for primary index, it will be loaded to cache on demand (similar to mark cache) instead of keeping it in memory forever. Added prewarm of primary index on inserts/merges/fetches of data parts and on restarts of the table (can be enabled by setting `prewarm_primary_key_cache`). This allows lower memory usage for huge tables on shared storage, and we tested it on tables over one quadrillion records. [#72102](https://github.com/ClickHouse/ClickHouse/pull/72102) ([Anton Popov](https://github.com/CurtizJ)). [#72750](https://github.com/ClickHouse/ClickHouse/pull/72750) ([Alexander Gololobov](https://github.com/davenger)).
* Implement `SYSTEM LOAD PRIMARY KEY` command to load primary indexes for all parts of a specified table or for all tables if no table is specified. This will be useful for benchmarks and to prevent extra latency during query execution. [#66252](https://github.com/ClickHouse/ClickHouse/pull/66252) [#67733](https://github.com/ClickHouse/ClickHouse/pull/67733) ([ZAWA_ll](https://github.com/Zawa-ll)).
* Added a query that allows to attach `MergeTree` tables as `ReplicatedMergeTree` and vice versa: `ATTACH TABLE ... AS REPLICATED` and `ATTACH TABLE ... AS NOT REPLICATED`. [#65401](https://github.com/ClickHouse/ClickHouse/pull/65401) ([Kirill](https://github.com/kirillgarbar)).
* A new setting, `http_response_headers`, allows you to customize the HTTP response headers. For example, you can tell the browser to render a picture that is stored in the database. This closes [#59620](https://github.com/ClickHouse/ClickHouse/issues/59620). [#72656](https://github.com/ClickHouse/ClickHouse/pull/72656) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add function `toUnixTimestamp64Second`, which converts a `DateTime64` to an `Int64` value with fixed second precision, so negative values can be returned for dates before the Unix epoch (see the sketch after this list). [#70597](https://github.com/ClickHouse/ClickHouse/pull/70597) ([zhanglistar](https://github.com/zhanglistar)). [#73146](https://github.com/ClickHouse/ClickHouse/pull/73146) ([Robert Schulze](https://github.com/rschu1ze)).
* Add new setting `enforce_index_structure_match_on_partition_manipulation` to allow attaching when the set of the source table's projections and secondary indices is a subset of those in the target table. Closes [#70602](https://github.com/ClickHouse/ClickHouse/issues/70602). [#70603](https://github.com/ClickHouse/ClickHouse/pull/70603) ([zwy991114](https://github.com/zwy991114)).
* Add a setting `composed_data_type_output_format_mode`. If it is set to `spark`, arrays, tuples, and maps will be output in a format compatible with Apache Spark. [#70957](https://github.com/ClickHouse/ClickHouse/pull/70957) ([zhanglistar](https://github.com/zhanglistar)).
* Add syntax `ALTER USER {ADD|MODIFY|DROP SETTING}` and `ALTER USER {ADD|DROP PROFILE}`, and the same for `ALTER ROLE` and `ALTER PROFILE`. Instead of replacing the whole set of settings, you can now modify it incrementally. [#72050](https://github.com/ClickHouse/ClickHouse/pull/72050) ([pufit](https://github.com/pufit)).
* Added `arrayPRAUC` function, which calculates the AUC (Area Under the Curve) for the Precision Recall curve. [#72073](https://github.com/ClickHouse/ClickHouse/pull/72073) ([Emmanuel](https://github.com/emmanuelsdias)).
* Add `indexOfAssumeSorted` function for array types. It optimizes the search when the array is sorted in non-decreasing order; the effect appears on very large arrays (over 100,000 elements). [#72517](https://github.com/ClickHouse/ClickHouse/pull/72517) ([Eric Kurbanov](https://github.com/erickurbanov)).
* Allow using a delimiter as an optional second argument for the aggregate function `groupConcat`. [#72540](https://github.com/ClickHouse/ClickHouse/pull/72540) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Function `translate` now supports character deletion if the `from` argument contains more characters than the `to` argument. Example: `SELECT translate('clickhouse', 'clickhouse', 'CLICK')` now returns `CLICK`. [#71441](https://github.com/ClickHouse/ClickHouse/pull/71441) ([shuai.xu](https://github.com/shuai-xu)).
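The following sketch illustrates several of the new functions and clauses from this list. The user and table names (`test_user`, `test_table`) are hypothetical, and the argument forms (for example, the `groupConcat` delimiter position and the `ALTER USER ... ADD SETTING` clause) follow the wording of the entries above rather than reference documentation:

```sql
-- Character deletion in translate(): extra characters in `from` are removed.
SELECT translate('clickhouse', 'clickhouse', 'CLICK');   -- 'CLICK'

-- Optional delimiter as the second argument of groupConcat().
SELECT groupConcat(name, ', ') FROM system.databases;

-- Faster search in arrays sorted in non-decreasing order.
SELECT indexOfAssumeSorted([1, 3, 5, 7, 9], 7);          -- 4

-- Fixed second precision, including negative values for dates before the Unix epoch.
SELECT toUnixTimestamp64Second(toDateTime64('1969-12-31 23:59:59', 3, 'UTC'));   -- -1

-- Modify a user's settings incrementally instead of replacing the whole set.
ALTER USER test_user ADD SETTING max_memory_usage = 10000000000;

-- Load primary indexes of a table into memory ahead of time.
SYSTEM LOAD PRIMARY KEY test_table;
```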
#### Experimental Features
* A new MergeTree setting `allow_experimental_reverse_key` enables support for descending sort order in MergeTree sorting keys. This is useful for time series analysis, especially TopN queries. Example usage: `ENGINE = MergeTree ORDER BY (time DESC, key)` gives descending order for the `time` field (see the example below). [#71095](https://github.com/ClickHouse/ClickHouse/pull/71095) ([Amos Bird](https://github.com/amosbird)).
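A sketch of the experimental reverse sorting key, assuming a hypothetical `events` table; `allow_experimental_reverse_key` is applied as a MergeTree table setting as described above:

```sql
CREATE TABLE events
(
    time  DateTime,
    key   UInt64,
    value Float64
)
ENGINE = MergeTree
ORDER BY (time DESC, key)
SETTINGS allow_experimental_reverse_key = 1;

-- TopN queries over the most recent data can follow the key order directly.
SELECT * FROM events ORDER BY time DESC, key LIMIT 10;
```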
#### Performance Improvement
* JOIN reordering. Added an option to select the side of the join that will act as the inner (build) table in the query plan. This is controlled by `query_plan_join_swap_table`, which can be set to `auto`. In this mode, ClickHouse will try to choose the table with the smallest number of rows. [#71577](https://github.com/ClickHouse/ClickHouse/pull/71577) ([Vladimir Cherkasov](https://github.com/vdimir)).
* Now `parallel_hash` algorithm will be used (if applicable) when the `join_algorithm` setting is set to `default`. Two previous alternatives (`direct` and `hash`) are still considered when `parallel_hash` cannot be used. [#70788](https://github.com/ClickHouse/ClickHouse/pull/70788) ([Nikita Taranov](https://github.com/nickitat)).
* Add an option to extract common expressions from `WHERE` and `ON` expressions in order to reduce the number of hash tables used during joins. This is useful when the `JOIN ON` condition contains common conjuncts repeated across different `OR` branches (see the sketch after this list). Can be enabled with `optimize_extract_common_expressions = 1`. [#71537](https://github.com/ClickHouse/ClickHouse/pull/71537) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Allow using indexes in `SELECT` when an indexed column is cast into `LowCardinality(String)`, which can be the case when a query runs over a `Merge` table where some tables have `String` and others `LowCardinality(String)`. [#71598](https://github.com/ClickHouse/ClickHouse/pull/71598) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* During query execution with parallel replicas and enabled local plan, do not perform index analysis on workers. The coordinator will choose ranges to read for workers based on index analysis on its side (on the query initiator). This makes short queries with parallel replicas have as low latency as single-node queries. [#72109](https://github.com/ClickHouse/ClickHouse/pull/72109) ([Igor Nikonov](https://github.com/devcrafter)).
* Memory usage of `clickhouse disks remove --recursive` is reduced for object storage disks. [#67323](https://github.com/ClickHouse/ClickHouse/pull/67323) ([Kirill](https://github.com/kirillgarbar)).
* Bring back optimization for reading subcolumns of single column in compact parts from [#57631](https://github.com/ClickHouse/ClickHouse/pull/57631). It was deleted accidentally. [#72285](https://github.com/ClickHouse/ClickHouse/pull/72285) ([Pavel Kruglov](https://github.com/Avogar)).
* Speedup sorting of `LowCardinality(String)` columns by de-virtualizing calls in comparator. [#72337](https://github.com/ClickHouse/ClickHouse/pull/72337) ([Alexander Gololobov](https://github.com/davenger)).
* Optimize function `argMin`/`argMax` for some simple data types. [#72350](https://github.com/ClickHouse/ClickHouse/pull/72350) ([alesapin](https://github.com/alesapin)).
* Optimize locking with shared locks in the memory tracker to reduce lock contention, which improves performance on systems with a very high number of CPU. [#72375](https://github.com/ClickHouse/ClickHouse/pull/72375) ([Jiebin Sun](https://github.com/jiebinn)).
* Add a new setting, `use_async_executor_for_materialized_views`. It enables asynchronous and potentially multithreaded execution of the materialized view query, which can speed up view processing during `INSERT` but also consumes more memory. [#72497](https://github.com/ClickHouse/ClickHouse/pull/72497) ([alesapin](https://github.com/alesapin)).
* Improved performance of deserialization of states of aggregate functions (in data type `AggregateFunction` and in distributed queries). Slightly improved performance of parsing of format `RowBinary`. [#72818](https://github.com/ClickHouse/ClickHouse/pull/72818) ([Anton Popov](https://github.com/CurtizJ)).
* Split ranges in reading with parallel replicas in the order of the table's key to consume less memory during reading. [#72173](https://github.com/ClickHouse/ClickHouse/pull/72173) ([JIaQi](https://github.com/JiaQiTang98)).
* Speed up insertions into merge tree in the case of a single value of partition key inside the inserted batch. [#72348](https://github.com/ClickHouse/ClickHouse/pull/72348) ([alesapin](https://github.com/alesapin)).
* Implement creating tables in parallel while restoring from a backup. Before this PR the `RESTORE` command always created tables in one thread, which could be slow in case of backups containing many tables. [#72427](https://github.com/ClickHouse/ClickHouse/pull/72427) ([Vitaly Baranov](https://github.com/vitlibar)).
* Dropping the mark cache might take noticeable time if it is big. Previously the context mutex was held during this operation, which blocked many other activities; even new client connections could not be established until it was released. Holding this mutex is not actually required for synchronization: it is enough to keep a local reference to the cache via a shared pointer. [#72749](https://github.com/ClickHouse/ClickHouse/pull/72749) ([Alexander Gololobov](https://github.com/davenger)).
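A sketch of how the two join-related settings above can be combined; `t1`/`t2` are hypothetical tables:

```sql
-- Let the planner choose the smaller side as the build (inner) table.
SET query_plan_join_swap_table = 'auto';

-- Extract conjuncts shared by all OR branches of the JOIN ON condition,
-- reducing the number of hash tables needed for the join.
SET optimize_extract_common_expressions = 1;

SELECT count()
FROM t1
JOIN t2 ON (t1.key = t2.key AND t1.flag = 1)
        OR (t1.key = t2.key AND t2.flag = 1);
```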
#### Improvement
* Remove the `allow_experimental_join_condition` setting, allowing non-equi conditions by default. [#69910](https://github.com/ClickHouse/ClickHouse/pull/69910) ([Vladimir Cherkasov](https://github.com/vdimir)).
* Settings from server config (users.xml) now apply on the client too. Useful for format settings, e.g. `date_time_output_format`. [#71178](https://github.com/ClickHouse/ClickHouse/pull/71178) ([Michael Kolupaev](https://github.com/al13n321)).
* Automatic spilling of `GROUP BY`/`ORDER BY` to disk based on the server/user memory usage, controlled with the `max_bytes_ratio_before_external_group_by`/`max_bytes_ratio_before_external_sort` query settings (see the sketch after this list). [#71406](https://github.com/ClickHouse/ClickHouse/pull/71406) ([Azat Khuzhin](https://github.com/azat)).
* Add new cancellation logic: a singleton `CancellationChecker` object checks the timeouts for queries. [#69880](https://github.com/ClickHouse/ClickHouse/pull/69880) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Support ALTER from `Object` to `JSON`, which means you can easily migrate from the deprecated Object type. [#71784](https://github.com/ClickHouse/ClickHouse/pull/71784) ([Pavel Kruglov](https://github.com/Avogar)).
* Allow unknown values in set that are not present in Enum. Fix [#72662](https://github.com/ClickHouse/ClickHouse/issues/72662). [#72686](https://github.com/ClickHouse/ClickHouse/pull/72686) ([zhanglistar](https://github.com/zhanglistar)).
* Support string search operator (eg., LIKE) for `Enum` data type, implements [#72661](https://github.com/ClickHouse/ClickHouse/issues/72661). [#72732](https://github.com/ClickHouse/ClickHouse/pull/72732) ([zhanglistar](https://github.com/zhanglistar)).
* Some meaningless `ALTER USER` queries were previously accepted; this is now fixed. Fixes [#71227](https://github.com/ClickHouse/ClickHouse/issues/71227). [#71286](https://github.com/ClickHouse/ClickHouse/pull/71286) ([Arthur Passos](https://github.com/arthurpassos)).
* Respect `prefer_localhost_replica` when building a plan for distributed `INSERT ... SELECT`. [#72190](https://github.com/ClickHouse/ClickHouse/pull/72190) ([filimonov](https://github.com/filimonov)).
* Azure violated the Iceberg specification by mistakenly labeling Iceberg v1 as Iceberg v2. The problem is [described here](https://github.com/ClickHouse/ClickHouse/issues/72091). The Azure Iceberg Writer creates Iceberg metadata files (as well as manifest files) that violate the spec. We now attempt to read v1 Iceberg metadata with the v2 reader (because that is how it is written) and raise an error when the corresponding fields are missing from a manifest file. [#72277](https://github.com/ClickHouse/ClickHouse/pull/72277) ([Daniil Ivanik](https://github.com/divanik)).
* Now it's allowed to `CREATE MATERIALIZED VIEW` with `UNION [ALL]` in query. Behavior is the same as for matview with `JOIN`: only the first table in `SELECT` expression will work as a trigger for insert, all other tables will be ignored. However, if there are many references to the first table (e.g., UNION with itself), all of them will be processed as the inserted block of data. [#72347](https://github.com/ClickHouse/ClickHouse/pull/72347) ([alesapin](https://github.com/alesapin)).
* Added source query validation when ClickHouse is used as a source for a dictionary. [#72548](https://github.com/ClickHouse/ClickHouse/pull/72548) ([Alexey Katsman](https://github.com/alexkats)).
* Ensure that ClickHouse will see ZooKeeper changes on config reloads. [#72593](https://github.com/ClickHouse/ClickHouse/pull/72593) ([Azat Khuzhin](https://github.com/azat)).
* Better memory usage approximation of cached marks to reduce total memory usage of the cache. [#72630](https://github.com/ClickHouse/ClickHouse/pull/72630) ([Antonio Andelic](https://github.com/antonio2368)).
* Add a new `StartupScriptsExecutionState` metric. The metric can have three values: 0 = startup scripts have not finished yet, 1 = startup scripts executed successfully, 2 = startup scripts failed. This metric is needed to know whether startup scripts execute successfully in the cloud, especially after releases to base configurations. [#72637](https://github.com/ClickHouse/ClickHouse/pull/72637) ([Miсhael Stetsyuk](https://github.com/mstetsyuk)).
* Add the new `MergeTreeIndexGranularityInternalArraysTotalSize` metric to `system.metrics`. This metric is needed to find the instances with huge datasets susceptible to the high memory usage issue. [#72490](https://github.com/ClickHouse/ClickHouse/pull/72490) ([Miсhael Stetsyuk](https://github.com/mstetsyuk)).
* Add retries to creating a replicated table. [#72682](https://github.com/ClickHouse/ClickHouse/pull/72682) ([Vitaly Baranov](https://github.com/vitlibar)).
* Add `total_bytes_with_inactive` to `system.tables` to count the total bytes of inactive parts. [#72690](https://github.com/ClickHouse/ClickHouse/pull/72690) ([Kai Zhu](https://github.com/nauu)).
* Add MergeTree settings to `system.settings_changes`. [#72694](https://github.com/ClickHouse/ClickHouse/pull/72694) ([Raúl Marín](https://github.com/Algunenano)).
* Support JSON type in the `notEmpty` function. [#72741](https://github.com/ClickHouse/ClickHouse/pull/72741) ([Pavel Kruglov](https://github.com/Avogar)).
* Support parsing GCS S3 error `AuthenticationRequired`. [#72753](https://github.com/ClickHouse/ClickHouse/pull/72753) ([Vitaly Baranov](https://github.com/vitlibar)).
* Support `Dynamic` type in functions `ifNull` and `coalesce`. [#72772](https://github.com/ClickHouse/ClickHouse/pull/72772) ([Pavel Kruglov](https://github.com/Avogar)).
* Support `Dynamic` in functions `toFloat64`/`toUInt32`/etc. [#72989](https://github.com/ClickHouse/ClickHouse/pull/72989) ([Pavel Kruglov](https://github.com/Avogar)).
* Add S3 request settings `http_max_fields`, `http_max_field_name_size`, `http_max_field_value_size` and use them while parsing S3 API responses during backup and restore. [#72778](https://github.com/ClickHouse/ClickHouse/pull/72778) ([Vitaly Baranov](https://github.com/vitlibar)).
* Delete table metadata in Keeper for S3Queue/AzureQueue storages only after the last table using this metadata has been dropped. [#72810](https://github.com/ClickHouse/ClickHouse/pull/72810) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Added `JoinBuildTableRowCount`, `JoinProbeTableRowCount`, and `JoinResultRowCount` profile events. [#72842](https://github.com/ClickHouse/ClickHouse/pull/72842) ([Vladimir Cherkasov](https://github.com/vdimir)).
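A sketch of two of the improvements above: the memory-ratio thresholds for spilling to disk and the new `total_bytes_with_inactive` column in `system.tables`; the threshold values are illustrative:

```sql
-- Spill GROUP BY / ORDER BY to disk once memory usage reaches half of the limit.
SET max_bytes_ratio_before_external_group_by = 0.5;
SET max_bytes_ratio_before_external_sort = 0.5;

-- Inspect the total size of inactive parts per table.
SELECT name, total_bytes, total_bytes_with_inactive
FROM system.tables
WHERE database = 'default';
```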
#### Bug Fix (user-visible misbehavior in an official stable release)
* Fix possible intersecting parts for MergeTree (after an operation of moving a part to the detached directory has failed, possibly due to an operation on object storage). [#70476](https://github.com/ClickHouse/ClickHouse/pull/70476) ([Azat Khuzhin](https://github.com/azat)).
* Fix error detection when a table name is too long, and provide a diagnostic that tells the maximum length. Add a new function `getMaxTableNameLengthForDatabase`. [#70810](https://github.com/ClickHouse/ClickHouse/pull/70810) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Fixed zombie processes after a crash of `clickhouse-library-bridge` (this program allows running unsafe libraries). [#71301](https://github.com/ClickHouse/ClickHouse/pull/71301) ([MikhailBurdukov](https://github.com/MikhailBurdukov)).
* Fix NoSuchKey error during transaction rollback when creating a directory fails for the `plain_rewritable` disk. [#71439](https://github.com/ClickHouse/ClickHouse/pull/71439) ([Julia Kartseva](https://github.com/jkartseva)).
* Fix serialization of `Dynamic` values in `Pretty` JSON formats. [#71923](https://github.com/ClickHouse/ClickHouse/pull/71923) ([Pavel Kruglov](https://github.com/Avogar)).
* Add inferred format name to create query in `File`/`S3`/`URL`/`HDFS`/`Azure` engines. Previously the format name was inferred each time the server was restarted, and if the specified data files were removed, it led to errors during server startup. [#72108](https://github.com/ClickHouse/ClickHouse/pull/72108) ([Pavel Kruglov](https://github.com/Avogar)).
* Fix bugs when using a UDF in join on expression with the old analyzer. [#72179](https://github.com/ClickHouse/ClickHouse/pull/72179) ([Raúl Marín](https://github.com/Algunenano)).
* Fixes some small bugs in `StorageObjectStorage`. Needed to enable `use_hive_partitioning` by default. [#72185](https://github.com/ClickHouse/ClickHouse/pull/72185) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Fix a bug where `min_age_to_force_merge_on_partition_only` kept trying to merge the same partition that was already merged into a single part, while not merging partitions that had multiple parts. [#72209](https://github.com/ClickHouse/ClickHouse/pull/72209) ([Christoph Wurm](https://github.com/cwurm)).
* Fixed a crash in `SimpleSquashingChunksTransform` that occurred in rare cases when processing sparse columns. [#72226](https://github.com/ClickHouse/ClickHouse/pull/72226) ([Vladimir Cherkasov](https://github.com/vdimir)).
* Fixed data race in `GraceHashJoin` as the result of which some rows might be missing in the join output. [#72233](https://github.com/ClickHouse/ClickHouse/pull/72233) ([Nikita Taranov](https://github.com/nickitat)).
* Fixed `ALTER DELETE` queries with materialized `_block_number` column (if setting `enable_block_number_column` is enabled). [#72261](https://github.com/ClickHouse/ClickHouse/pull/72261) ([Anton Popov](https://github.com/CurtizJ)).
* Fixed data race when `ColumnDynamic::dumpStructure()` is called concurrently e.g., in `ConcurrentHashJoin` constructor. [#72278](https://github.com/ClickHouse/ClickHouse/pull/72278) ([Nikita Taranov](https://github.com/nickitat)).
* Fix possible `LOGICAL_ERROR` with duplicate columns in `ORDER BY ... WITH FILL`. [#72387](https://github.com/ClickHouse/ClickHouse/pull/72387) ([Vladimir Cherkasov](https://github.com/vdimir)).
* Fixed mismatched types in several cases after applying `optimize_functions_to_subcolumns`. [#72394](https://github.com/ClickHouse/ClickHouse/pull/72394) ([Anton Popov](https://github.com/CurtizJ)).
* Use `AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE` instead of `AWS_CONTAINER_AUTHORIZATION_TOKEN_PATH`. Fixes [#71074](https://github.com/ClickHouse/ClickHouse/issues/71074). [#72397](https://github.com/ClickHouse/ClickHouse/pull/72397) ([Konstantin Bogdanov](https://github.com/thevar1able)).
* Fix failure on parsing `BACKUP DATABASE db EXCEPT TABLES db.table` queries. [#72429](https://github.com/ClickHouse/ClickHouse/pull/72429) ([Konstantin Bogdanov](https://github.com/thevar1able)).
* Don't allow creating empty `Variant`. [#72454](https://github.com/ClickHouse/ClickHouse/pull/72454) ([Pavel Kruglov](https://github.com/Avogar)).
* Fix invalid formatting of `result_part_path` in `system.merges`. [#72567](https://github.com/ClickHouse/ClickHouse/pull/72567) ([Konstantin Bogdanov](https://github.com/thevar1able)).
* Fix parsing a glob with one element (such as `{file}`). [#72572](https://github.com/ClickHouse/ClickHouse/pull/72572) ([Konstantin Bogdanov](https://github.com/thevar1able)).
* Fix query generation for the follower server in case of a distributed query with `ARRAY JOIN`. Fixes [#69276](https://github.com/ClickHouse/ClickHouse/issues/69276). [#72608](https://github.com/ClickHouse/ClickHouse/pull/72608) ([Dmitry Novik](https://github.com/novikd)).
* Fix a bug when DateTime64 IN DateTime64 returns nothing. [#72640](https://github.com/ClickHouse/ClickHouse/pull/72640) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Fixed inconsistent metadata when adding a new replica to a Replicated database that has a table created with `flatten_nested=0`. [#72685](https://github.com/ClickHouse/ClickHouse/pull/72685) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix advanced SSL configuration for Keeper's internal communication. [#72730](https://github.com/ClickHouse/ClickHouse/pull/72730) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix "No such key" error in S3Queue unordered mode when the `tracked_files_limit` setting is smaller than the rate at which S3 files appear. [#72738](https://github.com/ClickHouse/ClickHouse/pull/72738) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix exception thrown in RemoteQueryExecutor when a user does not exist locally. [#72759](https://github.com/ClickHouse/ClickHouse/pull/72759) ([Andrey Zvonov](https://github.com/zvonand)).
* Fixed mutations with materialized `_block_number` column (if setting `enable_block_number_column` is enabled). [#72854](https://github.com/ClickHouse/ClickHouse/pull/72854) ([Anton Popov](https://github.com/CurtizJ)).
* Fix backup/restore with plain rewritable disk in case there are empty files in backup. [#72858](https://github.com/ClickHouse/ClickHouse/pull/72858) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Properly cancel inserts in DistributedAsyncInsertDirectoryQueue. [#72885](https://github.com/ClickHouse/ClickHouse/pull/72885) ([Antonio Andelic](https://github.com/antonio2368)).
* Fixed a crash while parsing incorrect data into sparse columns (can happen with the setting `enable_parsing_to_custom_serialization` enabled). [#72891](https://github.com/ClickHouse/ClickHouse/pull/72891) ([Anton Popov](https://github.com/CurtizJ)).
* Fix potential crash during backup restore. [#72947](https://github.com/ClickHouse/ClickHouse/pull/72947) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fixed bug in `parallel_hash` JOIN method that might appear when query has complex condition in the `ON` clause with inequality filters. [#72993](https://github.com/ClickHouse/ClickHouse/pull/72993) ([Nikita Taranov](https://github.com/nickitat)).
* Use default format settings during JSON parsing to avoid broken deserialization. [#73043](https://github.com/ClickHouse/ClickHouse/pull/73043) ([Pavel Kruglov](https://github.com/Avogar)).
* Fix crash in transactions with unsupported storage. [#73045](https://github.com/ClickHouse/ClickHouse/pull/73045) ([Raúl Marín](https://github.com/Algunenano)).
* Fix possible overestimation of memory tracking (when the difference between `MemoryTracking` and `MemoryResident` kept growing). [#73081](https://github.com/ClickHouse/ClickHouse/pull/73081) ([Azat Khuzhin](https://github.com/azat)).
* Check for duplicate JSON keys during Tuple parsing. Previously it could lead to a logical error `Invalid number of rows in Chunk` during parsing. [#73082](https://github.com/ClickHouse/ClickHouse/pull/73082) ([Pavel Kruglov](https://github.com/Avogar)).
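For reference, a minimal sketch of the `BACKUP DATABASE ... EXCEPT TABLES` form whose parsing was fixed above; the database, table, and backup destination names are illustrative:

```sql
BACKUP DATABASE db EXCEPT TABLES db.table TO Disk('backups', 'db_backup.zip');
```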
#### Build/Testing/Packaging Improvement
* All small utilities previously stored in the `/utils` folder and requiring manual compilation from sources are now part of the main ClickHouse bundle. This closes: [#72404](https://github.com/ClickHouse/ClickHouse/issues/72404). [#72426](https://github.com/ClickHouse/ClickHouse/pull/72426) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Get rid of `/etc/systemd/system/clickhouse-server.service` removal introduced in 22.3 [#39323](https://github.com/ClickHouse/ClickHouse/issues/39323). [#72259](https://github.com/ClickHouse/ClickHouse/pull/72259) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Split large translation units to avoid compilation failures due to memory/cpu limitations. [#72352](https://github.com/ClickHouse/ClickHouse/pull/72352) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* OSX: Build with ICU support, which enables collations, charset conversions and other localization features. [#73083](https://github.com/ClickHouse/ClickHouse/pull/73083) ([Raúl Marín](https://github.com/Algunenano)).
### <a id="2411"></a> ClickHouse release 24.11, 2024-11-26
#### Backward Incompatible Change
@ -226,7 +345,7 @@
* `CREATE TABLE AS` will copy `PRIMARY KEY`, `ORDER BY`, and similar clauses (of `MergeTree` tables). [#69739](https://github.com/ClickHouse/ClickHouse/pull/69739) ([sakulali](https://github.com/sakulali)).
* Support 64-bit XID in Keeper. It can be enabled with the `use_xid_64` configuration value. [#69908](https://github.com/ClickHouse/ClickHouse/pull/69908) ([Antonio Andelic](https://github.com/antonio2368)).
* Command-line arguments for Bool settings are set to true when no value is provided for the argument (e.g. `clickhouse-client --optimize_aggregation_in_order --query "SELECT 1"`). [#70459](https://github.com/ClickHouse/ClickHouse/pull/70459) ([davidtsuk](https://github.com/davidtsuk)).
* Added user-level settings `min_free_disk_bytes_to_throw_insert` and `min_free_disk_ratio_to_throw_insert` to prevent insertions on disks that are almost full. [#69755](https://github.com/ClickHouse/ClickHouse/pull/69755) ([Marco Vilas Boas](https://github.com/marco-vb)).
* Added user-level settings `min_free_disk_bytes_to_perform_insert` and `min_free_disk_ratio_to_perform_insert` to prevent insertions on disks that are almost full. [#69755](https://github.com/ClickHouse/ClickHouse/pull/69755) ([Marco Vilas Boas](https://github.com/marco-vb)).
* Embedded documentation for settings will be strictly more detailed and complete than the documentation on the website. This is the first step before making the website documentation always auto-generated from the source code. This has long-standing implications: - it will be guaranteed to have every setting; - there is no chance of having default values obsolete; - we can generate this documentation for each ClickHouse version; - the documentation can be displayed by the server itself even without Internet access. Generate the docs on the website from the source code. [#70289](https://github.com/ClickHouse/ClickHouse/pull/70289) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Allow empty needle in the function `replace`, the same behavior with PostgreSQL. [#69918](https://github.com/ClickHouse/ClickHouse/pull/69918) ([zhanglistar](https://github.com/zhanglistar)).
* Allow empty needle in functions `replaceRegexp*`. [#70053](https://github.com/ClickHouse/ClickHouse/pull/70053) ([zhanglistar](https://github.com/zhanglistar)).
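A sketch illustrating two of the items above: `CREATE TABLE AS` copying the `ORDER BY` clause, and the PostgreSQL-compatible empty needle in `replace` (assumed to return the haystack unchanged, per the PostgreSQL behavior referenced above); table names are hypothetical:

```sql
CREATE TABLE src (key UInt64, value String) ENGINE = MergeTree ORDER BY key;

-- ORDER BY / PRIMARY KEY are now copied from the source table.
CREATE TABLE dst AS src;

-- Empty needle is now allowed, matching PostgreSQL semantics.
SELECT replace('ClickHouse', '', 'x');   -- 'ClickHouse'
```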


@ -43,14 +43,14 @@ Keep an eye out for upcoming meetups and events around the world. Somewhere else
Upcoming meetups
* [Stockholm Meetup](https://www.meetup.com/clickhouse-stockholm-user-group/events/304382411) - December 9
* [New York Meetup](https://www.meetup.com/clickhouse-new-york-user-group/events/304268174) - December 9
* [Kuala Lumpur Meetup](https://www.meetup.com/clickhouse-malaysia-meetup-group/events/304576472/) - December 11
* [San Francisco Meetup](https://www.meetup.com/clickhouse-silicon-valley-meetup-group/events/304286951/) - December 12
* [Dubai Meetup](https://www.meetup.com/clickhouse-dubai-meetup-group/events/303096989/) - Feb 3
Recently completed meetups
* [San Francisco Meetup](https://www.meetup.com/clickhouse-silicon-valley-meetup-group/events/304286951/) - December 12
* [Kuala Lumpur Meetup](https://www.meetup.com/clickhouse-malaysia-meetup-group/events/304576472/) - December 11
* [Stockholm Meetup](https://www.meetup.com/clickhouse-stockholm-user-group/events/304382411) - December 9
* [New York Meetup](https://www.meetup.com/clickhouse-new-york-user-group/events/304268174) - December 9
* [Amsterdam Meetup](https://www.meetup.com/clickhouse-netherlands-user-group/events/303638814) - December 3
* [Paris Meetup](https://www.meetup.com/clickhouse-france-user-group/events/303096434) - November 26
* [Ghent Meetup](https://www.meetup.com/clickhouse-belgium-user-group/events/303049405/) - November 19


@ -7,7 +7,7 @@
#include <base/find_symbols.h>
#include <base/preciseExp10.h>
#define JSON_MAX_DEPTH 100
constexpr size_t JSON_MAX_DEPTH = 100;
#pragma clang diagnostic push


@ -421,13 +421,13 @@ ALWAYS_INLINE inline char * writeSIntText(T x, char * pos)
if constexpr (std::is_same_v<T, Int128>)
{
const char * res = "-170141183460469231731687303715884105728";
memcpy(pos, res, strlen(res));
memcpy(pos, res, strlen(res)); /// NOLINT(bugprone-not-null-terminated-result)
return pos + strlen(res);
}
else if constexpr (std::is_same_v<T, Int256>)
{
const char * res = "-57896044618658097711785492504343953926634992332820282019728792003956564819968";
memcpy(pos, res, strlen(res));
memcpy(pos, res, strlen(res)); /// NOLINT(bugprone-not-null-terminated-result)
return pos + strlen(res);
}
}


@ -76,7 +76,8 @@ double preciseExp10(double x)
1e+289, 1e+290, 1e+291, 1e+292, 1e+293, 1e+294, 1e+295, 1e+296, 1e+297, 1e+298, 1e+299, 1e+300, 1e+301, 1e+302, 1e+303, 1e+304, 1e+305,
1e+306, 1e+307, 1e+308};
double n, y = modf(x, &n);
double n;
double y = modf(x, &n);
if (n > 308) return INFINITY;
if (n < -323) return 0;


@ -78,132 +78,131 @@ int Socket::select(SocketList& readList, SocketList& writeList, SocketList& exce
int epollfd = -1;
{
struct epoll_event eventsIn[epollSize];
memset(eventsIn, 0, sizeof(eventsIn));
struct epoll_event* eventLast = eventsIn;
for (SocketList::iterator it = readList.begin(); it != readList.end(); ++it)
{
poco_socket_t sockfd = it->sockfd();
if (sockfd != POCO_INVALID_SOCKET)
{
struct epoll_event* e = eventsIn;
for (; e != eventLast; ++e)
{
if (reinterpret_cast<Socket*>(e->data.ptr)->sockfd() == sockfd)
break;
}
if (e == eventLast)
{
e->data.ptr = &(*it);
++eventLast;
}
e->events |= EPOLLIN;
}
}
std::vector<epoll_event> eventsIn(epollSize);
struct epoll_event * eventLast = eventsIn.data();
for (SocketList::iterator it = readList.begin(); it != readList.end(); ++it)
{
poco_socket_t sockfd = it->sockfd();
if (sockfd != POCO_INVALID_SOCKET)
{
struct epoll_event * e = eventsIn.data();
for (; e != eventLast; ++e)
{
if (reinterpret_cast<Socket *>(e->data.ptr)->sockfd() == sockfd)
break;
}
if (e == eventLast)
{
e->data.ptr = &(*it);
++eventLast;
}
e->events |= EPOLLIN;
}
}
for (SocketList::iterator it = writeList.begin(); it != writeList.end(); ++it)
{
poco_socket_t sockfd = it->sockfd();
if (sockfd != POCO_INVALID_SOCKET)
{
struct epoll_event* e = eventsIn;
for (; e != eventLast; ++e)
{
if (reinterpret_cast<Socket*>(e->data.ptr)->sockfd() == sockfd)
break;
}
if (e == eventLast)
{
e->data.ptr = &(*it);
++eventLast;
}
e->events |= EPOLLOUT;
}
}
for (SocketList::iterator it = writeList.begin(); it != writeList.end(); ++it)
{
poco_socket_t sockfd = it->sockfd();
if (sockfd != POCO_INVALID_SOCKET)
{
struct epoll_event * e = eventsIn.data();
for (; e != eventLast; ++e)
{
if (reinterpret_cast<Socket *>(e->data.ptr)->sockfd() == sockfd)
break;
}
if (e == eventLast)
{
e->data.ptr = &(*it);
++eventLast;
}
e->events |= EPOLLOUT;
}
}
for (SocketList::iterator it = exceptList.begin(); it != exceptList.end(); ++it)
{
poco_socket_t sockfd = it->sockfd();
if (sockfd != POCO_INVALID_SOCKET)
{
struct epoll_event* e = eventsIn;
for (; e != eventLast; ++e)
{
if (reinterpret_cast<Socket*>(e->data.ptr)->sockfd() == sockfd)
break;
}
if (e == eventLast)
{
e->data.ptr = &(*it);
++eventLast;
}
e->events |= EPOLLERR;
}
}
for (SocketList::iterator it = exceptList.begin(); it != exceptList.end(); ++it)
{
poco_socket_t sockfd = it->sockfd();
if (sockfd != POCO_INVALID_SOCKET)
{
struct epoll_event * e = eventsIn.data();
for (; e != eventLast; ++e)
{
if (reinterpret_cast<Socket *>(e->data.ptr)->sockfd() == sockfd)
break;
}
if (e == eventLast)
{
e->data.ptr = &(*it);
++eventLast;
}
e->events |= EPOLLERR;
}
}
epollSize = eventLast - eventsIn;
if (epollSize == 0) return 0;
epollSize = eventLast - eventsIn.data();
if (epollSize == 0)
return 0;
epollfd = epoll_create(1);
if (epollfd < 0)
{
SocketImpl::error("Can't create epoll queue");
}
epollfd = epoll_create(1);
if (epollfd < 0)
{
SocketImpl::error("Can't create epoll queue");
}
for (struct epoll_event* e = eventsIn; e != eventLast; ++e)
{
poco_socket_t sockfd = reinterpret_cast<Socket*>(e->data.ptr)->sockfd();
if (sockfd != POCO_INVALID_SOCKET)
{
if (epoll_ctl(epollfd, EPOLL_CTL_ADD, sockfd, e) < 0)
{
::close(epollfd);
SocketImpl::error("Can't insert socket to epoll queue");
}
}
}
}
for (struct epoll_event * e = eventsIn.data(); e != eventLast; ++e)
{
poco_socket_t sockfd = reinterpret_cast<Socket *>(e->data.ptr)->sockfd();
if (sockfd != POCO_INVALID_SOCKET)
{
if (epoll_ctl(epollfd, EPOLL_CTL_ADD, sockfd, e) < 0)
{
::close(epollfd);
SocketImpl::error("Can't insert socket to epoll queue");
}
}
}
}
struct epoll_event eventsOut[epollSize];
memset(eventsOut, 0, sizeof(eventsOut));
std::vector<epoll_event> eventsOut(epollSize);
Poco::Timespan remainingTime(timeout);
int rc;
do
{
Poco::Timestamp start;
rc = epoll_wait(epollfd, eventsOut, epollSize, remainingTime.totalMilliseconds());
if (rc < 0 && SocketImpl::lastError() == POCO_EINTR)
{
Poco::Timestamp end;
Poco::Timespan waited = end - start;
if (waited < remainingTime)
remainingTime -= waited;
else
remainingTime = 0;
}
}
while (rc < 0 && SocketImpl::lastError() == POCO_EINTR);
Poco::Timespan remainingTime(timeout);
int rc;
do
{
Poco::Timestamp start;
rc = epoll_wait(epollfd, eventsOut.data(), epollSize, remainingTime.totalMilliseconds());
if (rc < 0 && SocketImpl::lastError() == POCO_EINTR)
{
Poco::Timestamp end;
Poco::Timespan waited = end - start;
if (waited < remainingTime)
remainingTime -= waited;
else
remainingTime = 0;
}
} while (rc < 0 && SocketImpl::lastError() == POCO_EINTR);
::close(epollfd);
if (rc < 0) SocketImpl::error();
::close(epollfd);
if (rc < 0)
SocketImpl::error();
SocketList readyReadList;
SocketList readyWriteList;
SocketList readyExceptList;
for (int n = 0; n < rc; ++n)
{
if (eventsOut[n].events & EPOLLERR)
readyExceptList.push_back(*reinterpret_cast<Socket*>(eventsOut[n].data.ptr));
if (eventsOut[n].events & EPOLLIN)
readyReadList.push_back(*reinterpret_cast<Socket*>(eventsOut[n].data.ptr));
if (eventsOut[n].events & EPOLLOUT)
readyWriteList.push_back(*reinterpret_cast<Socket*>(eventsOut[n].data.ptr));
}
std::swap(readList, readyReadList);
std::swap(writeList, readyWriteList);
std::swap(exceptList, readyExceptList);
return readList.size() + writeList.size() + exceptList.size();
SocketList readyReadList;
SocketList readyWriteList;
SocketList readyExceptList;
for (int n = 0; n < rc; ++n)
{
if (eventsOut[n].events & EPOLLERR)
readyExceptList.push_back(*reinterpret_cast<Socket *>(eventsOut[n].data.ptr));
if (eventsOut[n].events & EPOLLIN)
readyReadList.push_back(*reinterpret_cast<Socket *>(eventsOut[n].data.ptr));
if (eventsOut[n].events & EPOLLOUT)
readyWriteList.push_back(*reinterpret_cast<Socket *>(eventsOut[n].data.ptr));
}
std::swap(readList, readyReadList);
std::swap(writeList, readyWriteList);
std::swap(exceptList, readyExceptList);
return readList.size() + writeList.size() + exceptList.size();
#elif defined(POCO_HAVE_FD_POLL)
typedef Poco::SharedPtr<pollfd, Poco::ReferenceCounter, Poco::ReleaseArrayPolicy<pollfd> > SharedPollArray;


@ -1161,6 +1161,7 @@ argMin
argmax
argmin
arrayAUC
arrayAUCPr
arrayAll
arrayAvg
arrayCompact


@ -131,3 +131,10 @@ class Job:
def __repr__(self):
return self.name
def copy(self):
"""
To create an instant copy of a job config used in multiple workflows
:return: Job.Config
"""
return copy.deepcopy(self)


@ -1,13 +1,16 @@
import copy
import importlib.util
from pathlib import Path
from typing import List
from praktika import Workflow
from . import Job
from .settings import Settings
from .utils import Utils
def _get_workflows(name=None, file=None):
def _get_workflows(name=None, file=None) -> List[Workflow.Config]:
"""
Gets user's workflow configs
"""


@ -50,6 +50,7 @@ class WorkflowYaml:
artifact_to_config: Dict[str, ArtifactYaml]
secret_names_gh: List[str]
enable_cache: bool
cron_schedules: List[str]
class WorkflowConfigParser:
@ -75,6 +76,7 @@ class WorkflowConfigParser:
job_to_config={},
artifact_to_config={},
enable_cache=False,
cron_schedules=config.cron_schedules,
)
def parse(self):


@ -1,5 +1,4 @@
import glob
import sys
from itertools import chain
from pathlib import Path
@ -21,6 +20,33 @@ class Validator:
cls.validate_requirements_txt_files(workflow)
cls.validate_dockers(workflow)
if workflow.event == Workflow.Event.SCHEDULE:
cls.evaluate_check(
workflow.cron_schedules
and isinstance(workflow.cron_schedules, list),
f".crone_schedules str must be non-empty list of cron strings .event===SCHEDULE, provided value [{workflow.cron_schedules}]",
workflow.name,
)
for cron_schedule in workflow.cron_schedules:
cls.evaluate_check(
len(cron_schedule.split(" ")) == 5,
f".crone_schedules must be posix compliant cron str, e.g. '30 15 * * *', provided value [{cron_schedule}]",
workflow.name,
)
for cron_token in cron_schedule.split(" ")[:-1]:
cls.evaluate_check(
cron_token == "*" or str.isdigit(cron_token),
f".crone_schedules must be posix compliant cron str, e.g. '30 15 * * 1,3', provided value [{cron_schedule}], invalid part [{cron_token}]",
workflow.name,
)
days_of_weak = cron_schedule.split(" ")[-1]
cls.evaluate_check(
days_of_weak == "*"
or any([str.isdigit(v) for v in days_of_weak.split(",")]),
f".crone_schedules must be posix compliant cron str, e.g. '30 15 * * 1,3', provided value [{cron_schedule}], invalid part [{days_of_weak}]",
workflow.name,
)
if workflow.artifacts:
for artifact in workflow.artifacts:
if artifact.is_s3_artifact():
@ -198,4 +224,4 @@ class Validator:
)
for message in messages:
print(" || " + message)
sys.exit(1)
raise


@ -11,6 +11,8 @@ class Workflow:
class Event:
PULL_REQUEST = "pull_request"
PUSH = "push"
SCHEDULE = "schedule"
DISPATCH = "dispatch"
@dataclass
class Config:
@ -32,6 +34,7 @@ class Workflow:
enable_merge_ready_status: bool = False
enable_cidb: bool = False
enable_merge_commit: bool = False
cron_schedules: List[str] = field(default_factory=list)
def is_event_pull_request(self):
return self.event == Workflow.Event.PULL_REQUEST
@ -39,6 +42,9 @@ class Workflow:
def is_event_push(self):
return self.event == Workflow.Event.PUSH
def is_event_schedule(self):
return self.event == Workflow.Event.SCHEDULE
def get_job(self, name):
job = self.find_job(name)
if not job:


@ -37,19 +37,13 @@ jobs:
{JOBS}\
"""
TEMPLATE_CALLABLE_WORKFLOW = """\
TEMPLATE_SCHEDULE = """\
# generated by praktika
name: {NAME}
on:
workflow_call:
inputs:
config:
type: string
required: false
default: ''
secrets:
{SECRETS}
schedule:{CRON_TEMPLATES}
workflow_dispatch:
env:
PYTHONUNBUFFERED: 1
@ -58,6 +52,10 @@ jobs:
{JOBS}\
"""
TEMPLATE_CRON = """
- cron: {CRON_SCHEDULE}\
"""
TEMPLATE_SECRET_CONFIG = """\
{SECRET_NAME}:
required: true
@ -88,9 +86,6 @@ jobs:
cat > {ENV_SETUP_SCRIPT} << 'ENV_SETUP_SCRIPT_EOF'
export PYTHONPATH=./ci:.
{SETUP_ENVS}
cat > {WORKFLOW_CONFIG_FILE} << 'EOF'
${{{{ needs.{WORKFLOW_CONFIG_JOB_NAME}.outputs.data }}}}
EOF
cat > {WORKFLOW_STATUS_FILE} << 'EOF'
${{{{ toJson(needs) }}}}
EOF
@ -119,6 +114,12 @@ jobs:
)\
"""
TEMPLATE_SETUP_ENV_WF_CONFIG = """\
cat > {WORKFLOW_CONFIG_FILE} << 'EOF'
${{{{ needs.{WORKFLOW_CONFIG_JOB_NAME}.outputs.data }}}}
EOF\
"""
TEMPLATE_PY_INSTALL = """
- name: Set up Python
uses: actions/setup-python@v5
@ -183,6 +184,7 @@ jobs:
if (
workflow_config.is_event_pull_request()
or workflow_config.is_event_push()
or workflow_config.is_event_schedule()
):
yaml_workflow_str = PullRequestPushYamlGen(parser).generate()
else:
@ -264,10 +266,18 @@ class PullRequestPushYamlGen:
SECRET_NAME=secret
)
)
if self.workflow_config.enable_cache:
secrets_envs.append(
YamlGenerator.Templates.TEMPLATE_SETUP_ENV_WF_CONFIG.format(
WORKFLOW_CONFIG_FILE=RunConfig.file_name_static(
self.workflow_config.name
),
WORKFLOW_CONFIG_JOB_NAME=config_job_name_normalized,
)
)
job_item = YamlGenerator.Templates.TEMPLATE_JOB_0.format(
JOB_NAME_NORMALIZED=job_name_normalized,
WORKFLOW_CONFIG_JOB_NAME=config_job_name_normalized,
IF_EXPRESSION=if_expression,
RUNS_ON=", ".join(job.runs_on),
NEEDS=needs,
@ -278,9 +288,6 @@ class PullRequestPushYamlGen:
WORKFLOW_NAME=self.workflow_config.name,
ENV_SETUP_SCRIPT=Settings.ENV_SETUP_SCRIPT,
SETUP_ENVS="\n".join(secrets_envs),
WORKFLOW_CONFIG_FILE=RunConfig.file_name_static(
self.workflow_config.name
),
JOB_ADDONS="".join(job_addons),
DOWNLOADS_GITHUB="\n".join(downloads_github),
UPLOADS_GITHUB="\n".join(uploads_github),
@ -293,14 +300,33 @@ class PullRequestPushYamlGen:
)
job_items.append(job_item)
base_template = YamlGenerator.Templates.TEMPLATE_PULL_REQUEST_0
# for schedule workflows only
cron_items = ""
for cron_item in self.workflow_config.cron_schedules:
cron_items += YamlGenerator.Templates.TEMPLATE_CRON.format(
CRON_SCHEDULE=cron_item
)
if self.workflow_config.event in (Workflow.Event.PULL_REQUEST,):
base_template = YamlGenerator.Templates.TEMPLATE_PULL_REQUEST_0
format_kwargs = {
"BRANCHES": ", ".join(
[f"'{branch}'" for branch in self.workflow_config.branches]
),
"EVENT": self.workflow_config.event,
}
elif self.workflow_config.event in (Workflow.Event.SCHEDULE,):
base_template = YamlGenerator.Templates.TEMPLATE_SCHEDULE
format_kwargs = {"CRON_TEMPLATES": cron_items}
else:
assert (
False
), f"Invalid or Not implemented event [{self.workflow_config.event}]"
template_1 = base_template.strip().format(
NAME=self.workflow_config.name,
BRANCHES=", ".join(
[f"'{branch}'" for branch in self.workflow_config.branches]
),
EVENT=self.workflow_config.event,
JOBS="{}" * len(job_items),
**format_kwargs,
)
res = template_1.format(*job_items)


@ -375,7 +375,7 @@ ARTIFACTS = [
class Jobs:
style_check_job = Job.Config(
name=JobNames.STYLE_CHECK,
runs_on=[RunnerLabels.CI_SERVICES],
runs_on=[RunnerLabels.STYLE_CHECK_ARM],
command="python3 ./ci/jobs/check_style.py",
run_in_docker="clickhouse/style-test",
)


@ -0,0 +1,16 @@
from praktika import Workflow
from ci.workflows.defs import Jobs
nightly_workflow = Workflow.Config(
name="PackagesRepoBakUp",
event=Workflow.Event.SCHEDULE,
jobs=[
Jobs.style_check_job,
],
cron_schedules=["13 3 * * *"],
)
WORKFLOWS = [
nightly_workflow,
]


@ -9,7 +9,7 @@ workflow = Workflow.Config(
event=Workflow.Event.PULL_REQUEST,
base_branches=[BASE_BRANCH],
jobs=[
Jobs.style_check_job,
Jobs.style_check_job.copy(),
Jobs.fast_test_job,
*Jobs.build_jobs,
*Jobs.stateless_tests_jobs,


@ -2,11 +2,11 @@
# NOTE: VERSION_REVISION has nothing common with DBMS_TCP_PROTOCOL_VERSION,
# only DBMS_TCP_PROTOCOL_VERSION should be incremented on protocol changes.
SET(VERSION_REVISION 54493)
SET(VERSION_MAJOR 24)
SET(VERSION_MINOR 12)
SET(VERSION_REVISION 54494)
SET(VERSION_MAJOR 25)
SET(VERSION_MINOR 1)
SET(VERSION_PATCH 1)
SET(VERSION_GITHASH e4c9b022237992620c966d032cee495da8d0b5ac)
SET(VERSION_DESCRIBE v24.12.1.1-testing)
SET(VERSION_STRING 24.12.1.1)
SET(VERSION_GITHASH 204fd5b54317ad310559805e6ebd7225cf014975)
SET(VERSION_DESCRIBE v25.1.1.1-testing)
SET(VERSION_STRING 25.1.1.1)
# end of autochange


@ -18,19 +18,7 @@ elseif (COMPILER_CACHE STREQUAL "ccache")
elseif(COMPILER_CACHE STREQUAL "sccache")
find_program (CCACHE_EXECUTABLE sccache)
elseif(COMPILER_CACHE STREQUAL "chcache")
list (APPEND CMAKE_MODULE_PATH "${ClickHouse_SOURCE_DIR}/contrib/corrosion/cmake")
find_package(Rust REQUIRED)
include ("${ClickHouse_SOURCE_DIR}/contrib/corrosion/cmake/Corrosion.cmake")
corrosion_import_crate(
MANIFEST_PATH ${CMAKE_CURRENT_SOURCE_DIR}/utils/chcache/Cargo.toml
PROFILE release
LOCKED
FLAGS --offline
)
set_target_properties(chcache PROPERTIES RUNTIME_OUTPUT_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/programs/)
set(CCACHE_EXECUTABLE ${CMAKE_CURRENT_BINARY_DIR}/programs/chcache)
set(CCACHE_EXECUTABLE ${CMAKE_CURRENT_BINARY_DIR}/rust/chcache/chcache)
elseif(COMPILER_CACHE STREQUAL "disabled")
message(STATUS "Using *ccache: no (disabled via configuration)")
return()


@ -19,6 +19,7 @@ endif ()
# We want to get everything out of the compiler for code quality.
add_warning(everything)
add_warning(pedantic)
add_warning(vla-cxx-extension)
no_warning(zero-length-array)
no_warning(c++98-compat-pedantic)
no_warning(c++98-compat)
@ -39,7 +40,6 @@ no_warning(padded)
no_warning(switch-enum)
no_warning(undefined-func-template)
no_warning(unused-template)
no_warning(vla)
no_warning(weak-template-vtables)
no_warning(weak-vtables)
no_warning(thread-safety-negative) # experimental flag, too many false positives

contrib/arrow vendored

@ -1 +1 @@
Subproject commit 6e2574f5013a005c050c9a7787d341aef09d0063
Subproject commit ae9f3d6a2f4e5c3fb99b52cb471e1921f5e66495


@ -1,4 +1,4 @@
if (OS_LINUX)
if (OS_LINUX OR OS_DARWIN)
option(ENABLE_ICU "Enable ICU" ${ENABLE_LIBRARIES})
else ()
option(ENABLE_ICU "Enable ICU" 0)
@ -476,11 +476,14 @@ set(ICUI18N_SOURCES
file(GENERATE OUTPUT "${CMAKE_CURRENT_BINARY_DIR}/empty.cpp" CONTENT " ")
enable_language(ASM)
if (ARCH_S390X)
set(ICUDATA_SOURCE_FILE "${ICUDATA_SOURCE_DIR}/icudt75b_dat.S" )
else()
set(ICUDATA_SOURCE_FILE "${ICUDATA_SOURCE_DIR}/icudt75l_dat.S" )
endif()
if (OS_DARWIN)
# Fine for both x86 and ARM
set(ICUDATA_SOURCE_FILE "${ICUDATA_SOURCE_DIR}/darwin_x86_64/icudt75l_dat.S")
elseif (ARCH_S390X)
set(ICUDATA_SOURCE_FILE "${ICUDATA_SOURCE_DIR}/icudt75b_dat.S")
else ()
set(ICUDATA_SOURCE_FILE "${ICUDATA_SOURCE_DIR}/icudt75l_dat.S")
endif ()
# ^^ you might be confused how for different little endian platforms (x86, ARM) the same assembly files can be used.
# These files are indeed assembly but they only contain data ('.long' directive), which makes them portable across CPUs.
# Only the endianness and the character set (ASCII, EBCDIC) makes a difference, also see

contrib/icudata vendored

@ -1 +1 @@
Subproject commit 4904951339a70b4814d2d3723436b20d079cb01b
Subproject commit cfc05b4c3140ff2be84291b80de8c62b1e42d0da

contrib/rust_vendor vendored

@ -1 +1 @@
Subproject commit b25b16b0b10a1cbb33eb0922f77aeedb72119792
Subproject commit 4214a61e00b17eefd18c9540a43610918347816b


@ -13,8 +13,9 @@ else
fi
# The repo is usually mounted to /ClickHouse
LANGUAGES=$(grep -o "'[/][a-z][a-z]'" /opt/clickhouse-docs/docusaurus.config.js | sort -u | sed "s/'\/\([a-z][a-z]\)'/\1/")
for lang in en ru zh
for lang in $LANGUAGES
do
if [ -d "/ClickHouse/docs/${lang}" ]; then
cp -rf "/ClickHouse/docs/${lang}" "/opt/clickhouse-docs/docs/"


@ -36,7 +36,7 @@ geomet==0.2.1.post1
grpcio-tools==1.60.0
grpcio==1.60.0
gssapi==1.8.3
httplib2==0.20.2
httplib2==0.22.0
idna==3.7
importlib-metadata==4.6.4
iniconfig==2.0.0
@ -72,7 +72,7 @@ pyarrow==17.0.0
pycparser==2.22
pycryptodome==3.20.0
pymongo==3.11.0
pyparsing==2.4.7
pyparsing==3.1.0
pyspark==3.3.2
pyspnego==0.10.2
pytest-order==1.0.0
@ -101,4 +101,5 @@ wadllib==1.3.6
websocket-client==1.8.0
wheel==0.38.1
zipp==1.0.0
pyiceberg==0.7.1
jinja2==3.1.3


@ -50,3 +50,4 @@ urllib3==1.26.5
wadllib==1.3.6
wheel==0.37.1
zipp==1.0.0
clickhouse-driver==0.2.7


@ -2,7 +2,7 @@ DirectoryPath: /test
IgnoreDirectoryMissingTrailingSlash: true
CheckExternal: false
CheckInternal: false
CheckInternalHash: true
CheckInternalHash: false
CheckMailto: false
IgnoreAltMissing: true
IgnoreEmptyHref: true


@ -65,7 +65,7 @@ sidebar_label: 2024
* Follow-up to https://github.com/ClickHouse/ClickHouse/pull/69346 Point 4 described there will work now as well:. [#69563](https://github.com/ClickHouse/ClickHouse/pull/69563) ([Vitaly Baranov](https://github.com/vitlibar)).
* Implement generic SerDe between Avro Union and ClickHouse Variant type. Resolves [#69713](https://github.com/ClickHouse/ClickHouse/issues/69713). [#69712](https://github.com/ClickHouse/ClickHouse/pull/69712) ([Jiří Kozlovský](https://github.com/jirislav)).
* 1. CREATE TABLE AS will copy PRIMARY KEY, ORDER BY, and similar clauses. Now it is supported only for the MergeTree family of table engines. 2. For example, the following SQL statements would trigger an exception in the past, but this PR fixes it: if the destination table does not provide an `ORDER BY` or `PRIMARY KEY` expression in the table definition, we will copy it from the source table. [#69739](https://github.com/ClickHouse/ClickHouse/pull/69739) ([sakulali](https://github.com/sakulali)).
* Added user-level settings `min_free_disk_bytes_to_throw_insert` and `min_free_disk_ratio_to_throw_insert` to prevent insertions on disks that are almost full. [#69755](https://github.com/ClickHouse/ClickHouse/pull/69755) ([Marco Vilas Boas](https://github.com/marco-vb)).
* Added user-level settings `min_free_disk_bytes_to_perform_insert` and `min_free_disk_ratio_to_perform_insert` to prevent insertions on disks that are almost full. [#69755](https://github.com/ClickHouse/ClickHouse/pull/69755) ([Marco Vilas Boas](https://github.com/marco-vb)).
* If you run `clickhouse-client` or another CLI application and it starts up slowly due to an overloaded server, and you start typing your query, such as `SELECT`, previous versions would display the remainder of the terminal echo contents before printing the greeting message, such as `SELECTClickHouse local version 24.10.1.1.` instead of `ClickHouse local version 24.10.1.1.`. Now it is fixed. This closes [#31696](https://github.com/ClickHouse/ClickHouse/issues/31696). [#69856](https://github.com/ClickHouse/ClickHouse/pull/69856) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add new column readonly_duration to the system.replicas table. Needed to be able to distinguish actual readonly replicas from sentinel ones in alerts. [#69871](https://github.com/ClickHouse/ClickHouse/pull/69871) ([Miсhael Stetsyuk](https://github.com/mstetsyuk)).
* Change the join to sort settings type to unsigned int. [#69886](https://github.com/ClickHouse/ClickHouse/pull/69886) ([kevinyhzou](https://github.com/KevinyhZou)).


@ -89,7 +89,7 @@ sidebar_label: 2024
* Restore mode that replaces all external table engines and functions to Null (`restore_replace_external_engines_to_null`, `restore_replace_external_table_functions_to_null` settings) was failing if table had SETTINGS. Now it removes settings from table definition in this case and allows to restore such tables. [#69253](https://github.com/ClickHouse/ClickHouse/pull/69253) ([Ilya Yatsishin](https://github.com/qoega)).
* Reduce memory usage of inserts to JSON by using adaptive write buffer size. A lot of files created by JSON column in wide part contains small amount of data and it doesn't make sense to allocate 1MB buffer for them. [#69272](https://github.com/ClickHouse/ClickHouse/pull/69272) ([Pavel Kruglov](https://github.com/Avogar)).
* CLICKHOUSE_PASSWORD is escaped for XML in clickhouse image's entrypoint. [#69301](https://github.com/ClickHouse/ClickHouse/pull/69301) ([aohoyd](https://github.com/aohoyd)).
* Added user-level settings `min_free_disk_bytes_to_throw_insert` and `min_free_disk_ratio_to_throw_insert` to prevent insertions on disks that are almost full. [#69376](https://github.com/ClickHouse/ClickHouse/pull/69376) ([Marco Vilas Boas](https://github.com/marco-vb)).
* Added user-level settings `min_free_disk_bytes_to_perform_insert` and `min_free_disk_ratio_to_perform_insert` to prevent insertions on disks that are almost full. [#69376](https://github.com/ClickHouse/ClickHouse/pull/69376) ([Marco Vilas Boas](https://github.com/marco-vb)).
* Do not retain threads in the concurrent hash join thread pool, to avoid queries spawning an excessive number of threads. [#69406](https://github.com/ClickHouse/ClickHouse/pull/69406) ([Duc Canh Le](https://github.com/canhld94)).
* Allow empty arguments for arrayZip/arrayZipUnaligned, as concat did in https://github.com/ClickHouse/ClickHouse/pull/65887. This is for Spark compatibility in the Gluten ClickHouse backend. [#69576](https://github.com/ClickHouse/ClickHouse/pull/69576) ([李扬](https://github.com/taiyang-li)).
* Support more advanced SSL options for Keeper's internal communication (e.g. private keys with passphrase). [#69582](https://github.com/ClickHouse/ClickHouse/pull/69582) ([Antonio Andelic](https://github.com/antonio2368)).
@ -199,7 +199,7 @@ sidebar_label: 2024
* NO CL ENTRY: 'Revert "Fix prewhere without columns and without adaptive index granularity (almost w/o anything)"'. [#68897](https://github.com/ClickHouse/ClickHouse/pull/68897) ([Alexander Gololobov](https://github.com/davenger)).
* NO CL ENTRY: 'Revert "Speed up some Kafka tests with multiprocessing"'. [#69356](https://github.com/ClickHouse/ClickHouse/pull/69356) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* NO CL ENTRY: 'Revert "Remove obsolete `--multiquery` parameter (follow-up to [#63898](https://github.com/ClickHouse/ClickHouse/issues/63898)), pt. V"'. [#69393](https://github.com/ClickHouse/ClickHouse/pull/69393) ([Alexander Tokmakov](https://github.com/tavplubix)).
* NO CL ENTRY: 'Revert "Add user-level settings min_free_diskspace_bytes_to_throw_insert and min_free_diskspace_ratio_to_throw_insert"'. [#69705](https://github.com/ClickHouse/ClickHouse/pull/69705) ([Raúl Marín](https://github.com/Algunenano)).
* NO CL ENTRY: 'Revert "Add user-level settings min_free_disk_bytes_to_perform_insert and min_free_disk_ratio_to_perform_insert"'. [#69705](https://github.com/ClickHouse/ClickHouse/pull/69705) ([Raúl Marín](https://github.com/Algunenano)).
* NO CL ENTRY: 'Revert "Support more oss endpoints"'. [#69779](https://github.com/ClickHouse/ClickHouse/pull/69779) ([Raúl Marín](https://github.com/Algunenano)).
#### NOT FOR CHANGELOG / INSIGNIFICANT

View File

@ -24,19 +24,24 @@ CREATE TABLE test (name String, value UInt32)
`AzureQueue` parameters are the same as those supported by the `AzureBlobStorage` table engine. See the parameters section [here](../../../engines/table-engines/integrations/azureBlobStorage.md).
Similar to the [AzureBlobStorage](/docs/en/engines/table-engines/integrations/azureBlobStorage) table engine, users can use Azurite emulator for local Azure Storage development. Further details [here](https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azurite?tabs=docker-hub%2Cblob-storage).
**Example**
```sql
CREATE TABLE azure_queue_engine_table (name String, value UInt32)
ENGINE=AzureQueue('DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;BlobEndpoint=http://azurite1:10000/devstoreaccount1/data/')
SETTINGS
mode = 'unordered'
CREATE TABLE azure_queue_engine_table
(
`key` UInt64,
`data` String
)
ENGINE = AzureQueue('DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;BlobEndpoint=http://azurite1:10000/devstoreaccount1/;', 'testcontainer', '*', 'CSV')
SETTINGS mode = 'unordered'
```
## Settings {#settings}
The set of supported settings is the same as for the `S3Queue` table engine, but without the `s3queue_` prefix. See the [full list of settings](../../../engines/table-engines/integrations/s3queue.md#settings).
To get a list of settings, configured for the table, use `system.s3_queue_settings` table. Available from `24.10`.
To get the list of settings configured for the table, use the `system.azure_queue_settings` table. Available from `24.10`.
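For example, assuming the table created in the example above, a query along these lines shows its configured settings (the exact column set may differ between versions):
```sql
SELECT name, value
FROM system.azure_queue_settings
WHERE table = 'azure_queue_engine_table';
```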
## Description {#description}
@ -51,18 +56,18 @@ When the `MATERIALIZED VIEW` joins the engine, it starts collecting data in the
Example:
``` sql
CREATE TABLE azure_queue_engine_table (name String, value UInt32)
ENGINE=AzureQueue('<endpoint>', 'CSV', 'gzip')
SETTINGS
mode = 'unordered';
CREATE TABLE azure_queue_engine_table (key UInt64, data String)
ENGINE=AzureQueue('<endpoint>', 'CSV', 'gzip')
SETTINGS
mode = 'unordered';
CREATE TABLE stats (name String, value UInt32)
ENGINE = MergeTree() ORDER BY name;
CREATE TABLE stats (key UInt64, data String)
ENGINE = MergeTree() ORDER BY key;
CREATE MATERIALIZED VIEW consumer TO stats
AS SELECT name, value FROM azure_queue_engine_table;
CREATE MATERIALIZED VIEW consumer TO stats
AS SELECT key, data FROM azure_queue_engine_table;
SELECT * FROM stats ORDER BY name;
SELECT * FROM stats ORDER BY key;
```
## Virtual columns {#virtual-columns}
@ -71,3 +76,77 @@ Example:
- `_file` — Name of the file.
For more information about virtual columns see [here](../../../engines/table-engines/index.md#table_engines-virtual_columns).
## Introspection
Enable logging for the table via the table setting `enable_logging_to_s3queue_log=1`.
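For example, the setting can be added next to the other table settings at creation time (a sketch based on the example table above):
```sql
CREATE TABLE azure_queue_engine_table (key UInt64, data String)
ENGINE = AzureQueue('<endpoint>', 'CSV', 'gzip')
SETTINGS mode = 'unordered', enable_logging_to_s3queue_log = 1;
```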
Introspection capabilities are the same as the [S3Queue table engine](/docs/en/engines/table-engines/integrations/s3queue#introspection) with several distinct differences:
1. Use the `system.s3queue` table for the in-memory state of the queue. Later versions of ClickHouse may introduce a dedicated `azurequeue` table.
2. Enable the `system.azure_queue_log` table via the main ClickHouse configuration, e.g.
```xml
<azure_queue_log>
<database>system</database>
<table>azure_queue_log</table>
</azure_queue_log>
```
This persistent table has the same information as `system.s3queue`, but for processed and failed files.
The table has the following structure:
```sql
CREATE TABLE system.azure_queue_log
(
`hostname` LowCardinality(String) COMMENT 'Hostname',
`event_date` Date COMMENT 'Event date of writing this log row',
`event_time` DateTime COMMENT 'Event time of writing this log row',
`database` String COMMENT 'The name of a database where current S3Queue table lives.',
`table` String COMMENT 'The name of S3Queue table.',
`uuid` String COMMENT 'The UUID of S3Queue table',
`file_name` String COMMENT 'File name of the processing file',
`rows_processed` UInt64 COMMENT 'Number of processed rows',
`status` Enum8('Processed' = 0, 'Failed' = 1) COMMENT 'Status of the processing file',
`processing_start_time` Nullable(DateTime) COMMENT 'Time of the start of processing the file',
`processing_end_time` Nullable(DateTime) COMMENT 'Time of the end of processing the file',
`exception` String COMMENT 'Exception message if happened'
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, event_time)
SETTINGS index_granularity = 8192
COMMENT 'Contains logging entries with information about files processed by the S3Queue engine.'
```
Example:
```sql
SELECT *
FROM system.azure_queue_log
LIMIT 1
FORMAT Vertical
Row 1:
──────
hostname: clickhouse
event_date: 2024-12-16
event_time: 2024-12-16 13:42:47
database: default
table: azure_queue_engine_table
uuid: 1bc52858-00c0-420d-8d03-ac3f189f27c8
file_name: test_1.csv
rows_processed: 3
status: Processed
processing_start_time: 2024-12-16 13:42:47
processing_end_time: 2024-12-16 13:42:47
exception:
1 row in set. Elapsed: 0.002 sec.
```

View File

@ -31,10 +31,12 @@ CREATE TABLE azure_blob_storage_table (name String, value UInt32)
**Example**
Users can use the Azurite emulator for local Azure Storage development. Further details [here](https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azurite?tabs=docker-hub%2Cblob-storage). If using a local instance of Azurite, users may need to substitute `http://localhost:10000` for `http://azurite1:10000` in the commands below, where we assume Azurite is available at host `azurite1`.
``` sql
CREATE TABLE test_table (key UInt64, data String)
ENGINE = AzureBlobStorage('DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;BlobEndpoint=http://azurite1:10000/devstoreaccount1/;',
'test_container', 'test_table', 'CSV');
ENGINE = AzureBlobStorage('DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;BlobEndpoint=http://azurite1:10000/devstoreaccount1/;', 'testcontainer', 'test_table', 'CSV');
INSERT INTO test_table VALUES (1, 'a'), (2, 'b'), (3, 'c');
@ -73,7 +75,7 @@ To enable caching use a setting `filesystem_cache_name = '<name>'` and `enable_f
```sql
SELECT *
FROM azureBlobStorage('DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;BlobEndpoint=http://azurite1:10000/devstoreaccount1/;', 'test_container', 'test_table', 'CSV')
FROM azureBlobStorage('DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;BlobEndpoint=http://azurite1:10000/devstoreaccount1/;', 'testcontainer', 'test_table', 'CSV')
SETTINGS filesystem_cache_name = 'cache_for_azure', enable_filesystem_cache = 1;
```

View File

@ -16,6 +16,10 @@ Engines of the family:
`Log` family table engines can store data to [HDFS](/docs/en/engines/table-engines/mergetree-family/mergetree.md/#table_engine-mergetree-hdfs) or [S3](/docs/en/engines/table-engines/mergetree-family/mergetree.md/#table_engine-mergetree-s3) distributed file systems.
:::warning This engine is not for log data.
Despite the name, the *Log table engines are not meant for storing log data. They should only be used for small volumes of data that need to be written quickly.
:::
## Common Properties {#common-properties}
Engines:

View File

@ -11,3 +11,88 @@ The engine belongs to the family of `Log` engines. See the common properties of
`Log` differs from [TinyLog](../../../engines/table-engines/log-family/tinylog.md) in that a small file of "marks" resides with the column files. These marks are written on every data block and contain offsets that indicate where to start reading the file in order to skip the specified number of rows. This makes it possible to read table data in multiple threads.
For concurrent data access, the read operations can be performed simultaneously, while write operations block reads and each other.
The `Log` engine does not support indexes. Also, if writing to a table fails, the table is broken, and reading from it returns an error. The `Log` engine is appropriate for temporary data, write-once tables, and for testing or demonstration purposes.
## Creating a Table {#table_engines-log-creating-a-table}
``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
column1_name [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
column2_name [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
...
) ENGINE = Log
```
See the detailed description of the [CREATE TABLE](../../../sql-reference/statements/create/table.md#create-table-query) query.
## Writing the Data {#table_engines-log-writing-the-data}
The `Log` engine efficiently stores data by writing each column to its own file. For every table, the Log engine writes the following files to the specified storage path:
- `<column>.bin`: A data file for each column, containing the serialized and compressed data.
- `__marks.mrk`: A marks file, storing offsets and row counts for each data block inserted. Marks are used to facilitate efficient query execution by allowing the engine to skip irrelevant data blocks during reads.
### Writing Process
When data is written to a `Log` table:
1. Data is serialized and compressed into blocks.
2. For each column, the compressed data is appended to its respective `<column>.bin` file.
3. Corresponding entries are added to the `__marks.mrk` file to record the offset and row count of the newly inserted data.
## Reading the Data {#table_engines-log-reading-the-data}
The file with marks allows ClickHouse to parallelize the reading of data. This means that a `SELECT` query returns rows in an unpredictable order. Use the `ORDER BY` clause to sort rows.
## Example of Use {#table_engines-log-example-of-use}
Creating a table:
``` sql
CREATE TABLE log_table
(
timestamp DateTime,
message_type String,
message String
)
ENGINE = Log
```
Inserting data:
``` sql
INSERT INTO log_table VALUES (now(),'REGULAR','The first regular message')
INSERT INTO log_table VALUES (now(),'REGULAR','The second regular message'),(now(),'WARNING','The first warning message')
```
We used two `INSERT` queries to create two data blocks inside the `<column>.bin` files.
ClickHouse uses multiple threads when selecting data. Each thread reads a separate data block and returns resulting rows independently as it finishes. As a result, the order of blocks of rows in the output may not match the order of the same blocks in the input. For example:
``` sql
SELECT * FROM log_table
```
``` text
┌───────────timestamp─┬─message_type─┬─message────────────────────┐
│ 2019-01-18 14:27:32 │ REGULAR │ The second regular message │
│ 2019-01-18 14:34:53 │ WARNING │ The first warning message │
└─────────────────────┴──────────────┴────────────────────────────┘
┌───────────timestamp─┬─message_type─┬─message───────────────────┐
│ 2019-01-18 14:23:43 │ REGULAR │ The first regular message │
└─────────────────────┴──────────────┴───────────────────────────┘
```
Sorting the results (ascending order by default):
``` sql
SELECT * FROM log_table ORDER BY timestamp
```
``` text
┌───────────timestamp─┬─message_type─┬─message────────────────────┐
│ 2019-01-18 14:23:43 │ REGULAR │ The first regular message │
│ 2019-01-18 14:27:32 │ REGULAR │ The second regular message │
│ 2019-01-18 14:34:53 │ WARNING │ The first warning message │
└─────────────────────┴──────────────┴────────────────────────────┘
```

View File

@ -4,11 +4,11 @@ toc_priority: 32
toc_title: StripeLog
---
# Stripelog
# StripeLog
This engine belongs to the family of log engines. See the common properties of log engines and their differences in the [Log Engine Family](../../../engines/table-engines/log-family/index.md) article.
Use this engine in scenarios when you need to write many tables with a small amount of data (less than 1 million rows).
Use this engine in scenarios where you need to write many tables with a small amount of data (less than 1 million rows). For example, this table can be used to store incoming data batches for transformation when their atomic processing is required. 100k instances of this table type are viable for a ClickHouse server. This table engine should be preferred over [Log](./log.md) when a high number of tables is required, at the expense of read efficiency.
## Creating a Table {#table_engines-stripelog-creating-a-table}

View File

@ -11,3 +11,71 @@ The engine belongs to the log engine family. See [Log Engine Family](../../../en
This table engine is typically used with the write-once method: write data one time, then read it as many times as necessary. For example, you can use `TinyLog`-type tables for intermediary data that is processed in small batches. Note that storing data in a large number of small tables is inefficient.
Queries are executed in a single stream. In other words, this engine is intended for relatively small tables (up to about 1,000,000 rows). It makes sense to use this table engine if you have many small tables, since it is simpler than the [Log](../../../engines/table-engines/log-family/log.md) engine (fewer files need to be opened).
## Characteristics
- **Simpler Structure**: Unlike the Log engine, TinyLog does not use mark files. This reduces complexity but also limits performance optimizations for large datasets.
- **Single Stream Queries**: Queries on TinyLog tables are executed in a single stream, making it suitable for relatively small tables, typically up to 1,000,000 rows.
- **Efficient for Small Tables**: The simplicity of the TinyLog engine makes it advantageous when managing many small tables, as it requires fewer file operations compared to the Log engine.
## Creating a Table {#table_engines-tinylog-creating-a-table}
``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
column1_name [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
column2_name [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
...
) ENGINE = TinyLog
```
See the detailed description of the [CREATE TABLE](../../../sql-reference/statements/create/table.md#create-table-query) query.
## Writing the Data {#table_engines-tinylog-writing-the-data}
The `TinyLog` engine stores each column in its own file. For each `INSERT` query, ClickHouse appends the data block to the end of the column files, writing columns one by one.
For each table ClickHouse writes the files:
- `<column>.bin`: A data file for each column, containing the serialized and compressed data.
The `TinyLog` engine does not support the `ALTER UPDATE` and `ALTER DELETE` operations.
## Example of Use {#table_engines-tinylog-example-of-use}
Creating a table:
``` sql
CREATE TABLE tiny_log_table
(
timestamp DateTime,
message_type String,
message String
)
ENGINE = TinyLog
```
Inserting data:
``` sql
INSERT INTO tiny_log_table VALUES (now(),'REGULAR','The first regular message')
INSERT INTO tiny_log_table VALUES (now(),'REGULAR','The second regular message'),(now(),'WARNING','The first warning message')
```
We used two `INSERT` queries to create two data blocks inside the `<column>.bin` files.
ClickHouse uses a single stream when selecting data. As a result, the order of blocks of rows in the output matches the order of the same blocks in the input. For example:
``` sql
SELECT * FROM tiny_log_table
```
``` text
┌───────────timestamp─┬─message_type─┬─message────────────────────┐
│ 2024-12-10 13:11:58 │ REGULAR │ The first regular message │
│ 2024-12-10 13:12:12 │ REGULAR │ The second regular message │
│ 2024-12-10 13:12:12 │ WARNING │ The first warning message │
└─────────────────────┴──────────────┴────────────────────────────┘
```

View File

@ -121,7 +121,6 @@ FROM table
WHERE ... -- WHERE clause is optional
ORDER BY Distance(vectors, reference_vector)
LIMIT N
SETTINGS enable_analyzer = 0; -- Temporary limitation, will be lifted
```
To search using a different value of HNSW parameter `hnsw_candidate_list_size_for_search` (default: 256), also known as `ef_search` in the

View File

@ -21,43 +21,6 @@ As of November 8th, 2022, each TSV is approximately the following size and numbe
- `file_changes` - 53M - 266,051 rows
- `line_changes` - 2.7G - 7,535,157 rows
# Table of Contents
- [Table of Contents](#table-of-contents)
- [Generating the data](#generating-the-data)
- [Downloading and inserting the data](#downloading-and-inserting-the-data)
- [Queries](#queries)
- [History of a single file](#history-of-a-single-file)
- [Find the current active files](#find-the-current-active-files)
- [List files with most modifications](#list-files-with-most-modifications)
- [What day of the week do commits usually occur?](#what-day-of-the-week-do-commits-usually-occur)
- [History of subdirectory/file - number of lines, commits and contributors over time](#history-of-subdirectoryfile---number-of-lines-commits-and-contributors-over-time)
- [List files with maximum number of authors](#list-files-with-maximum-number-of-authors)
- [Oldest lines of code in the repository](#oldest-lines-of-code-in-the-repository)
- [Files with longest history](#files-with-longest-history)
- [Distribution of contributors with respect to docs and code over the month](#distribution-of-contributors-with-respect-to-docs-and-code-over-the-month)
- [Authors with the most diverse impact](#authors-with-the-most-diverse-impact)
- [Favorite files for an author](#favorite-files-for-an-author)
- [Largest files with lowest number of authors](#largest-files-with-lowest-number-of-authors)
- [Commits and lines of code distribution by time; by weekday, by author; for specific subdirectories](#commits-and-lines-of-code-distribution-by-time-by-weekday-by-author-for-specific-subdirectories)
- [Matrix of authors that shows what authors tends to rewrite another authors code](#matrix-of-authors-that-shows-what-authors-tends-to-rewrite-another-authors-code)
- [Who is the highest percentage contributor per day of week?](#who-is-the-highest-percentage-contributor-per-day-of-week)
- [Distribution of code age across repository](#distribution-of-code-age-across-repository)
- [What percentage of code for an author has been removed by other authors?](#what-percentage-of-code-for-an-author-has-been-removed-by-other-authors)
- [List files that were rewritten most number of times?](#list-files-that-were-rewritten-most-number-of-times)
- [What weekday does the code have the highest chance to stay in the repository?](#what-weekday-does-the-code-have-the-highest-chance-to-stay-in-the-repository)
- [Files sorted by average code age](#files-sorted-by-average-code-age)
- [Who tends to write more tests / CPP code / comments?](#who-tends-to-write-more-tests--cpp-code--comments)
- [How does an authors commits change over time with respect to code/comments percentage?](#how-does-an-authors-commits-change-over-time-with-respect-to-codecomments-percentage)
- [What is the average time before code will be rewritten and the median (half-life of code decay)?](#what-is-the-average-time-before-code-will-be-rewritten-and-the-median-half-life-of-code-decay)
- [What is the worst time to write code in sense that the code has highest chance to be re-written?](#what-is-the-worst-time-to-write-code-in-sense-that-the-code-has-highest-chance-to-be-re-written)
- [Which authors code is the most sticky?](#which-authors-code-is-the-most-sticky)
- [Most consecutive days of commits by an author](#most-consecutive-days-of-commits-by-an-author)
- [Line by line commit history of a file](#line-by-line-commit-history-of-a-file)
- [Unsolved Questions](#unsolved-questions)
- [Git blame](#git-blame)
- [Related Content](#related-content)
# Generating the data
This is optional. We distribute the data freely - see [Downloading and inserting the data](#downloading-and-inserting-the-data).

View File

@ -998,46 +998,6 @@ WHERE
);
```
::::note
As of October 2024, the query is extremely slow due to missing join predicate pushdown. Corresponding issue: https://github.com/ClickHouse/ClickHouse/issues/70802
This alternative formulation works and was verified to return the reference results.
```sql
SELECT
sum(l_extendedprice * (1 - l_discount)) AS revenue
FROM
lineitem,
part
WHERE
p_partkey = l_partkey
AND l_shipinstruct = 'DELIVER IN PERSON'
AND l_shipmode IN ('AIR', 'AIR REG')
AND (
(
p_brand = 'Brand#12'
AND p_container IN ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
AND l_quantity >= 1 AND l_quantity <= 1 + 10
AND p_size BETWEEN 1 AND 5
)
OR
(
p_brand = 'Brand#23'
AND p_container IN ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
AND l_quantity >= 10 AND l_quantity <= 10 + 10
AND p_size BETWEEN 1 AND 10
)
OR
(
p_brand = 'Brand#34'
AND p_container IN ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
AND l_quantity >= 20 AND l_quantity <= 20 + 10
AND p_size BETWEEN 1 AND 15
)
)
```
::::
**Q20**
```sql

View File

@ -93,8 +93,8 @@ It is recommended to use official pre-compiled `deb` packages for Debian or Ubun
sudo apt-get install -y apt-transport-https ca-certificates curl gnupg
curl -fsSL 'https://packages.clickhouse.com/rpm/lts/repodata/repomd.xml.key' | sudo gpg --dearmor -o /usr/share/keyrings/clickhouse-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb stable main" | sudo tee \
/etc/apt/sources.list.d/clickhouse.list
ARCH=$(dpkg --print-architecture)
echo "deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg arch=${ARCH}] https://packages.clickhouse.com/deb stable main" | sudo tee /etc/apt/sources.list.d/clickhouse.list
sudo apt-get update
```

View File

@ -490,7 +490,23 @@ AzureBlobStorage('<connection string>/<url>', '<container>', '<path>', '<account
```sql
BACKUP TABLE data TO AzureBlobStorage('DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;BlobEndpoint=http://azurite1:10000/devstoreaccount1/;',
'test_container', 'data_backup');
'testcontainer', 'data_backup');
RESTORE TABLE data AS data_restored FROM AzureBlobStorage('DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;BlobEndpoint=http://azurite1:10000/devstoreaccount1/;',
'test_container', 'data_backup');
'testcontainer', 'data_backup');
```
## Backing up system tables
System tables can also be included in your backup and restore workflows, but their inclusion depends on your specific use case.
### Backing Up Log Tables
System tables that store historical data, such as those with a `_log` suffix (e.g., `query_log`, `part_log`), can be backed up and restored like any other table. If your use case relies on analyzing historical data, for example using `query_log` to track query performance or debug issues, it is recommended to include these tables in your backup strategy. However, if historical data from these tables is not required, they can be excluded to save backup storage space.
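For example, a log table can be backed up like any regular table; the sketch below assumes a backup destination disk named `backups` has been configured:
```sql
BACKUP TABLE system.query_log TO Disk('backups', 'query_log_backup.zip');
```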
### Backing Up Access Management Tables
System tables related to access management, such as users, roles, row_policies, settings_profiles, and quotas, receive special treatment during backup and restore operations. When these tables are included in a backup, their content is exported to a special `accessXX.txt` file, which encapsulates the equivalent SQL statements for creating and configuring the access entities. Upon restoration, the restore process interprets these files and re-applies the SQL commands to recreate the users, roles, and other configurations.
This feature ensures that the access control configuration of a ClickHouse cluster can be backed up and restored as part of the cluster's overall setup.
Note: This functionality only works for configurations managed through SQL commands (referred to as ["SQL-driven Access Control and Account Management"](/docs/en/operations/access-rights#enabling-access-control)). Access configurations defined in ClickHouse server configuration files (e.g. `users.xml`) are not included in backups and cannot be restored through this method.

View File

@ -1078,7 +1078,7 @@ Default value: throw
## min_free_disk_bytes_to_perform_insert
The minimum number of bytes that should be free in disk space in order to insert data. If the number of available free bytes is less than `min_free_disk_bytes_to_throw_insert` then an exception is thrown and the insert is not executed. Note that this setting:
The minimum number of bytes of free disk space required in order to insert data. If the number of available free bytes is less than `min_free_disk_bytes_to_perform_insert`, an exception is thrown and the insert is not executed. Note that this setting:
- takes into account the `keep_free_space_bytes` setting.
- does not take into account the amount of data that will be written by the `INSERT` operation.
- is only checked if a positive (non-zero) number of bytes is specified
@ -1106,6 +1106,31 @@ Default value: 0.0
Note that if both `min_free_disk_ratio_to_perform_insert` and `min_free_disk_bytes_to_perform_insert` are specified, ClickHouse will respect the value that requires the larger amount of free disk space in order to perform inserts.
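As an illustration (the table names and threshold values are arbitrary), both settings can be applied at the query level, so the insert below only proceeds if at least 10 GiB and at least 10% of the disk are free:
```sql
INSERT INTO my_table SELECT * FROM source_table
SETTINGS
    min_free_disk_bytes_to_perform_insert = 10737418240, -- 10 GiB
    min_free_disk_ratio_to_perform_insert = 0.1;         -- 10% of the disk
```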
## allow_experimental_reverse_key
Enables support for descending sort order in MergeTree sorting keys. This setting is particularly useful for time series analysis and Top-N queries, allowing data to be stored in reverse chronological order to optimize query performance.
With `allow_experimental_reverse_key` enabled, you can define descending sort orders within the `ORDER BY` clause of a MergeTree table. This enables the use of more efficient `ReadInOrder` optimizations instead of `ReadInReverseOrder` for descending queries.
**Example**
```sql
CREATE TABLE example
(
time DateTime,
key Int32,
value String
) ENGINE = MergeTree
ORDER BY (time DESC, key) -- Descending order on 'time' field
SETTINGS allow_experimental_reverse_key = 1;
SELECT * FROM example WHERE key = 3 ORDER BY time DESC LIMIT 10;
```
By using `ORDER BY time DESC` in the query, `ReadInOrder` is applied.
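One way to check which read mode was chosen (the exact plan output differs between ClickHouse versions) is to inspect the query plan:
```sql
EXPLAIN PLAN
SELECT * FROM example
ORDER BY time DESC
LIMIT 10;
```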
**Default Value:** false
## cache_populated_by_fetch
A Cloud only setting.

View File

@ -11,7 +11,7 @@ sidebar_label: List of tools and utilities
- [clickhouse-format](../../operations/utilities/clickhouse-format.md) — Enables formatting input queries.
- [ClickHouse obfuscator](../../operations/utilities/clickhouse-obfuscator.md) — Obfuscates data.
- [ClickHouse compressor](../../operations/utilities/clickhouse-compressor.md) — Compresses and decompresses data.
- [clickhouse-disks](../../operations/utilities/clickhouse-disks.md) -- Provides filesystem-like operations
- [clickhouse-disks](../../operations/utilities/clickhouse-disks.md) Provides filesystem-like operations
on files among different ClickHouse disks.
- [clickhouse-odbc-bridge](../../operations/utilities/odbc-bridge.md) — A proxy server for ODBC driver.
- [clickhouse_backupview](../../operations/utilities/backupview.md) — A python module to analyze ClickHouse backups.

View File

@ -7,7 +7,7 @@ keywords: [object, data type]
# Object Data Type (deprecated)
**This feature is not production-ready and deprecated.** If you need to work with JSON documents, consider using [this guide](/docs/en/integrations/data-formats/json/overview) instead. A new implementation to support JSON object is in progress and can be tracked [here](https://github.com/ClickHouse/ClickHouse/issues/54864).
**This feature is not production-ready and deprecated.** If you need to work with JSON documents, consider using [this guide](/docs/en/integrations/data-formats/json/overview) instead. A new implementation to support JSON object is in Beta. Further details [here](/docs/en/sql-reference/data-types/newjson).
<hr />

View File

@ -694,6 +694,39 @@ SELECT json, json.a, json.b, json.c FROM test;
└──────────────────────────────┴────────┴─────────┴────────────┘
```
## Comparison between values of the JSON type
Values of the `JSON` type cannot be compared with the `less`/`greater` functions, but they can be compared using the `equal` function.
Two JSON objects are considered equal when they have the same set of paths, and the value at each path has the same type and value in both objects.
Example:
```sql
CREATE TABLE test (json1 JSON(a UInt32), json2 JSON(a UInt32)) ENGINE=Memory;
INSERT INTO test FORMAT JSONEachRow
{"json1" : {"a" : 42, "b" : 42, "c" : "Hello"}, "json2" : {"a" : 42, "b" : 42, "c" : "Hello"}}
{"json1" : {"a" : 42, "b" : 42, "c" : "Hello"}, "json2" : {"a" : 43, "b" : 42, "c" : "Hello"}}
{"json1" : {"a" : 42, "b" : 42, "c" : "Hello"}, "json2" : {"a" : 43, "b" : 42, "c" : "Hello"}}
{"json1" : {"a" : 42, "b" : 42, "c" : "Hello"}, "json2" : {"a" : 42, "b" : 42, "c" : "World"}}
{"json1" : {"a" : 42, "b" : [1, 2, 3], "c" : "Hello"}, "json2" : {"a" : 42, "b" : 42, "c" : "Hello"}}
{"json1" : {"a" : 42, "b" : 42.0, "c" : "Hello"}, "json2" : {"a" : 42, "b" : 42, "c" : "Hello"}}
{"json1" : {"a" : 42, "b" : "42", "c" : "Hello"}, "json2" : {"a" : 42, "b" : 42, "c" : "Hello"}};
SELECT json1, json2, json1 == json2 FROM test;
```
```text
┌─json1──────────────────────────────────┬─json2─────────────────────────┬─equals(json1, json2)─┐
│ {"a":42,"b":"42","c":"Hello"} │ {"a":42,"b":"42","c":"Hello"} │ 1 │
│ {"a":42,"b":"42","c":"Hello"} │ {"a":43,"b":"42","c":"Hello"} │ 0 │
│ {"a":42,"b":"42","c":"Hello"} │ {"a":43,"b":"42","c":"Hello"} │ 0 │
│ {"a":42,"b":"42","c":"Hello"} │ {"a":42,"b":"42","c":"World"} │ 0 │
│ {"a":42,"b":["1","2","3"],"c":"Hello"} │ {"a":42,"b":"42","c":"Hello"} │ 0 │
│ {"a":42,"b":42,"c":"Hello"} │ {"a":42,"b":"42","c":"Hello"} │ 0 │
│ {"a":42,"b":"42","c":"Hello"} │ {"a":42,"b":"42","c":"Hello"} │ 0 │
└────────────────────────────────────────┴───────────────────────────────┴──────────────────────┘
```
## Tips for better usage of the JSON type
Before creating a `JSON` column and loading data into it, consider the following tips:

View File

@ -1286,6 +1286,7 @@ Setting fields:
- `table` Name of the table and schema if exists.
- `connection_string` Connection string.
- `invalidate_query` Query for checking the dictionary status. Optional parameter. Read more in the section [Refreshing dictionary data using LIFETIME](#refreshing-dictionary-data-using-lifetime).
- `background_reconnect` Reconnect to replica in background if connection fails. Optional parameter.
- `query` The custom query. Optional parameter.
:::note
@ -1877,6 +1878,7 @@ Setting fields:
- `table` Name of the table.
- `where` The selection criteria. The syntax for conditions is the same as for `WHERE` clause in PostgreSQL. For example, `id > 10 AND id < 20`. Optional parameter.
- `invalidate_query` Query for checking the dictionary status. Optional parameter. Read more in the section [Refreshing dictionary data using LIFETIME](#refreshing-dictionary-data-using-lifetime).
- `background_reconnect` Reconnect to replica in background if connection fails. Optional parameter.
- `query` The custom query. Optional parameter.
:::note

View File

@ -770,7 +770,8 @@ i
## indexOf(arr, x)
Returns the index of the first x element (starting from 1) if it is in the array, or 0 if it is not.
Returns the index of the first element with value x (starting from 1) if it is in the array.
If the array does not contain the searched-for value, the function returns 0.
Example:
@ -788,9 +789,11 @@ Elements set to `NULL` are handled as normal values.
## indexOfAssumeSorted(arr, x)
Returns the index of the first x element (starting from 1) if it is in the array, or 0 if it is not.
The function should be used for an array sorted not in descending order since binary search is used for the search.
If the internal array type is Nullable, the indexOf function will be used.
Returns the index of the first element with value x (starting from 1) if it is in the array.
If the array does not contain the searched-for value, the function returns 0.
Assumes that the array is sorted in ascending order (i.e., the function uses binary search).
If the array is not sorted, results are undefined.
If the internal array is of type Nullable, function indexOf will be called.
Example:
@ -799,9 +802,9 @@ SELECT indexOfAssumeSorted([1, 3, 3, 3, 4, 4, 5], 4)
```
``` text
┌─indexOf([1, 3, 3, 3, 4, 4, 5], NULL)─┐
│                                    5 │
└──────────────────────────────────────┘
┌─indexOfAssumeSorted([1, 3, 3, 3, 4, 4, 5], 4)─┐
│                                             5 │
└───────────────────────────────────────────────┘
```
## arrayCount(\[func,\] arr1, ...)
@ -2139,16 +2142,19 @@ Result:
```
## arrayAUC
## arrayROCAUC
Calculate AUC (Area Under the Curve, which is a concept in machine learning, see more details: <https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve>).
Calculates the Area Under the Curve (AUC), which is a concept in machine learning.
For more details, please see [here](https://developers.google.com/machine-learning/glossary#pr-auc-area-under-the-pr-curve), [here](https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc#expandable-1) and [here](https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve).
**Syntax**
``` sql
arrayAUC(arr_scores, arr_labels[, scale])
arrayROCAUC(arr_scores, arr_labels[, scale])
```
Alias: `arrayAUC`
**Arguments**
- `arr_scores` — scores the prediction model gives.
@ -2164,27 +2170,33 @@ Returns AUC value with type Float64.
Query:
``` sql
select arrayAUC([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]);
select arrayROCAUC([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]);
```
Result:
``` text
┌─arrayAUC([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])─┐
│ 0.75 │
└───────────────────────────────────────────────┘
┌─arrayROCAUC([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])─┐
│                                              0.75 │
└───────────────────────────────────────────────────┘
```
## arrayPrAUC
## arrayAUCPR
Calculate AUC (Area Under the Curve) for the Precision Recall curve.
Calculate the area under the precision-recall (PR) curve.
A precision-recall curve is created by plotting precision on the y-axis and recall on the x-axis across all thresholds.
The resulting value ranges from 0 to 1, with a higher value indicating better model performance.
PR AUC is particularly useful for imbalanced datasets, providing a clearer comparison of performance than ROC AUC in those cases.
For more details, please see [here](https://developers.google.com/machine-learning/glossary#pr-auc-area-under-the-pr-curve), [here](https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc#expandable-1) and [here](https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve).
**Syntax**
``` sql
arrayPrAUC(arr_scores, arr_labels)
arrayAUCPR(arr_scores, arr_labels)
```
Alias: `arrayPRAUC`
**Arguments**
- `arr_scores` — scores the prediction model gives.
@ -2199,13 +2211,13 @@ Returns PR-AUC value with type Float64.
Query:
``` sql
select arrayPrAUC([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]);
select arrayAUCPR([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]);
```
Result:
``` text
┌─arrayPrAUC([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])─┐
┌─arrayAUCPR([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])─┐
│ 0.8333333333333333 │
└─────────────────────────────────────────────────┘
```

View File

@ -7189,45 +7189,6 @@ Result:
└───────────────────────┘
```
## toUnixTimestamp64Milli
Converts a `DateTime64` to a `Int64` value with fixed millisecond precision. The input value is scaled up or down appropriately depending on its precision.
:::note
The output value is a timestamp in UTC, not in the timezone of `DateTime64`.
:::
**Syntax**
```sql
toUnixTimestamp64Milli(value)
```
**Arguments**
- `value` — DateTime64 value with any precision. [DateTime64](../data-types/datetime64.md).
**Returned value**
- `value` converted to the `Int64` data type. [Int64](../data-types/int-uint.md).
**Example**
Query:
```sql
WITH toDateTime64('2009-02-13 23:31:31.011', 3, 'UTC') AS dt64
SELECT toUnixTimestamp64Milli(dt64);
```
Result:
```response
┌─toUnixTimestamp64Milli(dt64)─┐
│ 1234567891011 │
└──────────────────────────────┘
```
## toUnixTimestamp64Second
Converts a `DateTime64` to a `Int64` value with fixed second precision. The input value is scaled up or down appropriately depending on its precision.
@ -7267,6 +7228,45 @@ Result:
└───────────────────────────────┘
```
## toUnixTimestamp64Milli
Converts a `DateTime64` to a `Int64` value with fixed millisecond precision. The input value is scaled up or down appropriately depending on its precision.
:::note
The output value is a timestamp in UTC, not in the timezone of `DateTime64`.
:::
**Syntax**
```sql
toUnixTimestamp64Milli(value)
```
**Arguments**
- `value` — DateTime64 value with any precision. [DateTime64](../data-types/datetime64.md).
**Returned value**
- `value` converted to the `Int64` data type. [Int64](../data-types/int-uint.md).
**Example**
Query:
```sql
WITH toDateTime64('2009-02-13 23:31:31.011', 3, 'UTC') AS dt64
SELECT toUnixTimestamp64Milli(dt64);
```
Result:
```response
┌─toUnixTimestamp64Milli(dt64)─┐
│ 1234567891011 │
└──────────────────────────────┘
```
## toUnixTimestamp64Micro
Converts a `DateTime64` to a `Int64` value with fixed microsecond precision. The input value is scaled up or down appropriately depending on its precision.
@ -7345,6 +7345,48 @@ Result:
└─────────────────────────────┘
```
## fromUnixTimestamp64Second
Converts an `Int64` to a `DateTime64` value with fixed second precision and optional timezone. The input value is scaled up or down appropriately depending on its precision.
:::note
Please note that the input value is treated as a UTC timestamp, not as a timestamp in the given (or implicit) timezone.
:::
**Syntax**
``` sql
fromUnixTimestamp64Second(value[, timezone])
```
**Arguments**
- `value` — value with any precision. [Int64](../data-types/int-uint.md).
- `timezone` — (optional) timezone name of the result. [String](../data-types/string.md).
**Returned value**
- `value` converted to DateTime64 with precision `0`. [DateTime64](../data-types/datetime64.md).
**Example**
Query:
``` sql
WITH CAST(1733935988, 'Int64') AS i64
SELECT
fromUnixTimestamp64Second(i64, 'UTC') AS x,
toTypeName(x);
```
Result:
```response
┌───────────────────x─┬─toTypeName(x)────────┐
│ 2024-12-11 16:53:08 │ DateTime64(0, 'UTC') │
└─────────────────────┴──────────────────────┘
```
## fromUnixTimestamp64Milli
Converts an `Int64` to a `DateTime64` value with fixed millisecond precision and optional timezone. The input value is scaled up or down appropriately depending on its precision.
@ -7373,7 +7415,7 @@ fromUnixTimestamp64Milli(value[, timezone])
Query:
``` sql
WITH CAST(1234567891011, 'Int64') AS i64
WITH CAST(1733935988123, 'Int64') AS i64
SELECT
fromUnixTimestamp64Milli(i64, 'UTC') AS x,
toTypeName(x);
@ -7383,7 +7425,7 @@ Result:
```response
┌───────────────────────x─┬─toTypeName(x)────────┐
│ 2009-02-13 23:31:31.011 │ DateTime64(3, 'UTC') │
│ 2024-12-11 16:53:08.123 │ DateTime64(3, 'UTC') │
└─────────────────────────┴──────────────────────┘
```
@ -7415,7 +7457,7 @@ fromUnixTimestamp64Micro(value[, timezone])
Query:
``` sql
WITH CAST(1234567891011, 'Int64') AS i64
WITH CAST(1733935988123456, 'Int64') AS i64
SELECT
fromUnixTimestamp64Micro(i64, 'UTC') AS x,
toTypeName(x);
@ -7425,7 +7467,7 @@ Result:
```response
┌──────────────────────────x─┬─toTypeName(x)────────┐
│ 1970-01-15 06:56:07.891011 │ DateTime64(6, 'UTC') │
│ 2024-12-11 16:53:08.123456 │ DateTime64(6, 'UTC') │
└────────────────────────────┴──────────────────────┘
```
@ -7457,7 +7499,7 @@ fromUnixTimestamp64Nano(value[, timezone])
Query:
``` sql
WITH CAST(1234567891011, 'Int64') AS i64
WITH CAST(1733935988123456789, 'Int64') AS i64
SELECT
fromUnixTimestamp64Nano(i64, 'UTC') AS x,
toTypeName(x);
@ -7467,7 +7509,7 @@ Result:
```response
┌─────────────────────────────x─┬─toTypeName(x)────────┐
│ 1970-01-01 00:20:34.567891011 │ DateTime64(9, 'UTC') │
│ 2024-12-11 16:53:08.123456789 │ DateTime64(9, 'UTC') │
└───────────────────────────────┴──────────────────────┘
```

View File

@ -70,6 +70,10 @@ Received exception from server (version 21.4.1):
Code: 60. DB::Exception: Received from localhost:9000. DB::Exception: Table default.test does not exist.
```
:::note
In ClickHouse Cloud, users should use the `PERMANENTLY` clause, e.g. `DETACH TABLE <table> PERMANENTLY`. If this clause is not used, tables will be reattached on cluster restart, e.g. during upgrades.
:::
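A minimal sketch (the table name is illustrative):
```sql
DETACH TABLE test PERMANENTLY;
-- The table stays detached across restarts; reattach it explicitly when needed:
ATTACH TABLE test;
```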
**See Also**
- [Materialized View](../../sql-reference/statements/create/view.md#materialized)

View File

@ -32,11 +32,13 @@ A table with the specified structure for reading or writing data in the specifie
**Examples**
Similar to the [AzureBlobStorage](/docs/en/engines/table-engines/integrations/azureBlobStorage) table engine, users can use Azurite emulator for local Azure Storage development. Further details [here](https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azurite?tabs=docker-hub%2Cblob-storage). Below we assume Azurite is available at the hostname `azurite1`.
Write data into Azure Blob Storage using the following:
```sql
INSERT INTO TABLE FUNCTION azureBlobStorage('http://azurite1:10000/devstoreaccount1',
'test_container', 'test_{_partition_id}.csv', 'devstoreaccount1', 'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==',
'testcontainer', 'test_{_partition_id}.csv', 'devstoreaccount1', 'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==',
'CSV', 'auto', 'column1 UInt32, column2 UInt32, column3 UInt32') PARTITION BY column3 VALUES (1, 2, 3), (3, 2, 1), (78, 43, 3);
```
@ -44,7 +46,7 @@ And then it can be read using
```sql
SELECT * FROM azureBlobStorage('http://azurite1:10000/devstoreaccount1',
'test_container', 'test_1.csv', 'devstoreaccount1', 'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==',
'testcontainer', 'test_1.csv', 'devstoreaccount1', 'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==',
'CSV', 'auto', 'column1 UInt32, column2 UInt32, column3 UInt32');
```
@ -58,7 +60,7 @@ or using connection_string
```sql
SELECT count(*) FROM azureBlobStorage('DefaultEndpointsProtocol=https;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;EndPointSuffix=core.windows.net',
'test_container', 'test_3.csv', 'CSV', 'auto' , 'column1 UInt32, column2 UInt32, column3 UInt32');
'testcontainer', 'test_3.csv', 'CSV', 'auto' , 'column1 UInt32, column2 UInt32, column3 UInt32');
```
``` text

View File

@ -32,11 +32,13 @@ A table with the specified structure for reading or writing data in the specifie
**Examples**
Similar to the [AzureBlobStorage](/docs/en/engines/table-engines/integrations/azureBlobStorage) table engine, users can use Azurite emulator for local Azure Storage development. Further details [here](https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azurite?tabs=docker-hub%2Cblob-storage). Below we assume Azurite is available at the hostname `azurite1`.
Select the count for the file `test_cluster_*.csv`, using all the nodes in the `cluster_simple` cluster:
``` sql
SELECT count(*) from azureBlobStorageCluster(
'cluster_simple', 'http://azurite1:10000/devstoreaccount1', 'test_container', 'test_cluster_count.csv', 'devstoreaccount1',
'cluster_simple', 'http://azurite1:10000/devstoreaccount1', 'testcontainer', 'test_cluster_count.csv', 'devstoreaccount1',
'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==', 'CSV',
'auto', 'key UInt64')
```

View File

@ -11,7 +11,7 @@ Allows processing files from [Amazon S3](https://aws.amazon.com/s3/) and Google
**Syntax**
``` sql
s3Cluster(cluster_name, url [, NOSIGN | access_key_id, secret_access_key, [session_token]] [,format] [,structure] [,compression_method])
s3Cluster(cluster_name, url [, NOSIGN | access_key_id, secret_access_key, [session_token]] [,format] [,structure] [,compression_method] [,headers])
s3Cluster(cluster_name, named_collection[, option=value [,..]])
```
@ -25,6 +25,7 @@ s3Cluster(cluster_name, named_collection[, option=value [,..]])
- `format` — The [format](../../interfaces/formats.md#formats) of the file.
- `structure` — Structure of the table. Format `'column1_name column1_type, column2_name column2_type, ...'`.
- `compression_method` — Parameter is optional. Supported values: `none`, `gzip` or `gz`, `brotli` or `br`, `xz` or `LZMA`, `zstd` or `zst`. By default, it will autodetect compression method by file extension.
- `headers` — Parameter is optional. Allows headers to be passed in the S3 request. Pass it in the format `headers(key=value)`, e.g. `headers('x-amz-request-payer' = 'requester')`. See [here](/docs/en/sql-reference/table-functions/s3#accessing-requester-pays-buckets) for an example of use.
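For illustration, a sketch of passing headers to `s3Cluster` (the cluster name, URL, and bucket are placeholders, and this assumes `headers(...)` can be appended after the format argument, as in the `s3` function example linked above):
```sql
SELECT count(*)
FROM s3Cluster(
    'cluster_simple',
    'https://example-bucket.s3.amazonaws.com/data/*.csv',
    'CSV',
    headers('x-amz-request-payer' = 'requester')
);
```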
Arguments can also be passed using [named collections](/docs/en/operations/named-collections.md). In this case `url`, `access_key_id`, `secret_access_key`, `format`, `structure`, `compression_method` work in the same way, and some extra parameters are supported:

View File

@ -1,8 +0,0 @@
---
sidebar_label: Invitations
title: Invitations
---
## List all invitations
This file is generated by `clickhouseapi.js` during the build process. If the content needs to be changed, edit `clickhouseapi.js`.

View File

@ -1,9 +0,0 @@
---
sidebar_label: Keys
title: Keys
---
## Get a list of all keys
This file is generated by `clickhouseapi.js` during the build process.
If the content needs to be changed, edit `clickhouseapi.js`.

View File

@ -1,8 +0,0 @@
---
sidebar_label: Members
title: Members
---
## List organization members
This file is generated by `clickhouseapi.js` during the build process. If the content needs to be changed, edit `clickhouseapi.js`.

View File

@ -1,8 +0,0 @@
---
sidebar_label: Organizations
title: Organizations
---
## Get organization details
This file is generated by `clickhouseapi.js` during the build process. If the content needs to be changed, edit `clickhouseapi.js`.

View File

@ -1,8 +0,0 @@
---
sidebar_label: Services
title: Services
---
## List of organization services
This file is generated by `clickhouseapi.js` during the build process. If the content needs to be changed, edit `clickhouseapi.js`.

View File

@ -1,8 +0,0 @@
---
slug: /ja/whats-new/changelog/
sidebar_position: 2
sidebar_label: 2024
title: 2024 Changelog
note: This file is autogenerated by `yarn new-build`.
---

View File

@ -1,41 +0,0 @@
<details><summary>Create GCS buckets and an HMAC key</summary>
### ch_bucket_us_east1
![Add a bucket](@site/docs/ja/integrations/data-ingestion/s3/images/GCS-bucket-1.png)
### ch_bucket_us_east4
![Add a bucket](@site/docs/ja/integrations/data-ingestion/s3/images/GCS-bucket-2.png)
### Generate an access key
### Create an HMAC key and secret for a service account
Open **Cloud Storage > Settings > Interoperability** and either choose an existing **Access key** or **CREATE A KEY FOR A SERVICE ACCOUNT**. This guide walks through the steps for creating a new key for a new service account.
![Add a bucket](@site/docs/ja/integrations/data-ingestion/s3/images/GCS-create-a-service-account-key.png)
### Add a new service account
If the project does not already have a service account, click **CREATE NEW ACCOUNT**.
![Add a bucket](@site/docs/ja/integrations/data-ingestion/s3/images/GCS-create-service-account-0.png)
There are three steps to creating a service account. In the first step, give the account a meaningful name, ID, and description.
![Add a bucket](@site/docs/ja/integrations/data-ingestion/s3/images/GCS-create-service-account-a.png)
In the Interoperability settings dialog, the **Storage Object Admin** IAM role is recommended; select that role in step 2.
![Add a bucket](@site/docs/ja/integrations/data-ingestion/s3/images/GCS-create-service-account-2.png)
Step 3 is optional and is not used in this guide. Users can be granted these privileges based on your policies.
![Add a bucket](@site/docs/ja/integrations/data-ingestion/s3/images/GCS-create-service-account-3.png)
The HMAC key for the service account is displayed. Save this information, as it will be used in the ClickHouse configuration.
![Add a bucket](@site/docs/ja/integrations/data-ingestion/s3/images/GCS-guide-key.png)
</details>

View File

@ -1,132 +0,0 @@
<details><summary>Create an S3 bucket and an IAM user</summary>
This article demonstrates the basics of how to configure an AWS IAM user, create an S3 bucket, and configure ClickHouse to use the bucket as an S3 disk. You should work with your security team to determine the permissions to be used, and consider these as a starting point.
### Create an AWS IAM user
In this procedure, we'll be creating a service account user, not a login user.
1. Log into the AWS IAM Management Console.
2. In "Users", select **Add users**.
![create_iam_user_0](@site/docs/ja/_snippets/images/s3/s3-1.png)
3. Enter the user name, set the credential type to **Access key - Programmatic access**, and select **Next: Permissions**.
![create_iam_user_1](@site/docs/ja/_snippets/images/s3/s3-2.png)
4. Do not add the user to any group; select **Next: Tags**.
![create_iam_user_2](@site/docs/ja/_snippets/images/s3/s3-3.png)
5. Unless you need to add any tags, select **Next: Review**.
![create_iam_user_3](@site/docs/ja/_snippets/images/s3/s3-4.png)
6. Select **Create user**.
:::note
The warning message stating that the user has no permissions can be ignored; permissions will be granted on the bucket for the user in the next section.
:::
![create_iam_user_4](@site/docs/ja/_snippets/images/s3/s3-5.png)
7. The user is now created; click on **show** and copy the access and secret keys.
:::note
This is the only time that the secret access key will be available; save the keys somewhere else.
:::
![create_iam_user_5](@site/docs/ja/_snippets/images/s3/s3-6.png)
8. Click close, then find the user in the users screen.
![create_iam_user_6](@site/docs/ja/_snippets/images/s3/s3-7.png)
9. Copy the ARN (Amazon Resource Name) and save it for use when configuring the access policy for the bucket.
![create_iam_user_7](@site/docs/ja/_snippets/images/s3/s3-8.png)
### Create an S3 bucket
1. In the S3 bucket section, select **Create bucket**.
![create_s3_bucket_0](@site/docs/ja/_snippets/images/s3/s3-9.png)
2. Enter a bucket name and leave the other options at their defaults.
:::note
The bucket name must be unique across all of AWS, not just within your organization, or it will return an error.
:::
3. Leave `Block all Public Access` enabled; public access is not needed.
![create_s3_bucket_2](@site/docs/ja/_snippets/images/s3/s3-a.png)
4. Select **Create bucket** at the bottom of the page.
![create_s3_bucket_3](@site/docs/ja/_snippets/images/s3/s3-b.png)
5. Select the link, copy the ARN, and save it for use when configuring the access policy for the bucket.
6. Once the bucket has been created, find the new S3 bucket in the S3 buckets list and select the link.
![create_s3_bucket_4](@site/docs/ja/_snippets/images/s3/s3-c.png)
7. Select **Create folder**.
![create_s3_bucket_5](@site/docs/ja/_snippets/images/s3/s3-d.png)
8. Enter a folder name that will be the target of the ClickHouse S3 disk and select **Create folder**.
![create_s3_bucket_6](@site/docs/ja/_snippets/images/s3/s3-e.png)
9. The folder should now be visible in the bucket list.
![create_s3_bucket_7](@site/docs/ja/_snippets/images/s3/s3-f.png)
10. Select the checkbox for the new folder and click **Copy URL**. The copied URL will be used in the ClickHouse storage configuration in the next section.
![create_s3_bucket_8](@site/docs/ja/_snippets/images/s3/s3-g.png)
11. Select the **Permissions** tab and click the **Edit** button in the **Bucket policy** section.
![create_s3_bucket_9](@site/docs/ja/_snippets/images/s3/s3-h.png)
12. Add a bucket policy as in the example below:
```json
{
"Version": "2012-10-17",
"Id": "Policy123456",
"Statement": [
{
"Sid": "abc123",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::921234567898:user/mars-s3-user"
},
"Action": "s3:*",
"Resource": [
"arn:aws:s3:::mars-doc-test",
"arn:aws:s3:::mars-doc-test/*"
]
}
]
}
```
```response
|Parameter | Description | Example |
|----------|-------------|----------------|
|Version | Version of the policy interpreter, leave as-is | 2012-10-17 |
|Sid | User-defined policy ID | abc123 |
|Effect | Whether user requests will be allowed or denied | Allow |
|Principal | The accounts or users to be allowed | arn:aws:iam::921234567898:user/mars-s3-user |
|Action | What operations are allowed on the bucket | s3:* |
|Resource | Which resources in the bucket the operations are allowed on | "arn:aws:s3:::mars-doc-test", "arn:aws:s3:::mars-doc-test/*" |
```
:::note
You should work with your security team to determine the permissions to be used; consider these as a starting point.
For more information on policies and settings, refer to the AWS documentation:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-policy-language-overview.html
:::
13. Save the policy configuration.
</details>

View File

@ -1,11 +0,0 @@
<details><summary>Manage your IP Access List</summary>
From your ClickHouse Cloud services list, choose the service that you will work with and switch to **Security**. If the IP Access List does not contain the IP address or range of the remote system that needs to connect to your ClickHouse Cloud service, you can resolve the problem with **Add entry**.
![Check if the service allows traffic](@site/docs/ja/_snippets/images/ip-allow-list-check-list.png)
Add the individual IP address, or the range of addresses, that needs to connect to your ClickHouse Cloud service. Modify the form as needed, then **Add entry** and **Submit entries**.
![Add your current IP address](@site/docs/ja/_snippets/images/ip-allow-list-add-current-ip.png)
</details>

View File

@ -1,45 +0,0 @@
<details><summary>Launch Apache Superset in Docker</summary>
Superset provides instructions for [installing Superset locally using Docker Compose](https://superset.apache.org/docs/installation/installing-superset-using-docker-compose/). After checking out the Apache Superset repository from GitHub, you can run the latest development code or a specific tag. We recommend release 2.0.0, as it is the latest release not marked as `pre-release`.
There are a few tasks to be done before running `docker compose`:
1. Add the official ClickHouse Connect driver
2. Obtain a MapBox API key and add it as an environment variable (optional)
3. Specify the version of Superset to run
:::tip
The commands below are to be run from the top level of the GitHub repository, `superset`.
:::
## Official ClickHouse Connect driver
To make the ClickHouse Connect driver available in the Superset deployment, add it to the local requirements file:
```bash
echo "clickhouse-connect" >> ./docker/requirements-local.txt
```
## MapBox
This is optional. You can plot location data in Superset without a MapBox API key, but you will see a message telling you that you should add one, and the map background image will be missing (you will only see the data points and not the map background). MapBox provides a free tier if you would like to use it.
Some of the sample visualizations that the guides create use location data, for example longitude and latitude. Superset includes support for MapBox maps. To use the MapBox visualizations, you need a MapBox API key. Sign up for the [MapBox free tier](https://account.mapbox.com/auth/signup/) and generate an API key.
Make the API key available to Superset:
```bash
echo "MAPBOX_API_KEY=pk.SAMPLE-Use-your-key-instead" >> docker/.env-non-dev
```
## Deploy Superset version 2.0.0
To deploy release 2.0.0, run:
```bash
git checkout 2.0.0
TAG=2.0.0 docker-compose -f docker-compose-non-dev.yml pull
TAG=2.0.0 docker-compose -f docker-compose-non-dev.yml up
```
</details>

View File

@ -1,11 +0,0 @@
| Region | VPC Service Name | Availability Zone IDs |
|------------------|--------------------------------------------------------------------|------------------------------|
|ap-south-1 | com.amazonaws.vpce.ap-south-1.vpce-svc-0a786406c7ddc3a1b | aps1-az1 aps1-az2 aps1-az3 |
|ap-southeast-1 | com.amazonaws.vpce.ap-southeast-1.vpce-svc-0a8b096ec9d2acb01 | apse1-az1 apse1-az2 apse1-az3|
|ap-southeast-2 | com.amazonaws.vpce.ap-southeast-2.vpce-svc-0ca446409b23f0c01 | apse2-az1 apse2-az2 apse2-az3|
|eu-central-1 | com.amazonaws.vpce.eu-central-1.vpce-svc-0536fc4b80a82b8ed | euc1-az2 euc1-az3 euc1-az1 |
|eu-west-1 | com.amazonaws.vpce.eu-west-1.vpce-svc-066b03c9b5f61c6fc | euw1-az2 euw1-az3 euw1-az1 |
|us-east-1 c0 | com.amazonaws.vpce.us-east-1.vpce-svc-0a0218fa75c646d81 | use1-az6 use1-az1 use1-az2 |
|us-east-1 c1 | com.amazonaws.vpce.us-east-1.vpce-svc-096c118db1ff20ea4 | use1-az6 use1-az4 use1-az2 |
|us-east-2 | com.amazonaws.vpce.us-east-2.vpce-svc-0b99748bf269a86b4 | use2-az1 use2-az2 use2-az3 |
|us-west-2 | com.amazonaws.vpce.us-west-2.vpce-svc-049bbd33f61271781 | usw2-az2 usw2-az1 usw2-az3 |

View File

@ -1,15 +0,0 @@
<details><summary>Manage your IP Access List</summary>
From your services list in ClickHouse Cloud, select the service that you will work with and switch to **Settings**.
![Settings for the service](@site/docs/ja/_snippets/images/cloud-service-settings.png)
If the IP Access List shows **There is currently no traffic allowed to this service**, you can resolve the problem with **Add entry**.
![Check that the service allows traffic](@site/docs/ja/_snippets/images/ip-allow-list-check-list.png)
For the quick start, if your local security policy allows it, you can add only your current IP address. To do so, use **Add current IP**, which pre-fills the form with your current IP and the description "Home IP". Modify the form as needed, then **Add entry** and **Submit entries**.
![Add your current IP address](@site/docs/ja/_snippets/images/ip-allow-list-add-current-ip.png)
</details>

View File

@ -1,61 +0,0 @@
1. After creating your ClickHouse Cloud service, on the credentials screen, select the MySQL tab.
![Credentials screen - Prompt](./images/mysql1.png)
2. Toggle the switch to enable the MySQL interface for this specific service. This will expose port `3306` for this service and prompt you with your MySQL connection screen, which includes your unique MySQL username. The password will be the same as the service's default user password.
![Credentials screen - Enabled MySQL](./images/mysql2.png)
Alternatively, to enable the MySQL interface for an existing service:
3. Ensure your service is in the `Running` state, then click the "View connection string" button for the service you want to enable the MySQL interface for.
![Connection screen - Prompt MySQL](./images/mysql3.png)
4. Toggle the switch to enable the MySQL interface for this specific service. This will prompt you to enter the default password.
![Connection screen - Prompt MySQL](./images/mysql4.png)
5. After entering the password, you will be shown the MySQL connection string for this service.
![Connection screen - MySQL Enabled](./images/mysql5.png)
## Creating multiple MySQL users in ClickHouse Cloud
By default, there is a built-in `mysql4<subdomain>` user, which uses the same password as the `default` user. The `<subdomain>` part is the first segment of your ClickHouse Cloud hostname. This format is necessary to work with tools that implement secure connections but [do not provide SNI information in their TLS handshake](https://www.cloudflare.com/learning/ssl/what-is-sni) (the MySQL console client is one such tool), since internal routing is impossible without an extra hint in the username.
Because of this, we _strongly recommend_ following the `mysql4<subdomain>_<username>` format when creating a new user intended for use with the MySQL interface, where `<subdomain>` is a hint to identify your Cloud service and `<username>` is an arbitrary suffix of your choice.
:::tip
If your ClickHouse Cloud hostname is `foobar.us-east1.aws.clickhouse.cloud`, the `<subdomain>` part equals `foobar`, and a custom MySQL username could look like `mysql4foobar_team1`.
:::
You can create extra users to use with the MySQL interface if, for example, you need to apply extra settings.
1. Optional - create a [settings profile](https://clickhouse.com/docs/ja/sql-reference/statements/create/settings-profile) to apply to your custom user. For example, `my_custom_profile` with an extra setting that will be applied by default when we connect with the user we create later:
```sql
CREATE SETTINGS PROFILE my_custom_profile SETTINGS prefer_column_name_to_alias=1;
```
`prefer_column_name_to_alias` is used here merely as an example; you could use other settings there.
2. [Create a user](https://clickhouse.com/docs/ja/sql-reference/statements/create/user) using the following format: `mysql4<subdomain>_<username>` ([see above](#creating-multiple-mysql-users-in-clickhouse-cloud)). The password must be in double SHA1 format. For example:
```sql
CREATE USER mysql4foobar_team1 IDENTIFIED WITH double_sha1_password BY 'YourPassword42$';
```
or, if you want to use a custom profile for this user:
```sql
CREATE USER mysql4foobar_team1 IDENTIFIED WITH double_sha1_password BY 'YourPassword42$' SETTINGS PROFILE 'my_custom_profile';
```
where `my_custom_profile` is the name of the profile you created earlier.
3. [Grant](https://clickhouse.com/docs/ja/sql-reference/statements/grant) the new user the necessary permissions to interact with the desired tables or databases. For example, if you want to grant access to `system.query_log` only:
```sql
GRANT SELECT ON system.query_log TO mysql4foobar_team1;
```
4. Use the created user to connect to your ClickHouse Cloud service with the MySQL interface; a sketch of such a connection is shown below.
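For illustration only, a connection with the standard `mysql` client might look roughly like this, reusing the hypothetical hostname and username from the tip above:
```bash
# Hostname and username are the hypothetical examples from above; 3306 is the MySQL interface port
mysql -h foobar.us-east1.aws.clickhouse.cloud -P 3306 -u mysql4foobar_team1 -p
```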
### Troubleshooting multiple MySQL users in ClickHouse Cloud
If you created a new MySQL user and see the following error while connecting with the MySQL CLI client:
```
ERROR 2013 (HY000): Lost connection to MySQL server at 'reading authorization packet', system error: 54
```
In this case, ensure that the username follows the `mysql4<subdomain>_<username>` format, as described [above](#creating-multiple-mysql-users-in-clickhouse-cloud).

View File

@ -1,87 +0,0 @@
See the [official documentation](https://clickhouse.com/docs/ja/interfaces/mysql) for how to enable the MySQL interface on your ClickHouse server.
In addition to adding an entry to the server's `config.xml`,
```xml
<clickhouse>
<mysql_port>9004</mysql_port>
</clickhouse>
```
it is **required** to use [double SHA1 password encryption](https://clickhouse.com/docs/ja/operations/settings/settings-users#user-namepassword) for the user that will be using the MySQL interface.
To generate a random password encrypted with double SHA1 from the shell, run:
```shell
PASSWORD=$(base64 < /dev/urandom | head -c16); echo "$PASSWORD"; echo -n "$PASSWORD" | sha1sum | tr -d '-' | xxd -r -p | sha1sum | tr -d '-'
```
The output should look like the following:
```
LZOQYnqQN4L/T6L0
fbc958cc745a82188a51f30de69eebfc67c40ee4
```
The first line is the generated password, and the second line is the hash to use when configuring ClickHouse.
Here is an example configuration for `mysql_user` that uses the generated hash:
`/etc/clickhouse-server/users.d/mysql_user.xml`
```xml
<users>
<mysql_user>
<password_double_sha1_hex>fbc958cc745a82188a51f30de69eebfc67c40ee4</password_double_sha1_hex>
<networks>
<ip>::/0</ip>
</networks>
<profile>default</profile>
<quota>default</quota>
</mysql_user>
</users>
```
Replace the `password_double_sha1_hex` entry with your own generated double SHA1 hash.
Additionally, it is recommended to use `use_mysql_types_in_show_columns` to show native MySQL types in the results of `SHOW [FULL] COLUMNS` queries, so that BI tools can properly introspect the database schema when using MySQL connectors.
For example:
`/etc/clickhouse-server/users.d/mysql_user.xml`
```xml
<profiles>
<default>
<use_mysql_types_in_show_columns>1</use_mysql_types_in_show_columns>
</default>
</profiles>
```
Or, assign it to a different profile instead of the default one.
If the `mysql` binary is available, you can test the connection from the command line. Here is a sample command using the example username (`mysql_user`) and password (`LZOQYnqQN4L/T6L0`):
```bash
mysql --protocol tcp -h localhost -u mysql_user -P 9004 --password=LZOQYnqQN4L/T6L0
```
```
mysql> show databases;
+--------------------+
| name |
+--------------------+
| INFORMATION_SCHEMA |
| default |
| information_schema |
| system |
+--------------------+
4 rows in set (0.00 sec)
Read 4 rows, 603.00 B in 0.00156 sec., 2564 rows/sec., 377.48 KiB/sec.
```
Finally, configure the ClickHouse server to listen on the desired IP address(es). For example, in `config.xml`, uncomment the following to listen on all addresses:
```xml
<listen_host>::</listen_host>
```

View File

@ -1,19 +0,0 @@
## Cloud backup and restore
Each service is backed up daily. You can see the list of backups for a service on the **Backups** tab of the service. From there you can restore a backup or delete a backup.
![List of backups](@site/docs/ja/_snippets/images/cloud-backup-list.png)
Clicking the **Restore backup** icon lets you specify the **Service name** of the new service that will be created and then **Restore this backup**.
![List of backups](@site/docs/ja/_snippets/images/cloud-backup-restore.png)
The new service will show in the services list as **Provisioning** until it is ready.
![List of backups](@site/docs/ja/_snippets/images/cloud-backup-new-service.png)
Once the new service finishes provisioning, you can connect to it. After that…
:::note
Do not use the `BACKUP` and `RESTORE` commands from your SQL client when working with ClickHouse Cloud services. Cloud backups should be managed from the UI.
:::

View File

@ -1,7 +0,0 @@
:::important best practices
When configuring ClickHouse Server by adding or editing configuration files, you should:
- Add files to the `/etc/clickhouse-server/config.d/` directory
- Add files to the `/etc/clickhouse-server/users.d/` directory
- Leave the `/etc/clickhouse-server/config.xml` file as-is
- Leave the `/etc/clickhouse-server/users.xml` file as-is
:::
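A minimal sketch of this pattern: rather than editing `config.xml`, an override lives in its own file under `config.d/` (the filename and the setting shown are just an illustration):
```xml
<!-- /etc/clickhouse-server/config.d/listen_host.xml (example filename) -->
<clickhouse>
    <!-- Illustrative override: listen on all interfaces -->
    <listen_host>::</listen_host>
</clickhouse>
```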

View File

@ -1,17 +0,0 @@
To connect to ClickHouse over HTTP(S) you need this information:
- **HOST and PORT**: typically, the port is 8443 when using TLS, or 8123 when not using TLS.
- **DATABASE NAME**: out of the box there is a database named `default`; use the name of the database that you want to connect to.
- **USERNAME and PASSWORD**: out of the box the username is `default`. Use the username appropriate for your use case.
The details for your ClickHouse Cloud service are available in the ClickHouse Cloud console. Select the service that you will connect to and click **Connect**:
![ClickHouse Cloud service connect button](@site/docs/ja/_snippets/images/cloud-connect-button.png)
Choose **HTTPS**, and the details are available in an example `curl` command.
![ClickHouse Cloud HTTPS connection details](@site/docs/ja/_snippets/images/connection-details-https.png)
If you are using self-managed ClickHouse, the connection details are set by your ClickHouse administrator.
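As a quick sketch, a connection test over HTTPS with `curl` might look like this; the hostname and password are placeholders:
```bash
# Placeholders: substitute your own host and password
echo 'SELECT 1' | curl 'https://your-service.clickhouse.cloud:8443/' \
  --user 'default:your_password' \
  --data-binary @-
```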

View File

@ -1,17 +0,0 @@
To connect to ClickHouse over native TCP you need this information:
- **HOST and PORT**: typically, the port is 9440 when using TLS, or 9000 when not using TLS.
- **DATABASE NAME**: out of the box there is a database named `default`; use the name of the database that you want to connect to.
- **USERNAME and PASSWORD**: out of the box the username is `default`. Use the username appropriate for your use case.
The details for your ClickHouse Cloud service are available in the ClickHouse Cloud console. Select the service that you will connect to and click **Connect**.
![ClickHouse Cloud service connect button](@site/docs/ja/_snippets/images/cloud-connect-button.png)
Choose **Native**, and the details are available in an example `clickhouse-client` command.
![ClickHouse Cloud Native TCP connection details](@site/docs/ja/_snippets/images/connection-details-native.png)
If you are using self-managed ClickHouse, the connection details are set by your ClickHouse administrator.
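As a quick sketch, a `clickhouse-client` invocation using these details might look like this; the hostname and password are placeholders:
```bash
# Placeholders: substitute your own host and password
clickhouse-client --host your-service.clickhouse.cloud --secure --port 9440 \
                  --user default --password your_password
```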

View File

@ -1,6 +0,0 @@
| Region | Service Attachment | Private DNS Domain |
|--------------|-------------------------------------------------------------|------------------------------|
|asia-southeast1| projects/dataplane-production/regions/asia-southeast1/serviceAttachments/production-asia-southeast1-clickhouse-cloud| asia-southeast1.p.gcp.clickhouse.cloud|
|europe-west4| projects/dataplane-production/regions/europe-west4/serviceAttachments/production-europe-west4-clickhouse-cloud| europe-west4.p.gcp.clickhouse.cloud|
|us-central1| projects/dataplane-production/regions/us-central1/serviceAttachments/production-us-central1-clickhouse-cloud| us-central1.p.gcp.clickhouse.cloud|
|us-east1| projects/dataplane-production/regions/us-east1/serviceAttachments/production-us-east1-clickhouse-cloud| us-east1.p.gcp.clickhouse.cloud|
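As a rough, hypothetical sketch (not part of the original table), a Private Service Connect endpoint for one of these service attachments could be created with `gcloud` along these lines; the endpoint name, network, region, and reserved internal address are placeholders:
```bash
# Placeholders: endpoint name, network, region, and reserved internal address are yours to choose
gcloud compute forwarding-rules create clickhouse-cloud-psc-endpoint \
  --region=us-central1 \
  --network=default \
  --address=clickhouse-cloud-psc-ip \
  --target-service-attachment=projects/dataplane-production/regions/us-central1/serviceAttachments/production-us-central1-clickhouse-cloud
```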

View File

@ -1,5 +0,0 @@
:::important best practices
When editing configuration files to configure ClickHouse Keeper, you should:
- Back up the `/etc/clickhouse-keeper/keeper_config.xml` file
- Edit the `/etc/clickhouse-keeper/keeper_config.xml` file
:::
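A minimal sketch of that workflow from the shell (the `.bak` suffix is just a convention):
```bash
# Back up the current Keeper configuration before editing it
sudo cp /etc/clickhouse-keeper/keeper_config.xml /etc/clickhouse-keeper/keeper_config.xml.bak
# Then edit the original file in place
sudo vi /etc/clickhouse-keeper/keeper_config.xml
```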

View File

@ -1,11 +0,0 @@
:::tip SQL console
If you need a SQL client connection, your ClickHouse Cloud service has an associated web-based SQL console; expand **Connect to SQL console** below for details.
:::
<details><summary>Connect to SQL console</summary>
From your ClickHouse Cloud services list, choose the service that you will work with and click **Connect**. From here you can **Open SQL console**:
![Connect to SQL console](@site/docs/ja/_snippets/images/cloud-connect-to-sql-console.png)
</details>

View File

@ -1,9 +0,0 @@
## Terminology
### Replica
A copy of the data. ClickHouse always has at least one copy of your data, so the minimum number of **replicas** is one. This is an important detail: you may not be used to counting the original copy of the data as a replica, but that is the term used in ClickHouse code and documentation. Adding a second replica of your data provides fault tolerance.
### Shard
A subset of the data. ClickHouse always has at least one shard for your data, so if you do not split the data across multiple servers, your data will be stored in one shard. Sharding the data across multiple servers can be used to divide the load if you exceed the capacity of a single server. The destination server is determined by the **sharding key**, which is defined when you create the distributed table. The sharding key can be random or the output of a [hash function](https://clickhouse.com/docs/ja/sql-reference/functions/hash-functions). The deployment examples involving sharding use `rand()` as the sharding key and provide further information on when and how to choose a different sharding key; a sketch of such a table follows below.
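For example, a Distributed table that uses `rand()` as its sharding key might be declared roughly like this; the cluster, database, and table names are placeholders:
```sql
-- 'my_cluster', 'db1', and 'table1' are placeholder names
CREATE TABLE db1.table1_distributed AS db1.table1
ENGINE = Distributed('my_cluster', 'db1', 'table1', rand());
```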
### Distributed coordination
ClickHouse Keeper provides the coordination system for data replication and distributed DDL query execution. ClickHouse Keeper is compatible with Apache ZooKeeper.

View File

@ -1,3 +0,0 @@
:::note
This page is not applicable to [ClickHouse Cloud](https://clickhouse.com/cloud). The procedure documented here is automated in ClickHouse Cloud services.
:::

View File

@ -1,4 +0,0 @@
:::note
This page is not applicable to [ClickHouse Cloud](https://clickhouse.com/cloud). The feature documented here is not available in ClickHouse Cloud services.
See the ClickHouse [Cloud Compatibility](/docs/ja/whats-new/cloud-compatibility) guide for more information.
:::

View File

@ -1,3 +0,0 @@
:::note
This page is not applicable to [ClickHouse Cloud](https://clickhouse.com/cloud). The procedure documented here is only necessary in self-managed ClickHouse deployments.
:::

View File

@ -1,3 +0,0 @@
:::note
This page is not applicable to [ClickHouse Cloud](https://clickhouse.com/cloud). The feature documented here is not yet available in ClickHouse Cloud services. See the ClickHouse [Cloud Compatibility](/docs/ja/whats-new/cloud-compatibility#roadmap) guide for more information.
:::

View File

@ -1,3 +0,0 @@
<p>Open the <b>Actions menu</b> for your ClickHouse Cloud service and select <b>{props.menu}</b>:</p>
![Cloud service Actions menu](@site/docs/ja/_snippets/images/cloud-service-actions-menu.png)

View File

@ -1,3 +0,0 @@
Create an account or sign in at [ClickHouse.cloud](https://clickhouse.cloud).
![Cloud sign in prompt](@site/docs/ja/_snippets/images/cloud-sign-in-or-trial.png)

View File

@ -1,22 +0,0 @@
---
sidebar_label: Tab sample
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import CodeBlock from '@theme/CodeBlock';
## Step 1
<Tabs groupId="deployMethod">
<TabItem value="serverless" label="ClickHouse Cloud" default>
Cloud
</TabItem>
<TabItem value="selfmanaged" label="Self-managed">
Self-managed
</TabItem>
</Tabs>

View File

@ -1,447 +0,0 @@
## Testing admin privileges
Log out of the `default` user and log back in as the `clickhouse_admin` user.
All of these should succeed:
```sql
SHOW GRANTS FOR clickhouse_admin;
```
```sql
CREATE DATABASE db1
```
```sql
CREATE TABLE db1.table1 (id UInt64, column1 String) ENGINE = MergeTree() ORDER BY id;
```
```sql
INSERT INTO db1.table1 (id, column1) VALUES (1, 'abc');
```
```sql
SELECT * FROM db1.table1;
```
```sql
DROP TABLE db1.table1;
```
```sql
DROP DATABASE db1;
```
## Non-admin users
Users should have the necessary privileges, and not all of them should be admin users. The rest of this document provides example scenarios and the roles that are required.
### Preparation
Create the tables and users that will be used in the examples.
#### Creating a sample database, table, and rows
1. Create a test database
```sql
CREATE DATABASE db1;
```
2. Create a table
```sql
CREATE TABLE db1.table1 (
id UInt64,
column1 String,
column2 String
)
ENGINE MergeTree
ORDER BY id;
```
3. Populate the table with sample rows
```sql
INSERT INTO db1.table1
(id, column1, column2)
VALUES
(1, 'A', 'abc'),
(2, 'A', 'def'),
(3, 'B', 'abc'),
(4, 'B', 'def');
```
4. Verify the table:
```sql
SELECT *
FROM db1.table1
```
```response
Query id: 475015cc-6f51-4b20-bda2-3c9c41404e49
┌─id─┬─column1─┬─column2─┐
│ 1 │ A │ abc │
│ 2 │ A │ def │
│ 3 │ B │ abc │
│ 4 │ B │ def │
└────┴─────────┴─────────┘
```
5. Create a regular user that will be used to demonstrate restricting access to certain columns:
```sql
CREATE USER column_user IDENTIFIED BY 'password';
```
6. Create a regular user that will be used to demonstrate restricting access to rows with certain values:
```sql
CREATE USER row_user IDENTIFIED BY 'password';
```
#### Creating roles
With this set of examples:
- roles for different privileges, such as columns and rows, will be created
- privileges will be granted to the roles
- users will be assigned to each role
Roles are used to define groups of users with certain privileges, instead of managing each user separately.
1. Create a role that restricts its users to viewing only `column1` of `table1` in the `db1` database:
```sql
CREATE ROLE column1_users;
```
2. Set the privilege to allow viewing only `column1`
```sql
GRANT SELECT(id, column1) ON db1.table1 TO column1_users;
```
3. Add the `column_user` user to the `column1_users` role
```sql
GRANT column1_users TO column_user;
```
4. Create a role that restricts its users to viewing only rows where `column1` contains `A`
```sql
CREATE ROLE A_rows_users;
```
5. Add `row_user` to the `A_rows_users` role
```sql
GRANT A_rows_users TO row_user;
```
6. Create a policy to allow viewing only rows where `column1` has the value `A`
```sql
CREATE ROW POLICY A_row_filter ON db1.table1 FOR SELECT USING column1 = 'A' TO A_rows_users;
```
7. Set privileges on the database and table
```sql
GRANT SELECT(id, column1, column2) ON db1.table1 TO A_rows_users;
```
8. Grant explicit privileges so that the other roles can still access all rows
```sql
CREATE ROW POLICY allow_other_users_filter
ON db1.table1 FOR SELECT USING 1 TO clickhouse_admin, column1_users;
```
:::note
When a policy is attached to a table, the system applies that policy, and only the users and roles defined in it are able to operate on the table; any others are denied all operations. To keep the restrictive row policy from applying to other users, another policy must be defined that allows the other users and roles regular or other types of access.
:::
## Verification
### Testing role privileges with a column-restricted user
1. Log in to the ClickHouse client as the `clickhouse_admin` user
```
clickhouse-client --user clickhouse_admin --password password
```
2. Using the admin user, verify access to the database, the table, and all of the rows.
```sql
SELECT *
FROM db1.table1
```
```response
Query id: f5e906ea-10c6-45b0-b649-36334902d31d
┌─id─┬─column1─┬─column2─┐
│ 1 │ A │ abc │
│ 2 │ A │ def │
│ 3 │ B │ abc │
│ 4 │ B │ def │
└────┴─────────┴─────────┘
```
3. Log in to the ClickHouse client as the `column_user` user
```
clickhouse-client --user column_user --password password
```
4. `SELECT` using all columns
```sql
SELECT *
FROM db1.table1
```
```response
Query id: 5576f4eb-7450-435c-a2d6-d6b49b7c4a23
0 rows in set. Elapsed: 0.006 sec.
Received exception from server (version 22.3.2):
Code: 497. DB::Exception: Received from localhost:9000.
DB::Exception: column_user: Not enough privileges.
To execute this query it's necessary to have grant
SELECT(id, column1, column2) ON db1.table1. (ACCESS_DENIED)
```
:::note
Access is denied because all columns were specified, and the user only has access to `id` and `column1`.
:::
5. Verify a `SELECT` query that specifies only the permitted columns:
```sql
SELECT
id,
column1
FROM db1.table1
```
```response
Query id: cef9a083-d5ce-42ff-9678-f08dc60d4bb9
┌─id─┬─column1─┐
│ 1 │ A │
│ 2 │ A │
│ 3 │ B │
│ 4 │ B │
└────┴─────────┘
```
### Testing role privileges with a row-restricted user
1. Log in to the ClickHouse client as `row_user`
```
clickhouse-client --user row_user --password password
```
2. View the rows available
```sql
SELECT *
FROM db1.table1
```
```response
Query id: a79a113c-1eca-4c3f-be6e-d034f9a220fb
┌─id─┬─column1─┬─column2─┐
│ 1 │ A │ abc │
│ 2 │ A │ def │
└────┴─────────┴─────────┘
```
:::note
Verify that only the above two rows are returned; rows with the value `B` in `column1` should be excluded.
:::
## Modifying users and roles
Users can be assigned multiple roles for the combination of privileges they need. When multiple roles are used, the system combines them to determine the privileges, and the net effect is that the role permissions are cumulative.
For example, if `role1` allows selecting only `column1` and `role2` allows selecting `column1` and `column2`, then the user will have access to both columns, as sketched below.
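A hypothetical sketch of that combination (the role names follow the example in the sentence above, and `some_user` is made up for illustration):
```sql
-- role1 may read only column1; role2 may read column1 and column2
CREATE ROLE role1;
CREATE ROLE role2;
GRANT SELECT(id, column1) ON db1.table1 TO role1;
GRANT SELECT(id, column1, column2) ON db1.table1 TO role2;

-- A user holding both roles ends up with the union of the grants
CREATE USER some_user IDENTIFIED BY 'password';
GRANT role1, role2 TO some_user;
```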
1. Using the admin account, create a new user to be restricted by both rows and columns, with default roles
```sql
CREATE USER row_and_column_user IDENTIFIED BY 'password' DEFAULT ROLE A_rows_users;
```
2. Remove the previous privileges from the `A_rows_users` role
```sql
REVOKE SELECT(id, column1, column2) ON db1.table1 FROM A_rows_users;
```
3. Allow the `A_rows_users` role to only select from `column1`
```sql
GRANT SELECT(id, column1) ON db1.table1 TO A_rows_users;
```
4. Log in to the ClickHouse client as `row_and_column_user`
```
clickhouse-client --user row_and_column_user --password password;
```
5. Test with all columns:
```sql
SELECT *
FROM db1.table1
```
```response
Query id: 8cdf0ff5-e711-4cbe-bd28-3c02e52e8bc4
0 rows in set. Elapsed: 0.005 sec.
Received exception from server (version 22.3.2):
Code: 497. DB::Exception: Received from localhost:9000.
DB::Exception: row_and_column_user: Not enough privileges.
To execute this query it's necessary to have grant
SELECT(id, column1, column2) ON db1.table1. (ACCESS_DENIED)
```
6. Test with only the permitted columns:
```sql
SELECT
id,
column1
FROM db1.table1
```
```response
Query id: 5e30b490-507a-49e9-9778-8159799a6ed0
┌─id─┬─column1─┐
│ 1 │ A │
│ 2 │ A │
└────┴─────────┘
```
## Troubleshooting
There are occasions when privileges intersect or combine to produce unexpected results. The following commands can be used to narrow down the problem using an admin account.
### Listing the grants and roles for a user
```sql
SHOW GRANTS FOR row_and_column_user
```
```response
Query id: 6a73a3fe-2659-4aca-95c5-d012c138097b
┌─GRANTS FOR row_and_column_user───────────────────────────┐
│ GRANT A_rows_users, column1_users TO row_and_column_user │
└──────────────────────────────────────────────────────────┘
```
### List the roles in ClickHouse
```sql
SHOW ROLES
```
```response
Query id: 1e21440a-18d9-4e75-8f0e-66ec9b36470a
┌─name────────────┐
│ A_rows_users │
│ column1_users │
└─────────────────┘
```
### List the policies
```sql
SHOW ROW POLICIES
```
```response
Query id: f2c636e9-f955-4d79-8e80-af40ea227ebc
┌─name───────────────────────────────────┐
│ A_row_filter ON db1.table1 │
│ allow_other_users_filter ON db1.table1 │
└────────────────────────────────────────┘
```
### View how a policy was defined and the current privileges
```sql
SHOW CREATE ROW POLICY A_row_filter ON db1.table1
```
```response
Query id: 0d3b5846-95c7-4e62-9cdd-91d82b14b80b
┌─CREATE ROW POLICY A_row_filter ON db1.table1────────────────────────────────────────────────┐
│ CREATE ROW POLICY A_row_filter ON db1.table1 FOR SELECT USING column1 = 'A' TO A_rows_users │
└─────────────────────────────────────────────────────────────────────────────────────────────┘
```
## Example commands to manage roles, policies, and users
Use the following commands to:
- delete privileges
- delete a policy
- unassign a user from a role
- delete users and roles
<br />
:::tip
Run these commands as an admin user or the `default` user.
:::
### Remove a privilege from a role
```sql
REVOKE SELECT(column1, id) ON db1.table1 FROM A_rows_users;
```
### Delete a policy
```sql
DROP ROW POLICY A_row_filter ON db1.table1;
```
### Unassign a user from a role
```sql
REVOKE A_rows_users FROM row_user;
```
### Delete a role
```sql
DROP ROLE A_rows_users;
```
### Delete a user
```sql
DROP USER row_user;
```
## Summary
This document demonstrated the basics of creating SQL users and roles, and provided steps to set and modify privileges for users and roles. For more detailed information on each, see the user guides and reference documentation.

Some files were not shown because too many files have changed in this diff.