Merge pull request #35894 from ClickHouse/revert-35858-master

Revert "Format changes for new docs"
Alexey Milovidov 2022-04-04 02:05:51 +03:00 committed by GitHub
commit 599e52fbc8
GPG Key ID: 4AEE18F83AFDEB23
519 changed files with 4240 additions and 2451 deletions

.github/workflows/docs_release.yml

@ -0,0 +1,121 @@
name: DocsReleaseChecks
env:
# Force the stdout and stderr streams to be unbuffered
PYTHONUNBUFFERED: 1
concurrency:
group: master-release
cancel-in-progress: true
on: # yamllint disable-line rule:truthy
push:
branches:
- master
paths:
- 'docs/**'
- 'website/**'
- 'benchmark/**'
- 'docker/**'
- '.github/**'
workflow_dispatch:
jobs:
DockerHubPushAarch64:
runs-on: [self-hosted, style-checker-aarch64]
steps:
- name: Clear repository
run: |
sudo rm -fr "$GITHUB_WORKSPACE" && mkdir "$GITHUB_WORKSPACE"
- name: Check out repository code
uses: actions/checkout@v2
- name: Images check
run: |
cd "$GITHUB_WORKSPACE/tests/ci"
python3 docker_images_check.py --suffix aarch64
- name: Upload images files to artifacts
uses: actions/upload-artifact@v2
with:
name: changed_images_aarch64
path: ${{ runner.temp }}/docker_images_check/changed_images_aarch64.json
DockerHubPushAmd64:
runs-on: [self-hosted, style-checker]
steps:
- name: Clear repository
run: |
sudo rm -fr "$GITHUB_WORKSPACE" && mkdir "$GITHUB_WORKSPACE"
- name: Check out repository code
uses: actions/checkout@v2
- name: Images check
run: |
cd "$GITHUB_WORKSPACE/tests/ci"
python3 docker_images_check.py --suffix amd64
- name: Upload images files to artifacts
uses: actions/upload-artifact@v2
with:
name: changed_images_amd64
path: ${{ runner.temp }}/docker_images_check/changed_images_amd64.json
DockerHubPush:
needs: [DockerHubPushAmd64, DockerHubPushAarch64]
runs-on: [self-hosted, style-checker]
steps:
- name: Clear repository
run: |
sudo rm -fr "$GITHUB_WORKSPACE" && mkdir "$GITHUB_WORKSPACE"
- name: Check out repository code
uses: actions/checkout@v2
- name: Download changed aarch64 images
uses: actions/download-artifact@v2
with:
name: changed_images_aarch64
path: ${{ runner.temp }}
- name: Download changed amd64 images
uses: actions/download-artifact@v2
with:
name: changed_images_amd64
path: ${{ runner.temp }}
- name: Images check
run: |
cd "$GITHUB_WORKSPACE/tests/ci"
python3 docker_manifests_merge.py --suffix amd64 --suffix aarch64
- name: Upload images files to artifacts
uses: actions/upload-artifact@v2
with:
name: changed_images
path: ${{ runner.temp }}/changed_images.json
DocsRelease:
needs: DockerHubPush
runs-on: [self-hosted, func-tester]
steps:
- name: Set envs
# https://docs.github.com/en/actions/learn-github-actions/workflow-commands-for-github-actions#multiline-strings
run: |
cat >> "$GITHUB_ENV" << 'EOF'
TEMP_PATH=${{runner.temp}}/docs_release
REPO_COPY=${{runner.temp}}/docs_release/ClickHouse
CLOUDFLARE_TOKEN=${{secrets.CLOUDFLARE}}
ROBOT_CLICKHOUSE_SSH_KEY<<RCSK
${{secrets.ROBOT_CLICKHOUSE_SSH_KEY}}
RCSK
EOF
- name: Clear repository
run: |
sudo rm -fr "$GITHUB_WORKSPACE" && mkdir "$GITHUB_WORKSPACE"
- name: Check out repository code
uses: actions/checkout@v2
- name: Download changed images
uses: actions/download-artifact@v2
with:
name: changed_images
path: ${{ env.TEMP_PATH }}
- name: Docs Release
run: |
sudo rm -fr "$TEMP_PATH"
mkdir -p "$TEMP_PATH"
cp -r "$GITHUB_WORKSPACE" "$TEMP_PATH"
cd "$REPO_COPY/tests/ci"
python3 docs_release.py
- name: Cleanup
if: always()
run: |
docker kill "$(docker ps -q)" ||:
docker rm -f "$(docker ps -a -q)" ||:
sudo rm -fr "$TEMP_PATH"
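The `Set envs` step in this workflow appends to `$GITHUB_ENV` in the two forms GitHub Actions accepts: `NAME=value` for single-line values, and `NAME<<DELIMITER`, the value, then `DELIMITER` on its own line for multiline values such as the SSH key. A minimal Python sketch of that file format follows. The helper name `format_github_env` is hypothetical and is not part of the ClickHouse CI scripts.

```python
def format_github_env(name: str, value: str, delimiter: str = "RCSK") -> str:
    """Render one entry in the format GitHub Actions reads from $GITHUB_ENV.

    Hypothetical helper for illustration only.
    """
    if "\n" not in value:
        # Single-line values use the plain NAME=value form.
        return f"{name}={value}\n"
    if delimiter in value.splitlines():
        # The delimiter must not appear as a whole line of the value,
        # otherwise the value would be cut short when parsed.
        raise ValueError("delimiter collides with a line of the value")
    # Multiline values use NAME<<DELIMITER, the value, then DELIMITER alone.
    return f"{name}<<{delimiter}\n{value}\n{delimiter}\n"
```

The workflow picks `RCSK` as the delimiter for `ROBOT_CLICKHOUSE_SSH_KEY`; any token that cannot occur as a line of the secret would work the same way.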

@ -1,8 +0,0 @@
position: 50
label: 'Reference Guides'
collapsible: true
collapsed: true
link:
type: generated-index
title: Reference Guides
slug: /en

@ -0,0 +1,9 @@
---
toc_priority: 1
toc_title: Cloud
---
# ClickHouse Cloud Service {#clickhouse-cloud-service}
!!! info "Info"
Detailed public description for ClickHouse cloud services is not ready yet, please [contact us](https://clickhouse.com/company/#contact) to learn more.

@ -0,0 +1,13 @@
---
toc_folder_title: Commercial
toc_priority: 70
toc_title: Introduction
---
# ClickHouse Commercial Services {#clickhouse-commercial-services}
Service categories:
- [Cloud](../commercial/cloud.md)
- [Support](../commercial/support.md)

@ -0,0 +1,9 @@
---
toc_priority: 3
toc_title: Support
---
# ClickHouse Commercial Support Service {#clickhouse-commercial-support-service}
!!! info "Info"
Detailed public description for ClickHouse support services is not ready yet, please [contact us](https://clickhouse.com/company/#contact) to learn more.

@ -1,7 +0,0 @@
position: 100
label: 'Building ClickHouse'
collapsible: true
collapsed: true
link:
type: generated-index
title: Building ClickHouse

@ -1,9 +1,3 @@
---
sidebar_label: Adding Test Queries
sidebar_position: 63
description: Instructions on how to add a test case to ClickHouse continuous integration
---
# How to add test queries to ClickHouse CI
ClickHouse has hundreds (or even thousands) of features. Every commit gets checked by a complex set of tests containing many thousands of test cases.

@ -1,12 +1,11 @@
---
sidebar_label: Architecture Overview
sidebar_position: 62
toc_priority: 62
toc_title: Architecture Overview
---
# Overview of ClickHouse Architecture
# Overview of ClickHouse Architecture {#overview-of-clickhouse-architecture}
ClickHouse is a true column-oriented DBMS. Data is stored by columns, and during the execution of arrays (vectors or chunks of columns).
Whenever possible, operations are dispatched on arrays, rather than on individual values. It is called “vectorized query execution” and it helps lower the cost of actual data processing.
ClickHouse is a true column-oriented DBMS. Data is stored by columns, and during the execution of arrays (vectors or chunks of columns). Whenever possible, operations are dispatched on arrays, rather than on individual values. It is called “vectorized query execution” and it helps lower the cost of actual data processing.
> This idea is nothing new. It dates back to the `APL` (A programming language, 1957) and its descendants: `A +` (APL dialect), `J` (1990), `K` (1993), and `Q` (programming language from Kx Systems, 2003). Array programming is used in scientific data processing. Neither is this idea something new in relational databases: for example, it is used in the `VectorWise` system (also known as Actian Vector Analytic Database by Actian Corporation).
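As a rough illustration of dispatching on arrays rather than on individual values, the sketch below contrasts a function called once per value with a function called once per chunk of two columns. It is plain Python written for this note, not ClickHouse code, and the names `plus_value` and `plus_chunk` are hypothetical.

```python
from array import array


def plus_value(x: int, y: int) -> int:
    """Row-at-a-time style: one call handles a single pair of values."""
    return x + y


def plus_chunk(left: array, right: array) -> array:
    """Vectorized style: one call handles a whole chunk of two columns."""
    if len(left) != len(right):
        raise ValueError("columns in a chunk must have the same length")
    return array("q", (x + y for x, y in zip(left, right)))


a = array("q", [1, 2, 3, 4])
b = array("q", [10, 20, 30, 40])

row_at_a_time = [plus_value(x, y) for x, y in zip(a, b)]  # four calls
vectorized = plus_chunk(a, b)                             # one call
```

Both paths compute the same sums; the difference is how many times the operation is dispatched, which is the cost that vectorized execution amortizes over the chunk.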
@ -155,9 +154,8 @@ The server initializes the `Context` class with the necessary environment for qu
We maintain full backward and forward compatibility for the server TCP protocol: old clients can talk to new servers, and new clients can talk to old servers. But we do not want to maintain it eternally, and we are removing support for old versions after about one year.
:::note
For most external applications, we recommend using the HTTP interface because it is simple and easy to use. The TCP protocol is more tightly linked to internal data structures: it uses an internal format for passing blocks of data, and it uses custom framing for compressed data. We haven’t released a C library for that protocol because it requires linking most of the ClickHouse codebase, which is not practical.
:::
!!! note "Note"
For most external applications, we recommend using the HTTP interface because it is simple and easy to use. The TCP protocol is more tightly linked to internal data structures: it uses an internal format for passing blocks of data, and it uses custom framing for compressed data. We haven’t released a C library for that protocol because it requires linking most of the ClickHouse codebase, which is not practical.
## Distributed Query Execution {#distributed-query-execution}
@ -195,8 +193,7 @@ Replication is physical: only compressed parts are transferred between nodes, no
Besides, each replica stores its state in ZooKeeper as the set of parts and its checksums. When the state on the local filesystem diverges from the reference state in ZooKeeper, the replica restores its consistency by downloading missing and broken parts from other replicas. When there is some unexpected or broken data in the local filesystem, ClickHouse does not remove it, but moves it to a separate directory and forgets it.
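The consistency check described above can be sketched as a comparison of two maps from part name to checksum. This is a simplified illustration under stated assumptions, not the actual ClickHouse implementation: the function name `plan_recovery` and the part names in the usage note are hypothetical, and real parts carry more than one checksum.

```python
from typing import Dict, List, Tuple


def plan_recovery(
    reference: Dict[str, str], local: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Compare the reference state with the local filesystem state.

    Returns (to_fetch, to_set_aside):
    - to_fetch: parts that are missing locally or whose checksum differs,
      to be downloaded from other replicas;
    - to_set_aside: local parts that are unexpected or broken, to be moved
      to a separate directory rather than deleted.
    """
    to_fetch = sorted(
        name for name, checksum in reference.items() if local.get(name) != checksum
    )
    to_set_aside = sorted(
        name for name, checksum in local.items() if reference.get(name) != checksum
    )
    return to_fetch, to_set_aside
```

For example, with a reference of `{"all_1_1_0": "aa", "all_2_2_0": "bb"}` and a local state of `{"all_1_1_0": "aa", "all_2_2_0": "XX", "all_9_9_0": "cc"}`, the broken part `all_2_2_0` is both fetched again and set aside, and the unexpected part `all_9_9_0` is set aside only.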
:::note
The ClickHouse cluster consists of independent shards, and each shard consists of replicas. The cluster is **not elastic**, so after adding a new shard, data is not rebalanced between shards automatically. Instead, the cluster load is supposed to be adjusted to be uneven. This implementation gives you more control, and it is ok for relatively small clusters, such as tens of nodes. But for clusters with hundreds of nodes that we are using in production, this approach becomes a significant drawback. We should implement a table engine that spans across the cluster with dynamically replicated regions that could be split and balanced between clusters automatically.
:::
!!! note "Note"
The ClickHouse cluster consists of independent shards, and each shard consists of replicas. The cluster is **not elastic**, so after adding a new shard, data is not rebalanced between shards automatically. Instead, the cluster load is supposed to be adjusted to be uneven. This implementation gives you more control, and it is ok for relatively small clusters, such as tens of nodes. But for clusters with hundreds of nodes that we are using in production, this approach becomes a significant drawback. We should implement a table engine that spans across the cluster with dynamically replicated regions that could be split and balanced between clusters automatically.
[Original article](https://clickhouse.com/docs/en/development/architecture/)
{## [Original article](https://clickhouse.com/docs/en/development/architecture/) ##}

@ -1,13 +1,12 @@
---
sidebar_label: Source Code Browser
sidebar_position: 72
description: Various ways to browse and edit the source code
toc_priority: 72
toc_title: Source Code Browser
---
# Browse ClickHouse Source Code
# Browse ClickHouse Source Code {#browse-clickhouse-source-code}
You can use the **Woboq** online code browser available [here](https://clickhouse.com/codebrowser/ClickHouse/src/index.html). It provides code navigation and semantic highlighting, search and indexing. The code snapshot is updated daily.
You can use **Woboq** online code browser available [here](https://clickhouse.com/codebrowser/ClickHouse/src/index.html). It provides code navigation and semantic highlighting, search and indexing. The code snapshot is updated daily.
Also, you can browse sources on [GitHub](https://github.com/ClickHouse/ClickHouse) as usual.
If you’re interested what IDE to use, we recommend CLion, QT Creator, VS Code and KDevelop (with caveats). You can use any favorite IDE. Vim and Emacs also count.
If you’re interested what IDE to use, we recommend CLion, QT Creator, VS Code and KDevelop (with caveats). You can use any favourite IDE. Vim and Emacs also count.

@ -1,12 +1,11 @@
---
sidebar_position: 67
sidebar_label: Build on Linux for AARCH64 (ARM64)
toc_priority: 67
toc_title: Build on Linux for AARCH64 (ARM64)
---
# How to Build ClickHouse on Linux for AARCH64 (ARM64) Architecture
# How to Build ClickHouse on Linux for AARCH64 (ARM64) Architecture {#how-to-build-clickhouse-on-linux-for-aarch64-arm64-architecture}
This is for the case when you have Linux machine and want to use it to build `clickhouse` binary that will run on another Linux machine with AARCH64 CPU architecture.
This is intended for continuous integration checks that run on Linux servers.
This is for the case when you have Linux machine and want to use it to build `clickhouse` binary that will run on another Linux machine with AARCH64 CPU architecture. This is intended for continuous integration checks that run on Linux servers.
The cross-build for AARCH64 is based on the [Build instructions](../development/build.md), follow them first.

@ -1,12 +1,11 @@
---
sidebar_position: 66
sidebar_label: Build on Linux for Mac OS X
toc_priority: 66
toc_title: Build on Linux for Mac OS X
---
# How to Build ClickHouse on Linux for Mac OS X
# How to Build ClickHouse on Linux for Mac OS X {#how-to-build-clickhouse-on-linux-for-mac-os-x}
This is for the case when you have a Linux machine and want to use it to build `clickhouse` binary that will run on OS X.
This is intended for continuous integration checks that run on Linux servers. If you want to build ClickHouse directly on Mac OS X, then proceed with [another instruction](../development/build-osx.md).
This is for the case when you have Linux machine and want to use it to build `clickhouse` binary that will run on OS X. This is intended for continuous integration checks that run on Linux servers. If you want to build ClickHouse directly on Mac OS X, then proceed with [another instruction](../development/build-osx.md).
The cross-build for Mac OS X is based on the [Build instructions](../development/build.md), follow them first.

@ -1,9 +1,9 @@
---
sidebar_position: 68
sidebar_label: Build on Linux for RISC-V 64
toc_priority: 68
toc_title: Build on Linux for RISC-V 64
---
# How to Build ClickHouse on Linux for RISC-V 64 Architecture
# How to Build ClickHouse on Linux for RISC-V 64 Architecture {#how-to-build-clickhouse-on-linux-for-risc-v-64-architecture}
As of writing (11.11.2021) building for risc-v considered to be highly experimental. Not all features can be enabled.

@ -1,21 +1,16 @@
---
sidebar_position: 65
sidebar_label: Build on Mac OS X
description: How to build ClickHouse on Mac OS X
toc_priority: 65
toc_title: Build on Mac OS X
---
# How to Build ClickHouse on Mac OS X
# How to Build ClickHouse on Mac OS X {#how-to-build-clickhouse-on-mac-os-x}
:::info You don't have to build ClickHouse yourself!
You can install pre-built ClickHouse as described in [Quick Start](https://clickhouse.com/#quick-start). Follow **macOS (Intel)** or **macOS (Apple silicon)** installation instructions.
:::
!!! info "You don't have to build ClickHouse yourself"
You can install pre-built ClickHouse as described in [Quick Start](https://clickhouse.com/#quick-start).
Follow `macOS (Intel)` or `macOS (Apple silicon)` installation instructions.
Build should work on x86_64 (Intel) and arm64 (Apple silicon) based macOS 10.15 (Catalina) and higher with Homebrew's vanilla Clang.
It is always recommended to use vanilla `clang` compiler.
:::note
It is possible to use XCode's `apple-clang` or `gcc`, but it's strongly discouraged.
:::
It is always recommended to use vanilla `clang` compiler. It is possible to use XCode's `apple-clang` or `gcc` but it's strongly discouraged.
## Install Homebrew {#install-homebrew}
@ -94,9 +89,8 @@ cmake --build . --config RelWithDebInfo
If you intend to run `clickhouse-server`, make sure to increase the system’s maxfiles variable.
:::note
You’ll need to use sudo.
:::
!!! info "Note"
You’ll need to use sudo.
To do so, create the `/Library/LaunchDaemons/limit.maxfiles.plist` file with the following content:

@ -1,10 +1,9 @@
---
sidebar_position: 64
sidebar_label: Build on Linux
description: How to build ClickHouse on Linux
toc_priority: 64
toc_title: Build on Linux
---
# How to Build ClickHouse on Linux
# How to Build ClickHouse on Linux {#how-to-build-clickhouse-for-development}
Supported platforms:

@ -1,7 +1,6 @@
---
sidebar_position: 62
sidebar_label: Continuous Integration Checks
description: When you submit a pull request, some automated checks are ran for your code by the ClickHouse continuous integration (CI) system
toc_priority: 62
toc_title: Continuous Integration Checks
---
# Continuous Integration Checks
@ -72,6 +71,8 @@ This check means that the CI system started to process the pull request. When it
Performs some simple regex-based checks of code style, using the [`utils/check-style/check-style`](https://github.com/ClickHouse/ClickHouse/blob/master/utils/check-style/check-style) binary (note that it can be run locally).
If it fails, fix the style errors following the [code style guide](style.md).
Python code is checked with [black](https://github.com/psf/black/).
### Report Details
- [Status page example](https://clickhouse-test-reports.s3.yandex.net/12550/659c78c7abb56141723af6a81bfae39335aa8cb2/style_check.html)
- `output.txt` contains the check resulting errors (invalid tabulation etc), blank page means no errors. [Successful result example](https://clickhouse-test-reports.s3.yandex.net/12550/659c78c7abb56141723af6a81bfae39335aa8cb2/style_check/output.txt).
@ -151,7 +152,7 @@ checks page](../development/build.md#you-dont-have-to-build-clickhouse), or buil
## Functional Stateful Tests
Runs [stateful functional tests](tests.md#functional-tests). Treat them in the same way as the functional stateless tests. The difference is that they require `hits` and `visits` tables from the [clickstream dataset](../example-datasets/metrica.md) to run.
Runs [stateful functional tests](tests.md#functional-tests). Treat them in the same way as the functional stateless tests. The difference is that they require `hits` and `visits` tables from the [Yandex.Metrica dataset](../getting-started/example-datasets/metrica.md) to run.
## Integration Tests

@ -1,10 +1,9 @@
---
sidebar_position: 71
sidebar_label: Third-Party Libraries
description: A list of third-party libraries used
toc_priority: 71
toc_title: Third-Party Libraries Used
---
# Third-Party Libraries Used
# Third-Party Libraries Used {#third-party-libraries-used}
The list of third-party libraries:

@ -1,12 +1,11 @@
---
sidebar_position: 61
sidebar_label: Getting Started
description: Prerequisites and an overview of how to build ClickHouse
toc_priority: 61
toc_title: For Beginners
---
# Getting Started Guide for Building ClickHouse
# The Beginner ClickHouse Developer Instruction {#the-beginner-clickhouse-developer-instruction}
The building of ClickHouse is supported on Linux, FreeBSD and Mac OS X.
Building of ClickHouse is supported on Linux, FreeBSD and Mac OS X.
If you use Windows, you need to create a virtual machine with Ubuntu. To start working with a virtual machine please install VirtualBox. You can download Ubuntu from the website: https://www.ubuntu.com/#download. Please create a virtual machine from the downloaded image (you should reserve at least 4GB of RAM for it). To run a command-line terminal in Ubuntu, please locate a program containing the word “terminal” in its name (gnome-terminal, konsole etc.) or just press Ctrl+Alt+T.
@ -230,6 +229,25 @@ As simple code editors, you can use Sublime Text or Visual Studio Code, or Kate
Just in case, it is worth mentioning that CLion creates `build` path on its own, it also on its own selects `debug` for build type, for configuration it uses a version of CMake that is defined in CLion and not the one installed by you, and finally, CLion will use `make` to run build tasks instead of `ninja`. This is normal behaviour, just keep that in mind to avoid confusion.
## Debugging
Many graphical IDEs offer with an integrated debugger but you can also use a standalone debugger.
### GDB
### LLDB
# tell LLDB where to find the source code
settings set target.source-map /path/to/build/dir /path/to/source/dir
# configure LLDB to display code before/after currently executing line
settings set stop-line-count-before 10
settings set stop-line-count-after 10
target create ./clickhouse-client
# <set breakpoints here>
process launch -- --query="SELECT * FROM TAB"
## Writing Code {#writing-code}
The description of ClickHouse architecture can be found here: https://clickhouse.com/docs/en/development/architecture/

@ -0,0 +1,10 @@
---
toc_folder_title: Development
toc_hidden: true
toc_priority: 58
toc_title: hidden
---
# ClickHouse Development {#clickhouse-development}
[Original article](https://clickhouse.com/docs/en/development/) <!--hide-->

@ -1,10 +1,9 @@
---
sidebar_position: 69
sidebar_label: C++ Guide
description: A list of recommendations regarding coding style, naming convention, formatting and more
toc_priority: 69
toc_title: C++ Guide
---
# How to Write C++ Code
# How to Write C++ Code {#how-to-write-c-code}
## General Recommendations {#general-recommendations}

@ -1,12 +1,11 @@
---
sidebar_position: 70
sidebar_label: Testing
description: Most of ClickHouse features can be tested with functional tests and they are mandatory to use for every change in ClickHouse code that can be tested that way.
toc_priority: 70
toc_title: Testing
---
# ClickHouse Testing
# ClickHouse Testing {#clickhouse-testing}
## Functional Tests
## Functional Tests {#functional-tests}
Functional tests are the most simple and convenient to use. Most of ClickHouse features can be tested with functional tests and they are mandatory to use for every change in ClickHouse code that can be tested that way.

@ -1,8 +0,0 @@
position: 30
label: 'Database & Table Engines'
collapsible: true
collapsed: true
link:
type: generated-index
title: Database & Table Engines
slug: /en/table-engines

@ -1,9 +1,9 @@
---
sidebar_label: Atomic
sidebar_position: 10
toc_priority: 32
toc_title: Atomic
---
# Atomic
# Atomic {#atomic}
It supports non-blocking [DROP TABLE](#drop-detach-table) and [RENAME TABLE](#rename-table) queries and atomic [EXCHANGE TABLES](#exchange-tables) queries. `Atomic` database engine is used by default.
@ -18,21 +18,14 @@ CREATE DATABASE test [ENGINE = Atomic];
### Table UUID {#table-uuid}
All tables in database `Atomic` have persistent [UUID](../../sql-reference/data-types/uuid.md) and store data in directory `/clickhouse_path/store/xxx/xxxyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy/`, where `xxxyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy` is UUID of the table.
Usually, the UUID is generated automatically, but the user can also explicitly specify the UUID in the same way when creating the table (this is not recommended).
For example:
Usually, the UUID is generated automatically, but the user can also explicitly specify the UUID in the same way when creating the table (this is not recommended). To display the `SHOW CREATE` query with the UUID you can use setting [show_table_uuid_in_table_create_query_if_not_nil](../../operations/settings/settings.md#show_table_uuid_in_table_create_query_if_not_nil). For example:
```sql
CREATE TABLE name UUID '28f1c61c-2970-457a-bffe-454156ddcfef' (n UInt64) ENGINE = ...;
```
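To make the directory layout concrete, the sketch below builds the data directory for a table from its UUID. It assumes, as the `xxx/xxxyyyyy-…` pattern in the text suggests, that the subdirectory under `store` is named after the first three characters of the UUID. The function name `atomic_table_data_dir` is hypothetical.

```python
from pathlib import PurePosixPath
from uuid import UUID


def atomic_table_data_dir(clickhouse_path: str, table_uuid: str) -> str:
    """Return the data directory of a table in an Atomic database.

    Assumption: the subdirectory under `store` is the first three
    characters of the table UUID, following the pattern in the docs.
    """
    canonical = str(UUID(table_uuid))  # validates and normalizes the UUID
    path = PurePosixPath(clickhouse_path) / "store" / canonical[:3] / canonical
    return f"{path}/"
```

With the UUID from the `CREATE TABLE` example and `/clickhouse_path` as the root, this yields `/clickhouse_path/store/28f/28f1c61c-2970-457a-bffe-454156ddcfef/`.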
:::note
You can use the [show_table_uuid_in_table_create_query_if_not_nil](../../operations/settings/settings.md#show_table_uuid_in_table_create_query_if_not_nil) setting to display the UUID with the `SHOW CREATE` query.
:::
### RENAME TABLE {#rename-table}
[RENAME](../../sql-reference/statements/rename.md) queries are performed without changing the UUID or moving table data. These queries do not wait for the completion of queries using the table and are executed instantly.
[RENAME](../../sql-reference/statements/rename.md) queries are performed without changing UUID and moving table data. These queries do not wait for the completion of queries using the table and are executed instantly.
### DROP/DETACH TABLE {#drop-detach-table}

@ -6,11 +6,11 @@ toc_title: Introduction
# Database Engines {#database-engines}
Database engines allow you to work with tables. By default, ClickHouse uses the [Atomic](../../engines/database-engines/atomic.md) database engine, which provides configurable [table engines](../../engines/table-engines/index.md) and an [SQL dialect](../../sql-reference/syntax.md).
Database engines allow you to work with tables.
Here is a complete list of available database engines. Follow the links for more details:
By default, ClickHouse uses database engine [Atomic](../../engines/database-engines/atomic.md). It provides configurable [table engines](../../engines/table-engines/index.md) and an [SQL dialect](../../sql-reference/syntax.md).
- [Atomic](../../engines/database-engines/atomic.md)
You can also use the following database engines:
- [MySQL](../../engines/database-engines/mysql.md)
@ -18,6 +18,8 @@ Here is a complete list of available database engines. Follow the links for more
- [Lazy](../../engines/database-engines/lazy.md)
- [Atomic](../../engines/database-engines/atomic.md)
- [PostgreSQL](../../engines/database-engines/postgresql.md)
- [Replicated](../../engines/database-engines/replicated.md)

@ -1,6 +1,6 @@
---
sidebar_label: Lazy
sidebar_position: 20
toc_priority: 31
toc_title: Lazy
---
# Lazy {#lazy}

View File

@ -1,15 +1,16 @@
---
sidebar_label: MaterializedMySQL
sidebar_position: 70
toc_priority: 29
toc_title: MaterializedMySQL
---
# [experimental] MaterializedMySQL
# [experimental] MaterializedMySQL {#materialized-mysql}
:::warning
This is an experimental feature that should not be used in production.
:::
!!! warning "Warning"
This is an experimental feature that should not be used in production.
Creates a ClickHouse database with all the tables existing in MySQL, and all the data in those tables. The ClickHouse server works as MySQL replica. It reads `binlog` and performs DDL and DML queries.
Creates ClickHouse database with all the tables existing in MySQL, and all the data in those tables.
ClickHouse server works as MySQL replica. It reads binlog and performs DDL and DML queries.
## Creating a Database {#creating-a-database}
@ -30,6 +31,8 @@ ENGINE = MaterializedMySQL('host:port', ['database' | database], 'user', 'passwo
- `max_rows_in_buffer` — Maximum number of rows that data is allowed to cache in memory (for single table and the cache data unable to query). When this number is exceeded, the data will be materialized. Default: `65 505`.
- `max_bytes_in_buffer` — Maximum number of bytes that data is allowed to cache in memory (for single table and the cache data unable to query). When this number is exceeded, the data will be materialized. Default: `1 048 576`.
- `max_rows_in_buffers` — Maximum number of rows that data is allowed to cache in memory (for database and the cache data unable to query). When this number is exceeded, the data will be materialized. Default: `65 505`.
- `max_bytes_in_buffers` — Maximum number of bytes that data is allowed to cache in memory (for database and the cache data unable to query). When this number is exceeded, the data will be materialized. Default: `1 048 576`.
- `max_flush_data_time` — Maximum number of milliseconds that data is allowed to cache in memory (for database and the cache data unable to query). When this time is exceeded, the data will be materialized. Default: `1000`.
- `max_wait_time_when_mysql_unavailable` — Retry interval when MySQL is not available (milliseconds). Negative value disables retry. Default: `1000`.
- `allows_query_when_mysql_lost` — Allows to query a materialized table when MySQL is lost. Default: `0` (`false`).
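The buffer settings above all follow the same rule: cached data is materialized once a limit is exceeded. The sketch below models that rule with the documented defaults. It is an illustration only, not the engine's code; the class `BufferLimits`, the function `should_materialize`, and the way per-table and per-database counters are passed in are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BufferLimits:
    """Documented defaults of the MaterializedMySQL buffer settings."""
    max_rows_in_buffer: int = 65_505       # per table
    max_bytes_in_buffer: int = 1_048_576   # per table
    max_rows_in_buffers: int = 65_505      # per database
    max_bytes_in_buffers: int = 1_048_576  # per database
    max_flush_data_time: int = 1_000       # milliseconds


def should_materialize(
    limits: BufferLimits,
    table_rows: int,
    table_bytes: int,
    database_rows: int,
    database_bytes: int,
    elapsed_ms: int,
) -> bool:
    """Return True as soon as any single limit is exceeded."""
    return (
        table_rows > limits.max_rows_in_buffer
        or table_bytes > limits.max_bytes_in_buffer
        or database_rows > limits.max_rows_in_buffers
        or database_bytes > limits.max_bytes_in_buffers
        or elapsed_ms > limits.max_flush_data_time
    )
```

Reaching a limit exactly does not trigger materialization in this sketch, because the text says "exceeded"; the real engine may treat the boundary differently.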
@ -49,9 +52,8 @@ For the correct work of `MaterializedMySQL`, there are few mandatory `MySQL`-sid
- `default_authentication_plugin = mysql_native_password` since `MaterializedMySQL` can only authorize with this method.
- `gtid_mode = on` since GTID based logging is a mandatory for providing correct `MaterializedMySQL` replication.
:::note
While turning on `gtid_mode` you should also specify `enforce_gtid_consistency = on`.
:::
!!! attention "Attention"
While turning on `gtid_mode` you should also specify `enforce_gtid_consistency = on`.
## Virtual Columns {#virtual-columns}
@ -74,7 +76,7 @@ When working with the `MaterializedMySQL` database engine, [ReplacingMergeTree](
| FLOAT | [Float32](../../sql-reference/data-types/float.md) |
| DOUBLE | [Float64](../../sql-reference/data-types/float.md) |
| DECIMAL, NEWDECIMAL | [Decimal](../../sql-reference/data-types/decimal.md) |
| DATE, NEWDATE | [Date](../../sql-reference/data-types/date.md) |
| DATE, NEWDATE | [Date32](../../sql-reference/data-types/date32.md) |
| DATETIME, TIMESTAMP | [DateTime](../../sql-reference/data-types/datetime.md) |
| DATETIME2, TIMESTAMP2 | [DateTime64](../../sql-reference/data-types/datetime64.md) |
| YEAR | [UInt16](../../sql-reference/data-types/int-uint.md) |
@ -218,14 +220,13 @@ extra care needs to be taken.
You may specify overrides for tables that do not exist yet.
:::warning
It is easy to break replication with table overrides if not used with care. For example:
!!! warning "Warning"
It is easy to break replication with table overrides if not used with care. For example:
* If an ALIAS column is added with a table override, and a column with the same name is later added to the source
MySQL table, the converted ALTER TABLE query in ClickHouse will fail and replication stops.
* It is currently possible to add overrides that reference nullable columns where not-nullable are required, such as in
`ORDER BY` or `PARTITION BY`. This will cause CREATE TABLE queries that will fail, also causing replication to stop.
:::
* If an ALIAS column is added with a table override, and a column with the same name is later added to the source
MySQL table, the converted ALTER TABLE query in ClickHouse will fail and replication stops.
* It is currently possible to add overrides that reference nullable columns where not-nullable are required, such as in
`ORDER BY` or `PARTITION BY`. This will cause CREATE TABLE queries that will fail, also causing replication to stop.
## Examples of Use {#examples-of-use}

@ -1,6 +1,6 @@
---
sidebar_label: MaterializedPostgreSQL
sidebar_position: 60
toc_priority: 30
toc_title: MaterializedPostgreSQL
---
# [experimental] MaterializedPostgreSQL {#materialize-postgresql}
@ -46,9 +46,7 @@ After `MaterializedPostgreSQL` database is created, it does not automatically de
ATTACH TABLE postgres_database.new_table;
```
:::warning
Before version 22.1, adding a table to replication left an unremoved temporary replication slot (named `{db_name}_ch_replication_slot_tmp`). If attaching tables in ClickHouse version before 22.1, make sure to delete it manually (`SELECT pg_drop_replication_slot('{db_name}_ch_replication_slot_tmp')`). Otherwise disk usage will grow. This issue is fixed in 22.1.
:::
Warning: before version 22.1 adding table to replication left unremoved temprorary replication slot (named `{db_name}_ch_replication_slot_tmp`). If attaching tables in clickhouse version before 22.1, make sure to delete it manually (`SELECT pg_drop_replication_slot('{db_name}_ch_replication_slot_tmp')`). Otherwise disk usage will grow. Issue is fixed in 22.1.
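For servers older than 22.1, the cleanup statement from the warning can be generated per database. The sketch below only builds the SQL string; the function name `tmp_slot_cleanup_sql` is hypothetical, and the character check is an assumption added here so that the name can be embedded in a quoted literal safely.

```python
import re


def tmp_slot_cleanup_sql(db_name: str) -> str:
    """Build the statement that drops the leftover temporary replication slot.

    The slot name follows the pattern from the warning:
    {db_name}_ch_replication_slot_tmp
    """
    # Assumption: restrict the name to characters that are safe inside
    # a single-quoted SQL literal.
    if not re.fullmatch(r"[A-Za-z0-9_]+", db_name):
        raise ValueError(f"unexpected characters in database name: {db_name!r}")
    slot = f"{db_name}_ch_replication_slot_tmp"
    return f"SELECT pg_drop_replication_slot('{slot}')"
```

The resulting statement is meant to be run on the PostgreSQL side, as in the warning.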
## Dynamically removing tables from replication {#dynamically-removing-table-from-replication}
@ -137,70 +135,69 @@ FROM pg_class
WHERE oid = 'postgres_table'::regclass;
```
:::warning
Replication of [**TOAST**](https://www.postgresql.org/docs/9.5/storage-toast.html) values is not supported. The default value for the data type will be used.
:::
!!! warning "Warning"
Replication of [**TOAST**](https://www.postgresql.org/docs/9.5/storage-toast.html) values is not supported. The default value for the data type will be used.
## Settings {#settings}
1. `materialized_postgresql_tables_list` {#materialized-postgresql-tables-list}
1. materialized_postgresql_tables_list {#materialized-postgresql-tables-list}
Sets a comma-separated list of PostgreSQL database tables, which will be replicated via [MaterializedPostgreSQL](../../engines/database-engines/materialized-postgresql.md) database engine.
Sets a comma-separated list of PostgreSQL database tables, which will be replicated via [MaterializedPostgreSQL](../../engines/database-engines/materialized-postgresql.md) database engine.
Default value: empty list — means whole PostgreSQL database will be replicated.
Default value: empty list — means whole PostgreSQL database will be replicated.
2. `materialized_postgresql_schema` {#materialized-postgresql-schema}
2. materialized_postgresql_schema {#materialized-postgresql-schema}
Default value: empty string. (Default schema is used)
Default value: empty string. (Default schema is used)
3. `materialized_postgresql_schema_list` {#materialized-postgresql-schema-list}
3. materialized_postgresql_schema_list {#materialized-postgresql-schema-list}
Default value: empty list. (Default schema is used)
Default value: empty list. (Default schema is used)
4. `materialized_postgresql_allow_automatic_update` {#materialized-postgresql-allow-automatic-update}
4. materialized_postgresql_allow_automatic_update {#materialized-postgresql-allow-automatic-update}
Do not use this setting before 22.1 version.
Do not use this setting before 22.1 version.
Allows reloading table in the background, when schema changes are detected. DDL queries on the PostgreSQL side are not replicated via ClickHouse [MaterializedPostgreSQL](../../engines/database-engines/materialized-postgresql.md) engine, because it is not allowed with PostgreSQL logical replication protocol, but the fact of DDL changes is detected transactionally. In this case, the default behaviour is to stop replicating those tables once DDL is detected. However, if this setting is enabled, then, instead of stopping the replication of those tables, they will be reloaded in the background via database snapshot without data losses and replication will continue for them.
Allows reloading table in the background, when schema changes are detected. DDL queries on the PostgreSQL side are not replicated via ClickHouse [MaterializedPostgreSQL](../../engines/database-engines/materialized-postgresql.md) engine, because it is not allowed with PostgreSQL logical replication protocol, but the fact of DDL changes is detected transactionally. In this case, the default behaviour is to stop replicating those tables once DDL is detected. However, if this setting is enabled, then, instead of stopping the replication of those tables, they will be reloaded in the background via database snapshot without data losses and replication will continue for them.
Possible values:
Possible values:
- 0 — The table is not automatically updated in the background, when schema changes are detected.
- 1 — The table is automatically updated in the background, when schema changes are detected.
- 0 — The table is not automatically updated in the background, when schema changes are detected.
- 1 — The table is automatically updated in the background, when schema changes are detected.
Default value: `0`.
Default value: `0`.
5. `materialized_postgresql_max_block_size` {#materialized-postgresql-max-block-size}
5. materialized_postgresql_max_block_size {#materialized-postgresql-max-block-size}
Sets the number of rows collected in memory before flushing data into PostgreSQL database table.
Sets the number of rows collected in memory before flushing data into PostgreSQL database table.
Possible values:
Possible values:
- Positive integer.
- Positive integer.
Default value: `65536`.
Default value: `65536`.
6. `materialized_postgresql_replication_slot` {#materialized-postgresql-replication-slot}
6. materialized_postgresql_replication_slot {#materialized-postgresql-replication-slot}
A user-created replication slot. Must be used together with `materialized_postgresql_snapshot`.
A user-created replication slot. Must be used together with `materialized_postgresql_snapshot`.
7. `materialized_postgresql_snapshot` {#materialized-postgresql-snapshot}
7. materialized_postgresql_snapshot {#materialized-postgresql-snapshot}
A text string identifying a snapshot, from which [initial dump of PostgreSQL tables](../../engines/database-engines/materialized-postgresql.md) will be performed. Must be used together with `materialized_postgresql_replication_slot`.
A text string identifying a snapshot, from which [initial dump of PostgreSQL tables](../../engines/database-engines/materialized-postgresql.md) will be performed. Must be used together with `materialized_postgresql_replication_slot`.
``` sql
CREATE DATABASE database1
ENGINE = MaterializedPostgreSQL('postgres1:5432', 'postgres_database', 'postgres_user', 'postgres_password')
SETTINGS materialized_postgresql_tables_list = 'table1,table2,table3';
``` sql
CREATE DATABASE database1
ENGINE = MaterializedPostgreSQL('postgres1:5432', 'postgres_database', 'postgres_user', 'postgres_password')
SETTINGS materialized_postgresql_tables_list = 'table1,table2,table3';
SELECT * FROM database1.table1;
```
SELECT * FROM database1.table1;
```
The settings can be changed, if necessary, using a DDL query. But it is impossible to change the setting `materialized_postgresql_tables_list`. To update the list of tables in this setting use the `ATTACH TABLE` query.
The settings can be changed, if necessary, using a DDL query. But it is impossible to change the setting `materialized_postgresql_tables_list`. To update the list of tables in this setting use the `ATTACH TABLE` query.
``` sql
ALTER DATABASE postgres_database MODIFY SETTING materialized_postgresql_max_block_size = <new_size>;
```
``` sql
ALTER DATABASE postgres_database MODIFY SETTING materialized_postgresql_max_block_size = <new_size>;
```
## Notes {#notes}
@ -216,47 +213,47 @@ Please note that this should be used only if it is actually needed. If there is
1. Configure replication slot in PostgreSQL.
```yaml
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
name: acid-demo-cluster
spec:
numberOfInstances: 2
postgresql:
parameters:
wal_level: logical
patroni:
slots:
clickhouse_sync:
type: logical
database: demodb
plugin: pgoutput
```
```yaml
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
name: acid-demo-cluster
spec:
numberOfInstances: 2
postgresql:
parameters:
wal_level: logical
patroni:
slots:
clickhouse_sync:
type: logical
database: demodb
plugin: pgoutput
```
2. Wait for replication slot to be ready, then begin a transaction and export the transaction snapshot identifier:
```sql
BEGIN;
SELECT pg_export_snapshot();
```
```sql
BEGIN;
SELECT pg_export_snapshot();
```
3. In ClickHouse create database:
```sql
CREATE DATABASE demodb
ENGINE = MaterializedPostgreSQL('postgres1:5432', 'postgres_database', 'postgres_user', 'postgres_password')
SETTINGS
materialized_postgresql_replication_slot = 'clickhouse_sync',
materialized_postgresql_snapshot = '0000000A-0000023F-3',
materialized_postgresql_tables_list = 'table1,table2,table3';
```
```sql
CREATE DATABASE demodb
ENGINE = MaterializedPostgreSQL('postgres1:5432', 'postgres_database', 'postgres_user', 'postgres_password')
SETTINGS
materialized_postgresql_replication_slot = 'clickhouse_sync',
materialized_postgresql_snapshot = '0000000A-0000023F-3',
materialized_postgresql_tables_list = 'table1,table2,table3';
```
4. End the PostgreSQL transaction once replication to ClickHouse DB is confirmed. Verify that replication continues after failover:
```bash
kubectl exec acid-demo-cluster-0 -c postgres -- su postgres -c 'patronictl failover --candidate acid-demo-cluster-1 --force'
```
```bash
kubectl exec acid-demo-cluster-0 -c postgres -- su postgres -c 'patronictl failover --candidate acid-demo-cluster-1 --force'
```
### Required permissions

@ -1,9 +1,9 @@
---
sidebar_position: 50
sidebar_label: MySQL
toc_priority: 30
toc_title: MySQL
---
# MySQL
# MySQL {#mysql}
Allows to connect to databases on a remote MySQL server and perform `INSERT` and `SELECT` queries to exchange data between ClickHouse and MySQL.
@ -49,6 +49,8 @@ ENGINE = MySQL('host:port', ['database' | database], 'user', 'password')
All other MySQL data types are converted into [String](../../sql-reference/data-types/string.md).
Because the ClickHouse date type has a different range from the MySQL date type, MySQL dates may fall outside the range of the ClickHouse date type. In that case, use the setting `mysql_datatypes_support_level` to change the mapping from the MySQL date type to a ClickHouse type: `date2Date32` (convert MySQL's date type to ClickHouse `Date32`), `date2String` (convert MySQL's date type to ClickHouse `String`; this is usually used when your MySQL data contains dates earlier than 1925), or `default` (convert MySQL's date type to ClickHouse `Date`).
[Nullable](../../sql-reference/data-types/nullable.md) is supported.
## Global Variables Support {#global-variables-support}
@ -59,9 +61,8 @@ These variables are supported:
- `version`
- `max_allowed_packet`
:::warning
Currently these variables are stubs and don't correspond to anything.
:::
!!! warning "Warning"
Currently these variables are stubs and don't correspond to anything.
Example:

@ -1,6 +1,6 @@
---
sidebar_position: 40
sidebar_label: PostgreSQL
toc_priority: 35
toc_title: PostgreSQL
---
# PostgreSQL {#postgresql}

@ -1,6 +1,6 @@
---
sidebar_position: 30
sidebar_label: Replicated
toc_priority: 36
toc_title: Replicated
---
# [experimental] Replicated {#replicated}
@ -20,9 +20,8 @@ One ClickHouse server can have multiple replicated databases running and updatin
- `shard_name` — Shard name. Database replicas are grouped into shards by `shard_name`.
- `replica_name` — Replica name. Replica names must be different for all replicas of the same shard.
:::warning
For [ReplicatedMergeTree](../table-engines/mergetree-family/replication.md#table_engines-replication) tables if no arguments provided, then default arguments are used: `/clickhouse/tables/{uuid}/{shard}` and `{replica}`. These can be changed in the server settings [default_replica_path](../../operations/server-configuration-parameters/settings.md#default_replica_path) and [default_replica_name](../../operations/server-configuration-parameters/settings.md#default_replica_name). Macro `{uuid}` is unfolded to table's uuid, `{shard}` and `{replica}` are unfolded to values from server config, not from database engine arguments. But in the future, it will be possible to use `shard_name` and `replica_name` of Replicated database.
:::
!!! note "Warning"
For [ReplicatedMergeTree](../table-engines/mergetree-family/replication.md#table_engines-replication) tables if no arguments provided, then default arguments are used: `/clickhouse/tables/{uuid}/{shard}` and `{replica}`. These can be changed in the server settings [default_replica_path](../../operations/server-configuration-parameters/settings.md#default_replica_path) and [default_replica_name](../../operations/server-configuration-parameters/settings.md#default_replica_name). Macro `{uuid}` is unfolded to table's uuid, `{shard}` and `{replica}` are unfolded to values from server config, not from database engine arguments. But in the future, it will be possible to use `shard_name` and `replica_name` of Replicated database.
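The macro unfolding described above can be pictured with a minimal Python sketch. The helper `expand_macros` and the sample values are hypothetical and only illustrate plain placeholder substitution; this is not how the server is implemented.

```python
def expand_macros(template: str, macros: dict) -> str:
    """Replace each {name} placeholder in the template with its value."""
    result = template
    for name, value in macros.items():
        result = result.replace("{" + name + "}", value)
    return result


# Hypothetical values: the uuid would come from the table itself,
# shard and replica from the server config.
macros = {"uuid": "a1b2c3", "shard": "01", "replica": "replica_1"}
zookeeper_path = expand_macros("/clickhouse/tables/{uuid}/{shard}", macros)
replica_name = expand_macros("{replica}", macros)
```

With the sample values, `zookeeper_path` becomes `/clickhouse/tables/a1b2c3/01` and `replica_name` becomes `replica_1`.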
## Specifics and Recommendations {#specifics-and-recommendations}

@ -1,6 +1,6 @@
---
sidebar_position: 55
sidebar_label: SQLite
toc_priority: 32
toc_title: SQLite
---
# SQLite {#sqlite}

15
docs/en/engines/index.md Normal file

@ -0,0 +1,15 @@
---
toc_folder_title: Engines
toc_hidden: true
toc_priority: 25
toc_title: hidden
---
# ClickHouse Engines {#clickhouse-engines}
There are two key engine kinds in ClickHouse:
- [Table engines](../engines/table-engines/index.md)
- [Database engines](../engines/database-engines/index.md)
{## [Original article](https://clickhouse.com/docs/en/engines/) ##}

@ -1,6 +1,6 @@
---
sidebar_position: 12
sidebar_label: ExternalDistributed
toc_priority: 12
toc_title: ExternalDistributed
---
# ExternalDistributed {#externaldistributed}
@ -51,6 +51,3 @@ You can specify any number of shards and any number of replicas for each shard.
- [MySQL table engine](../../../engines/table-engines/integrations/mysql.md)
- [PostgreSQL table engine](../../../engines/table-engines/integrations/postgresql.md)
- [Distributed table engine](../../../engines/table-engines/special/distributed.md)
[Original article](https://clickhouse.com/docs/en/engines/table-engines/integrations/ExternalDistributed/) <!--hide-->

@ -1,6 +1,6 @@
---
sidebar_position: 9
sidebar_label: EmbeddedRocksDB
toc_priority: 9
toc_title: EmbeddedRocksDB
---
# EmbeddedRocksDB Engine {#EmbeddedRocksDB-engine}

@ -1,6 +1,6 @@
---
sidebar_position: 6
sidebar_label: HDFS
toc_priority: 6
toc_title: HDFS
---
# HDFS {#table_engines-hdfs}
@ -98,9 +98,8 @@ Table consists of all the files in both directories (all files should satisfy fo
CREATE TABLE table_with_asterisk (name String, value UInt32) ENGINE = HDFS('hdfs://hdfs1:9000/{some,another}_dir/*', 'TSV')
```
:::warning
If the listing of files contains number ranges with leading zeros, use the construction with braces for each digit separately or use `?`.
:::
!!! warning "Warning"
If the listing of files contains number ranges with leading zeros, use the construction with braces for each digit separately or use `?`.
**Example**

@ -1,6 +1,6 @@
---
sidebar_position: 4
sidebar_label: Hive
toc_priority: 4
toc_title: Hive
---
# Hive {#hive}
@ -137,7 +137,7 @@ CREATE TABLE test.test_orc
`f_array_array_float` Array(Array(Float32)),
`day` String
)
ENGINE = Hive('thrift://202.168.117.26:9083', 'test', 'test_orc')
ENGINE = Hive('thrift://localhost:9083', 'test', 'test_orc')
PARTITION BY day
```
@ -406,5 +406,3 @@ f_char: hello world
f_bool: true
day: 2021-09-18
```
[Original article](https://clickhouse.com/docs/en/engines/table-engines/integrations/hive/) <!--hide-->

@ -1,6 +1,6 @@
---
sidebar_position: 40
sidebar_label: Integrations
toc_folder_title: Integrations
toc_priority: 1
---
# Table Engines for Integrations {#table-engines-for-integrations}

@ -1,6 +1,6 @@
---
sidebar_position: 3
sidebar_label: JDBC
toc_priority: 3
toc_title: JDBC
---
# JDBC {#table-engine-jdbc}

@ -1,6 +1,6 @@
---
sidebar_position: 8
sidebar_label: Kafka
toc_priority: 8
toc_title: Kafka
---
# Kafka {#kafka}
@ -87,9 +87,8 @@ Examples:
<summary>Deprecated Method for Creating a Table</summary>
:::warning
Do not use this method in new projects. If possible, switch old projects to the method described above.
:::
!!! attention "Attention"
Do not use this method in new projects. If possible, switch old projects to the method described above.
``` sql
Kafka(kafka_broker_list, kafka_topic_list, kafka_group_name, kafka_format
@ -134,7 +133,7 @@ Example:
SELECT level, sum(total) FROM daily GROUP BY level;
```
To improve performance, received messages are grouped into blocks the size of [max_insert_block_size](../../../operations/settings/settings.md#settings-max_insert_block_size). If the block wasn't formed within [stream_flush_interval_ms](../../../operations/settings/settings.md/#stream-flush-interval-ms) milliseconds, the data will be flushed to the table regardless of the completeness of the block.
To improve performance, received messages are grouped into blocks the size of [max_insert_block_size](../../../operations/settings/settings/#settings-max_insert_block_size). If the block wasn't formed within [stream_flush_interval_ms](../../../operations/settings/settings/#stream-flush-interval-ms) milliseconds, the data will be flushed to the table regardless of the completeness of the block.
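The size-or-timeout rule above can be sketched in Python as follows. This is a simplified illustration, not the engine's code: the function `group_into_blocks` and its parameters are hypothetical, and the timeout is only checked when the next message arrives, whereas a real consumer would flush on a timer.

```python
def group_into_blocks(messages, max_block_size, flush_interval_ms):
    """Group (arrival_ms, payload) pairs, sorted by arrival time, into blocks.

    A block is flushed when it reaches max_block_size messages, or when
    flush_interval_ms has passed since its first message arrived.
    """
    blocks = []
    current = []
    block_started_ms = None
    for arrival_ms, payload in messages:
        # Timeout: flush the incomplete block before accepting the new message.
        if current and arrival_ms - block_started_ms >= flush_interval_ms:
            blocks.append(current)
            current = []
        if not current:
            block_started_ms = arrival_ms
        current.append(payload)
        # Size limit: flush as soon as the block is complete.
        if len(current) >= max_block_size:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


messages = [(0, "a"), (10, "b"), (20, "c"), (5000, "d")]
blocks = group_into_blocks(messages, max_block_size=2, flush_interval_ms=1000)
```

Here `a` and `b` fill a block of size 2, `c` is flushed alone because no second message arrives within the interval, and `d` starts a new block.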
To stop receiving topic data or to change the conversion logic, detach the materialized view:

@ -1,6 +1,6 @@
---
sidebar_position: 12
sidebar_label: MaterializedPostgreSQL
toc_priority: 12
toc_title: MaterializedPostgreSQL
---
# MaterializedPostgreSQL {#materialize-postgresql}
@ -52,8 +52,5 @@ PRIMARY KEY key;
SELECT key, value, _version FROM postgresql_db.postgresql_replica;
```
:::warning
Replication of [**TOAST**](https://www.postgresql.org/docs/9.5/storage-toast.html) values is not supported. The default value for the data type will be used.
:::
[Original article](https://clickhouse.com/docs/en/engines/table-engines/integrations/materialized-postgresql) <!--hide-->
!!! warning "Warning"
Replication of [**TOAST**](https://www.postgresql.org/docs/9.5/storage-toast.html) values is not supported. The default value for the data type will be used.

@ -1,6 +1,6 @@
---
sidebar_position: 5
sidebar_label: MongoDB
toc_priority: 5
toc_title: MongoDB
---
# MongoDB {#mongodb}

@ -1,6 +1,6 @@
---
sidebar_position: 4
sidebar_label: MySQL
toc_priority: 4
toc_title: MySQL
---
# MySQL {#mysql}
@ -148,5 +148,3 @@ Default value: `16`.
- [The mysql table function](../../../sql-reference/table-functions/mysql.md)
- [Using MySQL as a source of external dictionary](../../../sql-reference/dictionaries/external-dictionaries/external-dicts-dict-sources.md#dicts-external_dicts_dict_sources-mysql)
[Original article](https://clickhouse.com/docs/en/engines/table-engines/integrations/mysql/) <!--hide-->

@ -1,6 +1,6 @@
---
sidebar_position: 2
sidebar_label: ODBC
toc_priority: 2
toc_title: ODBC
---
# ODBC {#table-engine-odbc}

@ -1,6 +1,6 @@
---
sidebar_position: 11
sidebar_label: PostgreSQL
toc_priority: 11
toc_title: PostgreSQL
---
# PostgreSQL {#postgresql}
@ -73,9 +73,8 @@ All joins, aggregations, sorting, `IN [ array ]` conditions and the `LIMIT` samp
PostgreSQL `Array` types are converted into ClickHouse arrays.
:::warning
Be careful: in PostgreSQL, array data created as `type_name[]` may contain multidimensional arrays with different numbers of dimensions in different rows of the same column. In ClickHouse, multidimensional arrays must have the same number of dimensions in all rows of the same column.
:::
!!! info "Note"
Be careful: in PostgreSQL, array data created as `type_name[]` may contain multidimensional arrays with different numbers of dimensions in different rows of the same column. In ClickHouse, multidimensional arrays must have the same number of dimensions in all rows of the same column.
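One way to check in advance whether a PostgreSQL array column satisfies this constraint is sketched below in Python. The helpers `array_depth` and `column_has_uniform_depth` are hypothetical names, and the sketch assumes the column values have already been fetched as nested Python lists.

```python
def array_depth(value) -> int:
    """Return the nesting depth of a value: 0 for a scalar, 1 for a flat list."""
    if not isinstance(value, list):
        return 0
    if not value:
        return 1
    return 1 + max(array_depth(item) for item in value)


def column_has_uniform_depth(rows) -> bool:
    """True if every row of the column holds an array of the same depth."""
    return len({array_depth(row) for row in rows}) <= 1


uniform = column_has_uniform_depth([[1, 2], [3]])          # all rows are 1-dimensional
mixed = column_has_uniform_depth([[1, 2], [[3], [4]]])     # 1-dimensional and 2-dimensional rows
```

A column for which the check returns `False` mixes dimension counts, which is the case the note above warns about.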
Supports multiple replicas that must be listed by `|`. For example:

@ -1,6 +1,6 @@
---
sidebar_position: 10
sidebar_label: RabbitMQ
toc_priority: 10
toc_title: RabbitMQ
---
# RabbitMQ Engine {#rabbitmq-engine}

@ -1,6 +1,6 @@
---
sidebar_position: 7
sidebar_label: S3
toc_priority: 7
toc_title: S3
---
# S3 Table Engine {#table-engine-s3}
@ -66,9 +66,8 @@ For more information about virtual columns see [here](../../../engines/table-eng
Constructions with `{}` are similar to the [remote](../../../sql-reference/table-functions/remote.md) table function.
:::warning
If the listing of files contains number ranges with leading zeros, use the construction with braces for each digit separately or use `?`.
:::
!!! warning "Warning"
If the listing of files contains number ranges with leading zeros, use the construction with braces for each digit separately or use `?`.
**Example with wildcards 1**
@ -159,5 +158,3 @@ The following settings can be specified in configuration file for given endpoint
## See also
- [s3 table function](../../../sql-reference/table-functions/s3.md)
[Original article](https://clickhouse.com/docs/en/engines/table-engines/integrations/s3/) <!--hide-->

@ -1,6 +1,6 @@
---
sidebar_position: 7
sidebar_label: SQLite
toc_priority: 7
toc_title: SQLite
---
# SQLite {#sqlite}
@ -56,7 +56,4 @@ SELECT * FROM sqlite_db.table2 ORDER BY col1;
**See Also**
- [SQLite](../../../engines/database-engines/sqlite.md) engine
- [sqlite](../../../sql-reference/table-functions/sqlite.md) table function
[Original article](https://clickhouse.com/docs/en/engines/table-engines/integrations/sqlite/) <!--hide-->
- [sqlite](../../../sql-reference/table-functions/sqlite.md) table function

@ -1,6 +1,7 @@
---
sidebar_position: 20
sidebar_label: Log Family
toc_folder_title: Log Family
toc_priority: 29
toc_title: Introduction
---
# Log Engine Family {#log-engine-family}

@ -10,6 +10,3 @@ The engine belongs to the family of `Log` engines. See the common properties of
`Log` differs from [TinyLog](../../../engines/table-engines/log-family/tinylog.md) in that a small file of "marks" resides with the column files. These marks are written on every data block and contain offsets that indicate where to start reading the file in order to skip the specified number of rows. This makes it possible to read table data in multiple threads.
For concurrent data access, the read operations can be performed simultaneously, while write operations block reads and each other.
The `Log` engine does not support indexes. Similarly, if writing to a table failed, the table is broken, and reading from it returns an error. The `Log` engine is appropriate for temporary data, write-once tables, and for testing or demonstration purposes.
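To illustrate how marks let a reader skip rows, here is a minimal Python sketch. It assumes one mark per data block holding the number of rows before the block and the byte offset where the block starts; the names `build_marks` and `find_start` and this layout are assumptions made for the example, not the engine's on-disk format.

```python
import bisect


def build_marks(block_row_counts, block_byte_sizes):
    """Return one (rows_before_block, byte_offset) mark per data block."""
    marks = []
    rows = 0
    offset = 0
    for row_count, byte_size in zip(block_row_counts, block_byte_sizes):
        marks.append((rows, offset))
        rows += row_count
        offset += byte_size
    return marks


def find_start(marks, rows_to_skip):
    """Return (byte_offset, rows_left_to_skip) for skipping rows_to_skip rows."""
    block_starts = [rows_before for rows_before, _ in marks]
    # Last block whose first row is at or before the target row.
    index = bisect.bisect_right(block_starts, rows_to_skip) - 1
    rows_before, offset = marks[index]
    return offset, rows_to_skip - rows_before


marks = build_marks([100, 100, 50], [400, 380, 210])
start = find_start(marks, 150)
```

With these sample blocks, skipping 150 rows means seeking to byte offset 400 and discarding 50 more rows. Several threads can each be given a different starting mark, which is what makes multithreaded reads possible.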
[Original article](https://clickhouse.com/docs/en/engines/table-engines/log-family/log/) <!--hide-->

@ -1,6 +1,6 @@
---
sidebar_position: 60
sidebar_label: AggregatingMergeTree
toc_priority: 35
toc_title: AggregatingMergeTree
---
# AggregatingMergeTree {#aggregatingmergetree}
@ -42,9 +42,8 @@ When creating a `AggregatingMergeTree` table the same [clauses](../../../engines
<summary>Deprecated Method for Creating a Table</summary>
:::warning
Do not use this method in new projects and, if possible, switch the old projects to the method described above.
:::
!!! attention "Attention"
Do not use this method in new projects and, if possible, switch the old projects to the method described above.
``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]

@ -1,6 +1,6 @@
---
sidebar_position: 70
sidebar_label: CollapsingMergeTree
toc_priority: 36
toc_title: CollapsingMergeTree
---
# CollapsingMergeTree {#table_engine-collapsingmergetree}
@ -42,9 +42,8 @@ When creating a `CollapsingMergeTree` table, the same [query clauses](../../../e
<summary>Deprecated Method for Creating a Table</summary>
:::warning
Do not use this method in new projects and, if possible, switch old projects to the method described above.
:::
!!! attention "Attention"
Do not use this method in new projects and, if possible, switch the old projects to the method described above.
``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]

@ -1,15 +1,12 @@
---
sidebar_position: 30
sidebar_label: Custom Partitioning Key
toc_priority: 32
toc_title: Custom Partitioning Key
---
# Custom Partitioning Key {#custom-partitioning-key}
:::warning
In most cases you do not need a partition key, and in most other cases you do not need a partition key more granular than by months. Partitioning does not speed up queries (in contrast to the ORDER BY expression).
You should never use too granular of partitioning. Don't partition your data by client identifiers or names. Instead, make a client identifier or name the first column in the ORDER BY expression.
:::
!!! warning "Warning"
In most cases you don't need partition key, and in most other cases you don't need partition key more granular than by months. Partitioning does not speed up queries (in contrast to the ORDER BY expression). You should never use too granular partitioning. Don't partition your data by client identifiers or names (instead make client identifier or name the first column in the ORDER BY expression).
Partitioning is available for the [MergeTree](../../../engines/table-engines/mergetree-family/mergetree.md) family tables (including [replicated](../../../engines/table-engines/mergetree-family/replication.md) tables). [Materialized views](../../../engines/table-engines/special/materializedview.md#materializedview) based on MergeTree tables support partitioning, as well.
@ -43,9 +40,8 @@ By default, the floating-point partition key is not supported. To use it enable
When inserting new data to a table, this data is stored as a separate part (chunk) sorted by the primary key. In 10-15 minutes after inserting, the parts of the same partition are merged into the entire part.
:::info
A merge only works for data parts that have the same value for the partitioning expression. This means **you shouldn't make overly granular partitions** (more than about a thousand partitions). Otherwise, the `SELECT` query performs poorly because of an unreasonably large number of files in the file system and open file descriptors.
:::
!!! info "Info"
A merge only works for data parts that have the same value for the partitioning expression. This means **you shouldn't make overly granular partitions** (more than about a thousand partitions). Otherwise, the `SELECT` query performs poorly because of an unreasonably large number of files in the file system and open file descriptors.
Use the [system.parts](../../../operations/system-tables/parts.md#system_tables-parts) table to view the table parts and partitions. For example, let's assume that we have a `visits` table with partitioning by month. Let's perform the `SELECT` query for the `system.parts` table:
@ -82,9 +78,8 @@ Let's break down the name of the part: `201901_1_9_2_11`:
- `2` is the chunk level (the depth of the merge tree it is formed from).
- `11` is the mutation version (if a part mutated)
:::info
The parts of old-type tables have the name: `20190117_20190123_2_2_0` (minimum date - maximum date - minimum block number - maximum block number - level).
:::
!!! info "Info"
The parts of old-type tables have the name: `20190117_20190123_2_2_0` (minimum date - maximum date - minimum block number - maximum block number - level).
The `active` column shows the status of the part. `1` is active; `0` is inactive. The inactive parts are, for example, source parts remaining after merging to a larger part. The corrupted data parts are also indicated as inactive.
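As a rough illustration of the part naming scheme, the following Python sketch splits a new-style part name such as `201901_1_9_2_11` into its fields. The function `parse_part_name` and the field names are hypothetical; the sketch assumes the partition ID contains no underscore and does not handle old-type names.

```python
from typing import NamedTuple, Optional


class PartName(NamedTuple):
    partition: str
    min_block: int
    max_block: int
    level: int
    mutation: Optional[int]  # None when the part was never mutated


def parse_part_name(name: str) -> PartName:
    """Split partition_minblock_maxblock_level[_mutation] into its fields."""
    fields = name.split("_")
    if len(fields) not in (4, 5):
        raise ValueError(f"unexpected part name: {name!r}")
    partition = fields[0]
    numbers = [int(field) for field in fields[1:]]
    mutation = numbers[3] if len(numbers) == 4 else None
    return PartName(partition, numbers[0], numbers[1], numbers[2], mutation)


part = parse_part_name("201901_1_9_2_11")
```

For this name, `part.level` is 2 and `part.mutation` is 11, matching the breakdown above.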

@ -1,6 +1,6 @@
---
sidebar_position: 90
sidebar_label: GraphiteMergeTree
toc_priority: 38
toc_title: GraphiteMergeTree
---
# GraphiteMergeTree {#graphitemergetree}
@ -54,9 +54,8 @@ When creating a `GraphiteMergeTree` table, the same [clauses](../../../engines/t
<summary>Deprecated Method for Creating a Table</summary>
:::warning
Do not use this method in new projects and, if possible, switch old projects to the method described above.
:::
!!! attention "Attention"
Do not use this method in new projects and, if possible, switch the old projects to the method described above.
``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
@ -120,13 +119,12 @@ default
...
```
:::warning
Patterns must be strictly ordered:
!!! warning "Attention"
Patterns must be strictly ordered:
1. Patterns without `function` or `retention`.
1. Patterns with both `function` and `retention`.
1. Pattern `default`.
:::
1. Patterns without `function` or `retention`.
1. Patterns with both `function` and `retention`.
1. Pattern `default`.
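The ordering rule can be checked mechanically. The Python sketch below is only an illustration: it represents each `pattern` section as a hypothetical dict, reads "without `function` or `retention`" as "missing at least one of the two", and assumes the `default` section is kept separately and always comes last.

```python
def pattern_rank(pattern: dict) -> int:
    """Rank 0: pattern lacks function or retention. Rank 1: pattern has both."""
    has_function = "function" in pattern
    has_retention = bool(pattern.get("retention"))
    return 1 if (has_function and has_retention) else 0


def is_strictly_ordered(patterns) -> bool:
    """True if all incomplete patterns come before all complete ones."""
    ranks = [pattern_rank(pattern) for pattern in patterns]
    return ranks == sorted(ranks)


good = [
    {"regexp": r"\.max$", "function": "max"},
    {"regexp": r"^cpu\.", "function": "avg", "retention": [(0, 60), (86400, 3600)]},
]
good_order = is_strictly_ordered(good)
bad_order = is_strictly_ordered(list(reversed(good)))
```

Running such a check on a rollup configuration before deploying it can catch misordered sections early.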
When processing a row, ClickHouse checks the rules in the `pattern` sections. Each of `pattern` (including `default`) sections can contain `function` parameter for aggregation, `retention` parameters or both. If the metric name matches the `regexp`, the rules from the `pattern` section (or sections) are applied; otherwise, the rules from the `default` section are used.
@ -255,6 +253,7 @@ Valid values:
```
:::warning
Data rollup is performed during merges. Usually, for old partitions, merges are not started, so for rollup it is necessary to trigger an unscheduled merge using [optimize](../../../sql-reference/statements/optimize.md). Or use additional tools, for example [graphite-ch-optimizer](https://github.com/innogames/graphite-ch-optimizer).
:::
!!! warning "Warning"
Data rollup is performed during merges. Usually, for old partitions, merges are not started, so for rollup it is necessary to trigger an unscheduled merge using [optimize](../../../sql-reference/statements/optimize.md). Or use additional tools, for example [graphite-ch-optimizer](https://github.com/innogames/graphite-ch-optimizer).
[Original article](https://clickhouse.com/docs/en/operations/table_engines/graphitemergetree/) <!--hide-->

@ -1,6 +1,7 @@
---
sidebar_position: 10
sidebar_label: MergeTree Family
toc_folder_title: MergeTree Family
toc_priority: 28
toc_title: Introduction
---
# MergeTree Engine Family {#mergetree-engine-family}

@ -1,6 +1,6 @@
---
sidebar_position: 11
sidebar_label: MergeTree
toc_priority: 30
toc_title: MergeTree
---
# MergeTree {#table_engines-mergetree}
@ -27,9 +27,8 @@ Main features:
If necessary, you can set the data sampling method in the table.
:::info
The [Merge](../../../engines/table-engines/special/merge.md#merge) engine does not belong to the `*MergeTree` family.
:::
!!! info "Info"
The [Merge](../../../engines/table-engines/special/merge.md#merge) engine does not belong to the `*MergeTree` family.
## Creating a Table {#table_engine-mergetree-creating-a-table}
@ -128,9 +127,8 @@ The `index_granularity` setting can be omitted because 8192 is the default value
<summary>Deprecated Method for Creating a Table</summary>
:::warning
Do not use this method in new projects. If possible, switch old projects to the method described above.
:::
!!! attention "Attention"
Do not use this method in new projects. If possible, switch old projects to the method described above.
``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
@ -306,8 +304,8 @@ CREATE TABLE table_name
Indices from the example can be used by ClickHouse to reduce the amount of data to read from disk in the following queries:
``` sql
SELECT count() FROM table WHERE s &lt; 'z'
SELECT count() FROM table WHERE u64 * i32 == 10 AND u64 * length(s) &gt;= 1234
SELECT count() FROM table WHERE s < 'z'
SELECT count() FROM table WHERE u64 * i32 == 10 AND u64 * length(s) >= 1234
```
#### Available Types of Indices {#available-types-of-indices}
@ -366,7 +364,7 @@ The `set` index can be used with all functions. Function subsets for other index
| Function (operator) / Index | primary key | minmax | ngrambf_v1 | tokenbf_v1 | bloom_filter |
|------------------------------------------------------------------------------------------------------------|-------------|--------|-------------|-------------|---------------|
| [equals (=, ==)](../../../sql-reference/functions/comparison-functions.md#function-equals) | ✔ | ✔ | ✔ | ✔ | ✔ |
| [notEquals(!=, &lt;&gt;)](../../../sql-reference/functions/comparison-functions.md#function-notequals) | ✔ | ✔ | ✔ | ✔ | ✔ |
| [notEquals(!=, <>)](../../../sql-reference/functions/comparison-functions.md#function-notequals) | ✔ | ✔ | ✔ | ✔ | ✔ |
| [like](../../../sql-reference/functions/string-search-functions.md#function-like) | ✔ | ✔ | ✔ | ✔ | ✗ |
| [notLike](../../../sql-reference/functions/string-search-functions.md#function-notlike) | ✔ | ✔ | ✔ | ✔ | ✗ |
| [startsWith](../../../sql-reference/functions/string-functions.md#startswith) | ✔ | ✔ | ✔ | ✔ | ✗ |
@ -384,10 +382,8 @@ The `set` index can be used with all functions. Function subsets for other index
Functions with a constant argument that is less than ngram size can't be used by `ngrambf_v1` for query optimization.
:::note
Bloom filters can have false positive matches, so the `ngrambf_v1`, `tokenbf_v1`, and `bloom_filter` indexes can not be used for optimizing queries where the result of a function is expected to be false.
For example:
!!! note "Note"
Bloom filters can have false positive matches, so the `ngrambf_v1`, `tokenbf_v1`, and `bloom_filter` indexes can't be used for optimizing queries where the result of a function is expected to be false, for example:
- Can be optimized:
- `s LIKE '%test%'`
@ -395,13 +391,12 @@ For example:
- `s = 1`
- `NOT s != 1`
- `startsWith(s, 'test')`
- Can not be optimized:
- Can't be optimized:
- `NOT s LIKE '%test%'`
- `s NOT LIKE '%test%'`
- `NOT s = 1`
- `s != 1`
- `NOT startsWith(s, 'test')`
:::
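The reason negated conditions cannot use these indexes can be seen in a toy Bloom filter. The Python sketch below is not ClickHouse's implementation; the class `BloomFilter`, its parameters, and the two `can_skip_*` helpers are hypothetical names used only to show that the filter can prove a value is absent but can never prove what is present.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits: int = 256, num_hashes: int = 3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit set stored in one integer

    def _positions(self, value: str):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, value: str) -> None:
        for position in self._positions(value):
            self.bits |= 1 << position

    def might_contain(self, value: str) -> bool:
        # True may be a false positive; False is always correct.
        return all((self.bits >> position) & 1 for position in self._positions(value))


def can_skip_for_equals(granule_filter: BloomFilter, constant: str) -> bool:
    # s = constant: skip the granule only if the value is definitely absent.
    return not granule_filter.might_contain(constant)


def can_skip_for_not_equals(granule_filter: BloomFilter, constant: str) -> bool:
    # s != constant: skipping would require proving that every row equals
    # the constant, which a Bloom filter cannot do.
    return False


granule = BloomFilter()
granule.add("test")
```

A "definitely absent" answer lets a granule be skipped for `s = 'x'`, but no answer from the filter justifies skipping a granule for `s != 'x'` or `NOT s LIKE '%x%'`.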
## Projections {#projections}
Projections are like [materialized views](../../../sql-reference/statements/create/view.md#materialized) but defined in part-level. It provides consistency guarantees along with automatic usage in queries.

@ -1,6 +1,6 @@
---
sidebar_position: 40
sidebar_label: ReplacingMergeTree
toc_priority: 33
toc_title: ReplacingMergeTree
---
# ReplacingMergeTree {#replacingmergetree}
@ -29,9 +29,8 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
For a description of request parameters, see [statement description](../../../sql-reference/statements/create/table.md).
:::warning
Uniqueness of rows is determined by the `ORDER BY` table section, not `PRIMARY KEY`.
:::
!!! note "Attention"
Uniqueness of rows is determined by the `ORDER BY` table section, not `PRIMARY KEY`.
**ReplacingMergeTree Parameters**
@ -50,9 +49,8 @@ When creating a `ReplacingMergeTree` table the same [clauses](../../../engines/t
<summary>Deprecated Method for Creating a Table</summary>
:::warning
Do not use this method in new projects and, if possible, switch old projects to the method described above.
:::
!!! attention "Attention"
Do not use this method in new projects and, if possible, switch the old projects to the method described above.
``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]


@ -1,6 +1,6 @@
---
sidebar_position: 20
sidebar_label: Data Replication
toc_priority: 31
toc_title: Data Replication
---
# Data Replication {#table_engines-replication}
@ -31,9 +31,8 @@ ClickHouse uses [Apache ZooKeeper](https://zookeeper.apache.org) for storing rep
To use replication, set parameters in the [zookeeper](../../../operations/server-configuration-parameters/settings.md#server-settings_zookeeper) server configuration section.
:::warning
Don’t neglect the security setting. ClickHouse supports the `digest` [ACL scheme](https://zookeeper.apache.org/doc/current/zookeeperProgrammers.html#sc_ZooKeeperAccessControl) of the ZooKeeper security subsystem.
:::
!!! attention "Attention"
Don’t neglect the security setting. ClickHouse supports the `digest` [ACL scheme](https://zookeeper.apache.org/doc/current/zookeeperProgrammers.html#sc_ZooKeeperAccessControl) of the ZooKeeper security subsystem.
Example of setting the addresses of the ZooKeeper cluster:


@ -1,6 +1,6 @@
---
sidebar_position: 50
sidebar_label: SummingMergeTree
toc_priority: 34
toc_title: SummingMergeTree
---
# SummingMergeTree {#summingmergetree}
@ -41,9 +41,8 @@ When creating a `SummingMergeTree` table the same [clauses](../../../engines/tab
<summary>Deprecated Method for Creating a Table</summary>
:::warning
Do not use this method in new projects and, if possible, switch the old projects to the method described above.
:::
!!! attention "Attention"
Do not use this method in new projects and, if possible, switch the old projects to the method described above.
``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]


@ -1,6 +1,6 @@
---
sidebar_position: 80
sidebar_label: VersionedCollapsingMergeTree
toc_priority: 37
toc_title: VersionedCollapsingMergeTree
---
# VersionedCollapsingMergeTree {#versionedcollapsingmergetree}
@ -53,9 +53,8 @@ When creating a `VersionedCollapsingMergeTree` table, the same [clauses](../../.
<summary>Deprecated Method for Creating a Table</summary>
:::warning
Do not use this method in new projects. If possible, switch old projects to the method described above.
:::
!!! attention "Attention"
Do not use this method in new projects. If possible, switch the old projects to the method described above.
``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]


@ -1,6 +1,6 @@
---
sidebar_position: 120
sidebar_label: Buffer
toc_priority: 45
toc_title: Buffer
---
# Buffer Table Engine {#buffer}
@ -54,9 +54,8 @@ If the set of columns in the Buffer table does not match the set of columns in a
If the types do not match for one of the columns in the Buffer table and a subordinate table, an error message is entered in the server log, and the buffer is cleared.
The same thing happens if the subordinate table does not exist when the buffer is flushed.
:::warning
Running ALTER on the Buffer table in releases made before 26 Oct 2021 will cause a `Block structure mismatch` error (see [#15117](https://github.com/ClickHouse/ClickHouse/issues/15117) and [#30565](https://github.com/ClickHouse/ClickHouse/pull/30565)), so deleting the Buffer table and then recreating is the only option. It is advisable to check that this error is fixed in your release before trying to run ALTER on the Buffer table.
:::
!!! attention "Attention"
Running ALTER on the Buffer table in releases made before 26 Oct 2021 will cause a `Block structure mismatch` error (see [#15117](https://github.com/ClickHouse/ClickHouse/issues/15117) and [#30565](https://github.com/ClickHouse/ClickHouse/pull/30565)), so deleting the Buffer table and then recreating is the only option. It is advisable to check that this error is fixed in your release before trying to run ALTER on the Buffer table.
If the server is restarted abnormally, the data in the buffer is lost.
@ -74,4 +73,4 @@ A Buffer table is used when too many INSERTs are received from a large number of
Note that it does not make sense to insert data one row at a time, even for Buffer tables. This will only produce a speed of a few thousand rows per second, while inserting larger blocks of data can produce over a million rows per second (see the section “Performance”).
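As a rough illustration of the point about block size, the sketch below groups rows into blocks on the client side before they are sent. The `batched_rows` helper, the sample rows, and the block size of 3 are hypothetical; nothing here is a ClickHouse API.

```python
from itertools import islice


def batched_rows(rows, block_size):
    """Yield lists of at most block_size rows (hypothetical helper)."""
    iterator = iter(rows)
    while True:
        block = list(islice(iterator, block_size))
        if not block:
            return
        yield block


rows = [(i, f"event-{i}") for i in range(7)]
blocks = list(batched_rows(rows, block_size=3))
# Each block would be sent as one INSERT instead of one INSERT per row.
```

With real workloads the block size would be far larger than 3; the small value only keeps the example readable.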
[Original article](https://clickhouse.com/docs/en/engines/table-engines/special/buffer/) <!--hide-->
[Original article](https://clickhouse.com/docs/en/operations/table_engines/buffer/) <!--hide-->


@ -1,6 +1,6 @@
---
sidebar_position: 20
sidebar_label: Dictionary
toc_priority: 35
toc_title: Dictionary
---
# Dictionary Table Engine {#dictionary}
@ -97,5 +97,3 @@ select * from products limit 1;
**See Also**
- [Dictionary function](../../../sql-reference/table-functions/dictionary.md#dictionary-function)
[Original article](https://clickhouse.com/docs/en/engines/table-engines/special/dictionary/) <!--hide-->


@ -1,6 +1,6 @@
---
sidebar_position: 10
sidebar_label: Distributed
toc_priority: 33
toc_title: Distributed
---
# Distributed Table Engine {#distributed}
@ -64,19 +64,19 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] AS [db2.]name2
- `monitor_max_sleep_time_ms` - same as [distributed_directory_monitor_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_max_sleep_time_ms)
:::note
**Durability settings** (`fsync_...`):
!!! note "Note"
- Affect only asynchronous INSERTs (i.e. `insert_distributed_sync=false`), when data is first stored on the initiator node's disk and later sent to shards asynchronously.
- May significantly decrease the inserts' performance
- Affect writing the data stored inside Distributed table folder into the **node which accepted your insert**. If you need to have guarantees of writing data to underlying MergeTree tables - see durability settings (`...fsync...`) in `system.merge_tree_settings`
**Durability settings** (`fsync_...`):
For **Insert limit settings** (`..._insert`) see also:
- Affect only asynchronous INSERTs (i.e. `insert_distributed_sync=false`), when data is first stored on the initiator node's disk and later sent to shards asynchronously.
- May significantly decrease the inserts' performance
- Affect writing the data stored inside Distributed table folder into the **node which accepted your insert**. If you need to have guarantees of writing data to underlying MergeTree tables - see durability settings (`...fsync...`) in `system.merge_tree_settings`
- [insert_distributed_sync](../../../operations/settings/settings.md#insert_distributed_sync) setting
- [prefer_localhost_replica](../../../operations/settings/settings.md#settings-prefer-localhost-replica) setting
- `bytes_to_throw_insert` is handled before `bytes_to_delay_insert`, so you should not set it to a value less than `bytes_to_delay_insert`
:::
For **Insert limit settings** (`..._insert`) see also:
- [insert_distributed_sync](../../../operations/settings/settings.md#insert_distributed_sync) setting
- [prefer_localhost_replica](../../../operations/settings/settings.md#settings-prefer-localhost-replica) setting
- `bytes_to_throw_insert` is handled before `bytes_to_delay_insert`, so you should not set it to a value less than `bytes_to_delay_insert`
**Example**
@ -215,9 +215,8 @@ To learn more about how distibuted `in` and `global in` queries are processed, r
- `_shard_num` — Contains the `shard_num` value from the table `system.clusters`. Type: [UInt32](../../../sql-reference/data-types/int-uint.md).
:::note
Since the [remote](../../../sql-reference/table-functions/remote.md) and [cluster](../../../sql-reference/table-functions/cluster.md) table functions internally create a temporary Distributed table, `_shard_num` is available there too.
:::
!!! note "Note"
Since the [remote](../../../sql-reference/table-functions/remote.md) and [cluster](../../../sql-reference/table-functions/cluster.md) table functions internally create a temporary Distributed table, `_shard_num` is available there too.
**See Also**
@ -226,4 +225,3 @@ Since [remote](../../../sql-reference/table-functions/remote.md) and [cluster](.
- [shardNum()](../../../sql-reference/functions/other-functions.md#shard-num) and [shardCount()](../../../sql-reference/functions/other-functions.md#shard-count) functions
[Original article](https://clickhouse.com/docs/en/engines/table-engines/special/distributed/) <!--hide-->


@ -1,6 +1,6 @@
---
sidebar_position: 130
sidebar_label: External Data
toc_priority: 45
toc_title: External Data
---
# External Data for Query Processing {#external-data-for-query-processing}
@ -63,3 +63,4 @@ $ curl -F 'passwd=@passwd.tsv;' 'http://localhost:8123/?query=SELECT+shell,+coun
For distributed query processing, the temporary tables are sent to all the remote servers.
[Original article](https://clickhouse.com/docs/en/operations/table_engines/external_data/) <!--hide-->


@ -1,6 +1,6 @@
---
sidebar_position: 40
sidebar_label: File
toc_priority: 37
toc_title: File
---
# File Table Engine {#table_engines-file}
@ -30,9 +30,8 @@ When creating table using `File(Format)` it creates empty subdirectory in that f
You may manually create this subfolder and file in server filesystem and then [ATTACH](../../../sql-reference/statements/attach.md) it to table information with matching name, so you can query data from that file.
:::warning
Be careful with this functionality, because ClickHouse does not keep track of external changes to such files. The result of simultaneous writes via ClickHouse and outside of ClickHouse is undefined.
:::
!!! warning "Warning"
Be careful with this functionality, because ClickHouse does not keep track of external changes to such files. The result of simultaneous writes via ClickHouse and outside of ClickHouse is undefined.
## Example {#example}
@ -86,4 +85,4 @@ $ echo -e "1,2\n3,4" | clickhouse-local -q "CREATE TABLE table (a Int64, b Int64
- Indices
- Replication
[Original article](https://clickhouse.com/docs/en/operations/table_engines/special/file/) <!--hide-->
[Original article](https://clickhouse.com/docs/en/operations/table_engines/file/) <!--hide-->


@ -1,6 +1,6 @@
---
sidebar_position: 140
sidebar_label: GenerateRandom
toc_priority: 46
toc_title: GenerateRandom
---
# GenerateRandom Table Engine {#table_engines-generate}
@ -56,4 +56,4 @@ SELECT * FROM generate_engine_table LIMIT 3
- Indices
- Replication
[Original article](https://clickhouse.com/docs/en/engines/table-engines/special/generate/) <!--hide-->
[Original article](https://clickhouse.com/docs/en/operations/table_engines/generate/) <!--hide-->


@ -1,6 +1,6 @@
---
sidebar_position: 50
sidebar_label: Special
toc_folder_title: Special
toc_priority: 31
---
# Special Table Engines {#special-table-engines}


@ -1,15 +1,14 @@
---
sidebar_position: 70
sidebar_label: Join
toc_priority: 40
toc_title: Join
---
# Join Table Engine {#join}
Optional prepared data structure for usage in [JOIN](../../../sql-reference/statements/select/join.md#select-join) operations.
:::note
This is not an article about the [JOIN clause](../../../sql-reference/statements/select/join.md#select-join) itself.
:::
!!! note "Note"
This is not an article about the [JOIN clause](../../../sql-reference/statements/select/join.md#select-join) itself.
## Creating a Table {#creating-a-table}
@ -126,5 +125,3 @@ ALTER TABLE id_val_join DELETE WHERE id = 3;
│ 1 │ 21 │
└────┴─────┘
```
[Original article](https://clickhouse.com/docs/en/operations/table_engines/special/join/) <!--hide-->


@ -1,10 +1,10 @@
---
sidebar_position: 100
sidebar_label: MaterializedView
toc_priority: 43
toc_title: MaterializedView
---
# MaterializedView Table Engine {#materializedview}
Used for implementing materialized views (for more information, see [CREATE VIEW](../../../sql-reference/statements/create/view.md#materialized)). For storing data, it uses a different engine that was specified when creating the view. When reading from a table, it just uses that engine.
[Original article](https://clickhouse.com/docs/en/engines/table-engines/special/materializedview/) <!--hide-->
[Original article](https://clickhouse.com/docs/en/operations/table_engines/materializedview/) <!--hide-->


@ -1,6 +1,6 @@
---
sidebar_position: 110
sidebar_label: Memory
toc_priority: 44
toc_title: Memory
---
# Memory Table Engine {#memory}
@ -15,4 +15,4 @@ Normally, using this table engine is not justified. However, it can be used for
The Memory engine is used by the system for temporary tables with external query data (see the section “External data for processing a query”), and for implementing `GLOBAL IN` (see the section “IN operators”).
[Original article](https://clickhouse.com/docs/en/engines/table-engines/special/memory/) <!--hide-->
[Original article](https://clickhouse.com/docs/en/operations/table_engines/memory/) <!--hide-->


@ -1,6 +1,6 @@
---
sidebar_position: 30
sidebar_label: Merge
toc_priority: 36
toc_title: Merge
---
# Merge Table Engine {#merge}
@ -12,7 +12,7 @@ Reading is automatically parallelized. Writing to a table is not supported. When
## Creating a Table {#creating-a-table}
``` sql
CREATE TABLE ... Engine=Merge(db_name, tables_regexp)
CREATE TABLE ... Engine=Merge(db_name, tables_regexp)
```
**Engine Parameters**
@ -81,5 +81,3 @@ SELECT * FROM WatchLog;
- [Virtual columns](../../../engines/table-engines/special/index.md#table_engines-virtual_columns)
- [merge](../../../sql-reference/table-functions/merge.md) table function
[Original article](https://clickhouse.com/docs/en/operations/table_engines/special/merge/) <!--hide-->


@ -1,15 +1,13 @@
---
sidebar_position: 50
sidebar_label: 'Null'
toc_priority: 38
toc_title: 'Null'
---
# Null Table Engine {#null}
When writing to a `Null` table, data is ignored. When reading from a `Null` table, the response is empty.
:::note
If you are wondering why this is useful, note that you can create a materialized view on a `Null` table. So the data written to the table will end up affecting the view, but original raw data will still be discarded.
:::
!!! info "Hint"
However, you can create a materialized view on a `Null` table. So the data written to the table will end up affecting the view, but original raw data will still be discarded.
[Original article](https://clickhouse.com/docs/en/operations/table_engines/special/null/) <!--hide-->
[Original article](https://clickhouse.com/docs/en/operations/table_engines/null/) <!--hide-->


@ -1,6 +1,6 @@
---
sidebar_position: 60
sidebar_label: Set
toc_priority: 39
toc_title: Set
---
# Set Table Engine {#set}
@ -20,4 +20,4 @@ When creating a table, the following settings are applied:
- [persistent](../../../operations/settings/settings.md#persistent)
[Original article](https://clickhouse.com/docs/en/operations/table_engines/special/set/) <!--hide-->
[Original article](https://clickhouse.com/docs/en/operations/table_engines/set/) <!--hide-->


@ -1,6 +1,6 @@
---
sidebar_position: 80
sidebar_label: URL
toc_priority: 41
toc_title: URL
---
# URL Table Engine {#table_engines-url}
@ -89,4 +89,4 @@ SELECT * FROM url_engine_table
- Indexes.
- Replication.
[Original article](https://clickhouse.com/docs/en/operations/table_engines/special/url/) <!--hide-->
[Original article](https://clickhouse.com/docs/en/operations/table_engines/url/) <!--hide-->


@ -1,10 +1,10 @@
---
sidebar_position: 90
sidebar_label: View
toc_priority: 42
toc_title: View
---
# View Table Engine {#table_engines-view}
Used for implementing views (for more information, see the `CREATE VIEW query`). It does not store data, but only stores the specified `SELECT` query. When reading from a table, it runs this query (and deletes all unnecessary columns from the query).
[Original article](https://clickhouse.com/docs/en/operations/table_engines/special/view/) <!--hide-->
[Original article](https://clickhouse.com/docs/en/operations/table_engines/view/) <!--hide-->


@ -1,8 +0,0 @@
position: 10
label: 'Example Datasets'
collapsible: true
collapsed: true
link:
type: generated-index
title: Example Datasets
slug: /en/example-datasets


@ -0,0 +1,25 @@
---
title: What is a columnar database?
toc_hidden: true
toc_priority: 101
---
# What Is a Columnar Database? {#what-is-a-columnar-database}
A columnar database stores the data of each column independently. This makes it possible to read from disk only the columns that are used in a given query. The cost is that operations that affect whole rows become proportionally more expensive. The synonym for a columnar database is a column-oriented database management system. ClickHouse is a typical example of such a system.
Key columnar database advantages are:
- Queries that use only a few columns out of many.
- Aggregating queries against large volumes of data.
- Column-wise data compression.
Here is an illustration of the difference between traditional row-oriented systems and columnar databases when building reports:
**Traditional row-oriented**
![Traditional row-oriented](https://clickhouse.com/docs/en/images/row-oriented.gif#)
**Columnar**
![Columnar](https://clickhouse.com/docs/en/images/column-oriented.gif#)
A columnar database is a preferred choice for analytical applications because it lets you keep many columns in a table just in case without paying the cost of unused columns at query execution time. Column-oriented databases are designed for big data processing and data warehousing, because they often natively scale using distributed clusters of low-cost hardware to increase throughput. ClickHouse does it with a combination of [distributed](../../engines/table-engines/special/distributed.md) and [replicated](../../engines/table-engines/mergetree-family/replication.md) tables.
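The difference between the two layouts can be sketched with plain Python lists. This is only an illustration under simplifying assumptions: the table, the column names, and the "cells read" counters are made up, and a cell read stands in for disk I/O.

```python
# Three rows of a hypothetical table with four columns.
rows = [
    {"user_id": 1, "country": "NL", "browser": "Firefox", "duration": 12},
    {"user_id": 2, "country": "DE", "browser": "Chrome", "duration": 30},
    {"user_id": 3, "country": "NL", "browser": "Safari", "duration": 7},
]

# Row-oriented layout: every row is stored as one unit.
row_store = [tuple(row.values()) for row in rows]

# Column-oriented layout: every column is stored as one unit.
column_store = {name: [row[name] for row in rows] for name in rows[0]}


def sum_duration_row_store(store):
    """Sum the last column; whole rows have to be read to reach it."""
    cells_read = sum(len(row) for row in store)
    return sum(row[3] for row in store), cells_read


def sum_duration_column_store(store):
    """Sum one column; only that column is read."""
    column = store["duration"]
    return sum(column), len(column)


row_total, row_cells = sum_duration_row_store(row_store)
col_total, col_cells = sum_duration_column_store(column_store)
```

Both functions return the same total, but the row-oriented version touches every cell of every row, while the columnar version touches only the cells of the one column the query needs.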


@ -0,0 +1,17 @@
---
title: "What does \u201CClickHouse\u201D mean?"
toc_hidden: true
toc_priority: 10
---
# What Does “ClickHouse” Mean? {#what-does-clickhouse-mean}
It’s a combination of “**Click**stream” and “Data ware**House**”. It comes from the original use case at Yandex.Metrica, where ClickHouse was supposed to keep records of all clicks by people from all over the Internet, and it still does the job. You can read more about this use case on the [ClickHouse history](../../introduction/history.md) page.
This two-part meaning has two consequences:
- The only correct way to write Click**H**ouse is with a capital H.
- If you need to abbreviate it, use **CH**. For some historical reasons, abbreviating as CK is also popular in China, mostly because one of the first talks about ClickHouse in Chinese used this form.
!!! info "Fun fact"
Many years after ClickHouse got its name, this approach of combining two words that are meaningful on their own was highlighted as the best way to name a database in [research by Andy Pavlo](https://www.cs.cmu.edu/~pavlo/blog/2020/03/on-naming-a-database-management-system.html), an Associate Professor of Databases at Carnegie Mellon University. ClickHouse shared his “best database name of all time” award with Postgres.


@ -0,0 +1,15 @@
---
title: How do I contribute code to ClickHouse?
toc_hidden: true
toc_priority: 120
---
# How do I contribute code to ClickHouse? {#how-do-i-contribute-code-to-clickhouse}
ClickHouse is an open-source project [developed on GitHub](https://github.com/ClickHouse/ClickHouse).
As is customary, contribution instructions are published in the [CONTRIBUTING.md](https://github.com/ClickHouse/ClickHouse/blob/master/CONTRIBUTING.md) file in the root of the source code repository.
If you want to suggest a substantial change to ClickHouse, consider [opening a GitHub issue](https://github.com/ClickHouse/ClickHouse/issues/new/choose) explaining what you want to do, to discuss it with maintainers and community first. [Examples of such RFC issues](https://github.com/ClickHouse/ClickHouse/issues?q=is%3Aissue+is%3Aopen+rfc).
If your contributions are security related, please check out [our security policy](https://github.com/ClickHouse/ClickHouse/security/policy/) too.


@ -0,0 +1,25 @@
---
title: General questions about ClickHouse
toc_hidden_folder: true
toc_priority: 1
toc_title: General
---
# General Questions About ClickHouse {#general-questions}
Questions:
- [What is ClickHouse?](../../index.md#what-is-clickhouse)
- [Why ClickHouse is so fast?](../../faq/general/why-clickhouse-is-so-fast.md)
- [Who is using ClickHouse?](../../faq/general/who-is-using-clickhouse.md)
- [What does “ClickHouse” mean?](../../faq/general/dbms-naming.md)
- [What does “Не тормозит” mean?](../../faq/general/ne-tormozit.md)
- [What is OLAP?](../../faq/general/olap.md)
- [What is a columnar database?](../../faq/general/columnar-database.md)
- [Why not use something like MapReduce?](../../faq/general/mapreduce.md)
- [How do I contribute code to ClickHouse?](../../faq/general/how-do-i-contribute-code-to-clickhouse.md)
!!! info "Don’t see what you were looking for?"
Check out [other F.A.Q. categories](../../faq/index.md) or browse the main documentation articles found in the left sidebar.
{## [Original article](https://clickhouse.com/docs/en/faq/general/) ##}


@ -0,0 +1,13 @@
---
title: Why not use something like MapReduce?
toc_hidden: true
toc_priority: 110
---
# Why Not Use Something Like MapReduce? {#why-not-use-something-like-mapreduce}
We can refer to systems like MapReduce as distributed computing systems in which the reduce operation is based on distributed sorting. The most common open-source solution in this class is [Apache Hadoop](http://hadoop.apache.org). Large IT companies often have proprietary in-house solutions.
These systems aren’t appropriate for online queries due to their high latency. In other words, they can’t be used as the back-end for a web interface. These types of systems aren’t useful for real-time data updates. Distributed sorting isn’t the best way to perform reduce operations if the result of the operation and all the intermediate results (if there are any) are located in the RAM of a single server, which is usually the case for online queries. In such a case, a hash table is an optimal way to perform reduce operations. A common approach to optimizing map-reduce tasks is pre-aggregation (partial reduce) using a hash table in RAM. The user performs this optimization manually. Distributed sorting is one of the main causes of reduced performance when running simple map-reduce tasks.
Most MapReduce implementations allow you to execute arbitrary code on a cluster. But a declarative query language is better suited to OLAP to run experiments quickly. For example, Hadoop has Hive and Pig. Also consider Cloudera Impala or Shark (outdated) for Spark, as well as Spark SQL, Presto, and Apache Drill. Performance when running such tasks is highly sub-optimal compared to specialized systems, and relatively high latency makes it unrealistic to use these systems as the backend for a web interface.
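The contrast between a sort-based reduce and a hash-table reduce can be sketched on a single machine. Assumptions: the data fits in RAM, the key/value pairs are made up, and both functions are hypothetical illustrations rather than code from Hadoop or ClickHouse.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

pairs = [("ru", 3), ("us", 1), ("ru", 2), ("de", 5), ("us", 4)]


def reduce_by_sorting(pairs):
    """Sort by key, then sum each run of equal keys: O(n log n)."""
    ordered = sorted(pairs, key=itemgetter(0))
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(ordered, key=itemgetter(0))
    }


def reduce_by_hashing(pairs):
    """Accumulate into a hash table in one pass: O(n) on average."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)
```

Both functions produce the same per-key totals; the hash-table version avoids the sort entirely, which is the pre-aggregation idea described above.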


@ -0,0 +1,26 @@
---
title: "What does \u201C\u043D\u0435 \u0442\u043E\u0440\u043C\u043E\u0437\u0438\u0442\
\u201D mean?"
toc_hidden: true
toc_priority: 11
---
# What Does “Не тормозит” Mean? {#what-does-ne-tormozit-mean}
This question usually arises when people see official ClickHouse t-shirts. They have large words **“ClickHouse не тормозит”** on the front.
Before ClickHouse became open-source, it had been developed as an in-house storage system by the largest Russian IT company, Yandex. That’s why it initially got its slogan in Russian, which is “не тормозит” (pronounced as “ne tormozit”). After the open-source release we first produced some of those t-shirts for events in Russia and it was a no-brainer to use the slogan as-is.
One of the following batches of those t-shirts was supposed to be given away at events outside of Russia and we tried to make an English version of the slogan. Unfortunately, the Russian language is kind of elegant in terms of expressing stuff and there was limited space on a t-shirt, so we failed to come up with a good enough translation (most options appeared to be either long or inaccurate) and decided to keep the slogan in Russian even on t-shirts produced for international events. It turned out to be a great decision because people all over the world get positively surprised and curious when they see it.
So, what does it mean? Here are some ways to translate *“не тормозит”*:
- If you translate it literally, it’d be something like *“ClickHouse does not press the brake pedal”*.
- If you want to express it as close as possible to how it sounds to a Russian person with an IT background, it’d be something like *“If your larger system lags, it’s not because it uses ClickHouse”*.
- Shorter, but not so precise versions could be *“ClickHouse is not slow”*, *“ClickHouse does not lag”* or just *“ClickHouse is fast”*.
If you haven’t seen one of those t-shirts in person, you can check them out online in many ClickHouse-related videos. For example, this one:
![iframe](https://www.youtube.com/embed/bSyQahMVZ7w)
P.S. These t-shirts are not for sale; they are given away for free at most [ClickHouse Meetups](https://clickhouse.com/#meet), usually for the best questions or other forms of active participation.


@ -0,0 +1,39 @@
---
title: What is OLAP?
toc_hidden: true
toc_priority: 100
---
# What Is OLAP? {#what-is-olap}
[OLAP](https://en.wikipedia.org/wiki/Online_analytical_processing) stands for Online Analytical Processing. It is a broad term that can be looked at from two perspectives: technical and business. But at the very high level, you can just read these words backward:
Processing
: Some source data is processed…
Analytical
: …to produce some analytical reports and insights…
Online
: …in real-time.
## OLAP from the Business Perspective {#olap-from-the-business-perspective}
In recent years, business people started to realize the value of data. Companies that make their decisions blindly more often than not fail to keep up with the competition. The data-driven approach of successful companies forces them to collect all data that might be remotely useful for making business decisions, and they need mechanisms to analyze it in a timely manner. Here’s where OLAP database management systems (DBMS) come in.
In a business sense, OLAP allows companies to continuously plan, analyze, and report operational activities, thus maximizing efficiency, reducing expenses, and ultimately conquering the market share. It could be done either in an in-house system or outsourced to SaaS providers like web/mobile analytics services, CRM services, etc. OLAP is the technology behind many BI applications (Business Intelligence).
ClickHouse is an OLAP database management system that is pretty often used as a backend for those SaaS solutions for analyzing domain-specific data. However, some businesses are still reluctant to share their data with third-party providers and an in-house data warehouse scenario is also viable.
## OLAP from the Technical Perspective {#olap-from-the-technical-perspective}
All database management systems could be classified into two groups: OLAP (Online **Analytical** Processing) and OLTP (Online **Transactional** Processing). The former focuses on building reports, each based on large volumes of historical data, but does so relatively infrequently. The latter usually handles a continuous stream of transactions, constantly modifying the current state of data.
In practice, OLAP and OLTP are not strict categories; it’s more like a spectrum. Most real systems usually focus on one of them but provide some solutions or workarounds if the opposite kind of workload is also desired. This situation often forces businesses to operate multiple integrated storage systems, which might not be a big deal, but having more systems makes maintenance more expensive. So the trend of recent years is HTAP (**Hybrid Transactional/Analytical Processing**), where both kinds of workload are handled equally well by a single database management system.
Even if a DBMS started as pure OLAP or pure OLTP, it is forced to move in the HTAP direction to keep up with the competition. ClickHouse is no exception: it was initially designed as a [fast-as-possible OLAP system](../../faq/general/why-clickhouse-is-so-fast.md) and it still does not have full-fledged transaction support, but some features like consistent read/writes and mutations for updating/deleting data had to be added.
The fundamental trade-off between OLAP and OLTP systems remains:
- To build analytical reports efficiently it’s crucial to be able to read columns separately, thus most OLAP databases are [columnar](../../faq/general/columnar-database.md),
- While storing columns separately increases costs of operations on rows, like append or in-place modification, proportionally to the number of columns (which can be huge if the systems try to collect all details of an event just in case). Thus, most OLTP systems store data arranged by rows.
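
The cost asymmetry in the two points above can be sketched by counting how many separate storage units a single-row append touches. The `RowStore` and `ColumnStore` classes are hypothetical toys, and "units touched" is only a stand-in for separate writes.

```python
class RowStore:
    """Keeps each row as one record (toy model)."""

    def __init__(self, columns):
        self.columns = columns
        self.records = []

    def append(self, row):
        # One record is written regardless of the number of columns.
        self.records.append(tuple(row[name] for name in self.columns))
        return 1


class ColumnStore:
    """Keeps each column as a separate array (toy model)."""

    def __init__(self, columns):
        self.arrays = {name: [] for name in columns}

    def append(self, row):
        # Every column array is touched, so the cost grows with column count.
        for name, array in self.arrays.items():
            array.append(row[name])
        return len(self.arrays)


columns = [f"c{i}" for i in range(100)]
row = {name: 0 for name in columns}
row_units = RowStore(columns).append(row)
column_units = ColumnStore(columns).append(row)
```

In this toy model, appending one row to a 100-column table touches one unit in the row layout and 100 units in the columnar layout.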


@ -0,0 +1,19 @@
---
title: Who is using ClickHouse?
toc_hidden: true
toc_priority: 9
---
# Who Is Using ClickHouse? {#who-is-using-clickhouse}
Being an open-source product makes this question not so straightforward to answer. You do not have to tell anyone if you want to start using ClickHouse, you just go grab source code or pre-compiled packages. There’s no contract to sign and the [Apache 2.0 license](https://github.com/ClickHouse/ClickHouse/blob/master/LICENSE) allows for unconstrained software distribution.
Also, the technology stack is often in a grey zone of what’s covered by an NDA. Some companies consider technologies they use as a competitive advantage even if they are open-source and do not allow employees to share any details publicly. Some see some PR risks and allow employees to share implementation details only with their PR department approval.
So how to tell who is using ClickHouse?
One way is to **ask around**. If it’s not in writing, people are much more willing to share what technologies are used in their companies, what the use cases are, what kind of hardware is used, data volumes, etc. We’re talking with users regularly at [ClickHouse Meetups](https://www.youtube.com/channel/UChtmrD-dsdpspr42P_PyRAw/playlists) all over the world and have heard stories about 1000+ companies that use ClickHouse. Unfortunately, that’s not reproducible and we try to treat such stories as if they were told under NDA to avoid any potential troubles. But you can come to any of our future meetups and talk with other users on your own. Meetups are announced in multiple ways; for example, you can subscribe to [our Twitter](http://twitter.com/ClickHouseDB/).
The second way is to look for companies **publicly saying** that they use ClickHouse. It’s more substantial because there’s usually some hard evidence like a blog post, talk video recording, slide deck, etc. We maintain a collection of links to such evidence on our **[Adopters](../../introduction/adopters.md)** page. Feel free to contribute the story of your employer or just some links you’ve stumbled upon (but try not to violate your NDA in the process).
You can find names of very large companies in the adopters list, like Bloomberg, Cisco, China Telecom, Tencent, or Uber, but with the first approach, we found that there are many more. For example, if you take [the list of largest IT companies by Forbes (2020)](https://www.forbes.com/sites/hanktucker/2020/05/13/worlds-largest-technology-companies-2020-apple-stays-on-top-zoom-and-uber-debut/) over half of them are using ClickHouse in some way. Also, it would be unfair not to mention [Yandex](../../introduction/history.md), the company which initially open-sourced ClickHouse in 2016 and happens to be one of the largest IT companies in Europe.

View File

@ -0,0 +1,63 @@
---
title: Why ClickHouse is so fast?
toc_hidden: true
toc_priority: 8
---
# Why ClickHouse Is So Fast? {#why-clickhouse-is-so-fast}
It was designed to be fast. Query execution performance has always been a top priority during the development process, but other important characteristics like user-friendliness, scalability, and security were also considered so ClickHouse could become a real production system.
ClickHouse was initially built as a prototype to do just a single task well: to filter and aggregate data as fast as possible. That's what needs to be done to build a typical analytical report and that's what a typical [GROUP BY](../../sql-reference/statements/select/group-by.md) query does. The ClickHouse team has made several high-level decisions that, combined, made achieving this task possible:
Column-oriented storage
: Source data often contain hundreds or even thousands of columns, while a report can use just a few of them. The system needs to avoid reading unnecessary columns, or most expensive disk read operations would be wasted.
Indexes
: ClickHouse keeps data structures in memory that allow reading not only just the used columns but also only the necessary row ranges of those columns.
Data compression
: Storing different values of the same column together often leads to better compression ratios (compared to row-oriented systems) because in real data a column often has the same value, or only a few different values, in neighboring rows. In addition to general-purpose compression, ClickHouse supports [specialized codecs](../../sql-reference/statements/create/table.md#create-query-specialized-codecs) that can make data even more compact.
Vectorized query execution
: ClickHouse not only stores data in columns but also processes data in columns. This leads to better CPU cache utilization and allows the use of [SIMD](https://en.wikipedia.org/wiki/SIMD) CPU instructions.
Scalability
: ClickHouse can leverage all available CPU cores and disks to execute even a single query: not only on a single server, but across all CPU cores and disks of a cluster as well.
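
As a rough illustration of the decisions listed above, the sketch below defines a table and runs a typical report query against it. The table name `hits` and its columns are hypothetical and chosen only for this example. The query names three columns, so the others do not need to be read, and the sorting key of the table gives ClickHouse a way to skip row ranges that cannot match the filter.

``` sql
-- Hypothetical table for illustration; a real one could have hundreds of columns.
CREATE TABLE hits
(
    event_date Date,
    counter_id UInt32,
    user_id    UInt64,
    url        String,
    referer    String,
    duration   UInt32
)
ENGINE = MergeTree
ORDER BY (counter_id, event_date);

-- Uses only counter_id, event_date, and duration;
-- the remaining columns are not needed to answer the query.
SELECT
    event_date,
    count() AS views,
    avg(duration) AS avg_duration
FROM hits
WHERE counter_id = 42
GROUP BY event_date
ORDER BY event_date;
```
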
But many other database management systems use similar techniques. What really makes ClickHouse stand out is **attention to low-level details**. Most programming languages provide implementations for most common algorithms and data structures, but they tend to be too generic to be effective. Every task can be considered as a landscape with various characteristics, and the implementation should be chosen to fit them instead of just throwing in a random one. For example, if you need a hash table, here are some key questions to consider:
- Which hash function to choose?
- Collision resolution algorithm: [open addressing](https://en.wikipedia.org/wiki/Open_addressing) vs [chaining](https://en.wikipedia.org/wiki/Hash_table#Separate_chaining)?
- Memory layout: one array for keys and values or separate arrays? Will it store small or large values?
- Fill factor: when and how to resize? How to move values around on resize?
- Will values be removed, and which algorithm will work better if they are?
- Will we need fast probing with bitmaps, inline placement of string keys, support for non-movable values, prefetch, and batching?
The hash table is a key data structure for the `GROUP BY` implementation, and ClickHouse automatically chooses one of [30+ variations](https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/Aggregator.h) for each specific query.
The same goes for algorithms. For example, in sorting you might consider:
- What will be sorted: an array of numbers, tuples, strings, or structures?
- Is all data available completely in RAM?
- Do we need a stable sort?
- Do we need a full sort? Maybe partial sort or n-th element will suffice?
- How to implement comparisons?
- Are we sorting data that has already been partially sorted?
Algorithms that rely on characteristics of the data they are working with can often do better than their generic counterparts. If those characteristics are not known in advance, the system can try various implementations and choose the one that works best at runtime. For example, see an [article on how LZ4 decompression is implemented in ClickHouse](https://habr.com/en/company/yandex/blog/457612/).
Last but not least, the ClickHouse team always monitors the Internet for people claiming that they came up with the best implementation, algorithm, or data structure to do something, and tries it out. Those claims mostly turn out to be false, but from time to time you'll indeed find a gem.
!!! info "Tips for building your own high-performance software"
    - Keep in mind low-level details when designing your system.
    - Design based on hardware capabilities.
    - Choose data structures and abstractions based on the needs of the task.
    - Provide specializations for special cases.
    - Try new, “best” algorithms that you read about yesterday.
    - Choose an algorithm at runtime based on statistics.
    - Benchmark on real datasets.
    - Test for performance regressions in CI.
    - Measure and observe everything.

47
docs/en/faq/index.md Normal file
View File

@ -0,0 +1,47 @@
---
toc_folder_title: F.A.Q.
toc_hidden: true
toc_priority: 76
---
# ClickHouse F.A.Q {#clickhouse-f-a-q}
This section of the documentation is a place to collect answers to ClickHouse-related questions that arise often.
Categories:
- **[General](../faq/general/index.md)**
    - [What is ClickHouse?](../index.md#what-is-clickhouse)
    - [Why ClickHouse is so fast?](../faq/general/why-clickhouse-is-so-fast.md)
    - [Who is using ClickHouse?](../faq/general/who-is-using-clickhouse.md)
    - [What does “ClickHouse” mean?](../faq/general/dbms-naming.md)
    - [What does “Не тормозит” mean?](../faq/general/ne-tormozit.md)
    - [What is OLAP?](../faq/general/olap.md)
    - [What is a columnar database?](../faq/general/columnar-database.md)
    - [Why not use something like MapReduce?](../faq/general/mapreduce.md)
- **[Use Cases](../faq/use-cases/index.md)**
    - [Can I use ClickHouse as a time-series database?](../faq/use-cases/time-series.md)
    - [Can I use ClickHouse as a key-value storage?](../faq/use-cases/key-value.md)
- **[Operations](../faq/operations/index.md)**
    - [Which ClickHouse version to use in production?](../faq/operations/production.md)
    - [Is it possible to delete old records from a ClickHouse table?](../faq/operations/delete-old-data.md)
    - [Does ClickHouse support multi-region replication?](../faq/operations/multi-region-replication.md)
- **[Integration](../faq/integration/index.md)**
    - [How do I export data from ClickHouse to a file?](../faq/integration/file-export.md)
    - [What if I have a problem with encodings when connecting to Oracle via ODBC?](../faq/integration/oracle-odbc.md)
{## TODO
Question candidates:
- How to choose a primary key?
- How to add a column in ClickHouse?
- Too many parts
- How to filter ClickHouse table by an array column contents?
- How to insert all rows from one table to another of identical structure?
- How to kill a process (query) in ClickHouse?
- How to implement pivot (like in pandas)?
- How to remove the default ClickHouse user through users.d?
- Importing MySQL dump to ClickHouse
- Window function workarounds (row_number, lag/lead, running diff/sum/average)
##}
{## [Original article](https://clickhouse.com/docs/en/faq) ##}

View File

@ -0,0 +1,37 @@
---
title: How do I export data from ClickHouse to a file?
toc_hidden: true
toc_priority: 10
---
# How Do I Export Data from ClickHouse to a File? {#how-to-export-to-file}
## Using INTO OUTFILE Clause {#using-into-outfile-clause}
Add an [INTO OUTFILE](../../sql-reference/statements/select/into-outfile.md#into-outfile-clause) clause to your query.
For example:
``` sql
SELECT * FROM table INTO OUTFILE 'file'
```
By default, ClickHouse uses the [TabSeparated](../../interfaces/formats.md#tabseparated) format for output data. To select the [data format](../../interfaces/formats.md), use the [FORMAT clause](../../sql-reference/statements/select/format.md#format-clause).
For example:
``` sql
SELECT * FROM table INTO OUTFILE 'file' FORMAT CSV
```
## Using a File-Engine Table {#using-a-file-engine-table}
See [File](../../engines/table-engines/special/file.md) table engine.
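
A minimal sketch of this approach, assuming a hypothetical table named `export_csv`: a table with the `File` engine keeps its rows in a file of the given format on the server, so inserting the result of a query into it writes that result to the file. The file is typically stored in the table's own directory inside the server's data path, so this option suits cases where you have access to the server's file system.

``` sql
-- Hypothetical table name and columns, used only for illustration.
-- File(CSV) keeps the table's rows in a CSV file managed by the server.
CREATE TABLE export_csv
(
    id   UInt64,
    name String
)
ENGINE = File(CSV);

-- Write the rows to export into the file-backed table.
-- numbers(3) is a built-in table function that generates the values 0, 1, 2.
INSERT INTO export_csv
SELECT
    number AS id,
    concat('user_', toString(number)) AS name
FROM numbers(3);
```
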
## Using Command-Line Redirection {#using-command-line-redirection}
``` bash
$ clickhouse-client --query "SELECT * from table" --format FormatName > result.txt
```
See [clickhouse-client](../../interfaces/cli.md).

View File

@ -0,0 +1,19 @@
---
title: Questions about integrating ClickHouse and other systems
toc_hidden_folder: true
toc_priority: 4
toc_title: Integration
---
# Questions About Integrating ClickHouse and Other Systems {#question-about-integrating-clickhouse-and-other-systems}
Questions:
- [How do I export data from ClickHouse to a file?](../../faq/integration/file-export.md)
- [How to import JSON into ClickHouse?](../../faq/integration/json-import.md)
- [What if I have a problem with encodings when connecting to Oracle via ODBC?](../../faq/integration/oracle-odbc.md)
!!! info "Don't see what you were looking for?"
    Check out [other F.A.Q. categories](../../faq/index.md) or browse around main documentation articles found in the left sidebar.
{## [Original article](https://clickhouse.com/docs/en/faq/integration/) ##}

View File

@ -0,0 +1,33 @@
---
title: How to import JSON into ClickHouse?
toc_hidden: true
toc_priority: 11
---
# How to Import JSON Into ClickHouse? {#how-to-import-json-into-clickhouse}
ClickHouse supports a wide range of [data formats for input and output](../../interfaces/formats.md). There are multiple JSON variations among them, but the most commonly used for data ingestion is [JSONEachRow](../../interfaces/formats.md#jsoneachrow). It expects one JSON object per row, each object separated by a newline.
## Examples {#examples}
Using [HTTP interface](../../interfaces/http.md):
``` bash
$ echo '{"foo":"bar"}' | curl 'http://localhost:8123/?query=INSERT%20INTO%20test%20FORMAT%20JSONEachRow' --data-binary @-
```
Using [CLI interface](../../interfaces/cli.md):
``` bash
$ echo '{"foo":"bar"}' | clickhouse-client --query="INSERT INTO test FORMAT JSONEachRow"
```
Instead of inserting data manually, you might consider using one of the [client libraries](../../interfaces/index.md).
## Useful Settings {#useful-settings}
- `input_format_skip_unknown_fields` allows inserting JSON even if it contains additional fields that are not present in the table schema (by discarding them).
- `input_format_import_nested_json` allows inserting nested JSON objects into columns of [Nested](../../sql-reference/data-types/nested-data-structures/nested.md) type.
!!! note "Note"
    Settings are specified as `GET` parameters for the HTTP interface or as additional command-line arguments prefixed with `--` for the `CLI` interface.
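
The same settings can also be changed from within a session with a `SET` statement. The sketch below assumes an interactive `clickhouse-client` session and a hypothetical one-column table `test`; the `extra` key in the inserted object has no matching column, so it is expected to be discarded rather than cause an error.

``` sql
-- Hypothetical target table with a single column.
CREATE TABLE test
(
    foo String
)
ENGINE = MergeTree
ORDER BY foo;

-- Ignore JSON keys that have no matching column instead of failing.
SET input_format_skip_unknown_fields = 1;

-- Inline data follows the FORMAT clause on the same line.
INSERT INTO test FORMAT JSONEachRow {"foo":"bar","extra":42}
```
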

View File

@ -0,0 +1,15 @@
---
title: What if I have a problem with encodings when using Oracle via ODBC?
toc_hidden: true
toc_priority: 20
---
# What If I Have a Problem with Encodings When Using Oracle Via ODBC? {#oracle-odbc-encodings}
If you use Oracle as a source of ClickHouse external dictionaries via Oracle ODBC driver, you need to set the correct value for the `NLS_LANG` environment variable in `/etc/default/clickhouse`. For more information, see the [Oracle NLS_LANG FAQ](https://www.oracle.com/technetwork/products/globalization/nls-lang-099431.html).
**Example**
``` bash
NLS_LANG=RUSSIAN_RUSSIA.UTF8
```

View File

@ -0,0 +1,42 @@
---
title: Is it possible to delete old records from a ClickHouse table?
toc_hidden: true
toc_priority: 20
---
# Is It Possible to Delete Old Records from a ClickHouse Table? {#is-it-possible-to-delete-old-records-from-a-clickhouse-table}
The short answer is “yes”. ClickHouse has multiple mechanisms that allow freeing up disk space by removing old data. Each mechanism is aimed at different scenarios.
## TTL {#ttl}
ClickHouse can automatically drop values when some condition is met. This condition is configured as an expression based on any columns, usually just a static offset from a timestamp column.
The key advantage of this approach is that it does not need any external system to trigger it: once TTL is configured, data removal happens automatically in the background.
!!! note "Note"
    TTL can be used not only to move data to [/dev/null](https://en.wikipedia.org/wiki/Null_device), but also to move it between different storage systems, like from SSD to HDD.
More details on [configuring TTL](../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-ttl).
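
A minimal sketch of a table-level TTL, assuming a hypothetical `events` table with a timestamp column named `event_time`. Rows become eligible for removal once the expression `event_time + INTERVAL 30 DAY` is in the past; the actual cleanup happens in the background.

``` sql
-- Hypothetical table; rows older than 30 days are removed in the background.
CREATE TABLE events
(
    event_time DateTime,
    user_id    UInt64,
    message    String
)
ENGINE = MergeTree
ORDER BY (user_id, event_time)
TTL event_time + INTERVAL 30 DAY;

-- The retention period can be changed later without recreating the table.
ALTER TABLE events MODIFY TTL event_time + INTERVAL 90 DAY;
```
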
## ALTER DELETE {#alter-delete}
ClickHouse does not have real-time point deletes like in [OLTP](https://en.wikipedia.org/wiki/Online_transaction_processing) databases. The closest thing to them is mutations. They are issued as `ALTER ... DELETE` or `ALTER ... UPDATE` queries to distinguish them from normal `DELETE` or `UPDATE`, as they are asynchronous batch operations, not immediate modifications. The rest of the syntax after the `ALTER TABLE` prefix is similar.
`ALTER DELETE` can be issued to flexibly remove old data. If you need to do it regularly, the main downside will be the need to have an external system to submit the query. There are also some performance considerations, since mutations rewrite complete parts even if there's only a single row to be deleted.
This is the most common approach to make your system based on ClickHouse [GDPR](https://gdpr-info.eu)-compliant.
More details on [mutations](../../sql-reference/statements/alter/index.md#alter-mutations).
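
A short sketch of both uses mentioned above, assuming a hypothetical `events` table with `event_time` and `user_id` columns. Because mutations run asynchronously, the last query checks the `system.mutations` table to see which of them are still in progress.

``` sql
-- Hypothetical table used by the statements below.
CREATE TABLE events
(
    event_time DateTime,
    user_id    UInt64,
    message    String
)
ENGINE = MergeTree
ORDER BY (user_id, event_time);

-- Remove everything older than one year.
ALTER TABLE events DELETE WHERE event_time < now() - INTERVAL 1 YEAR;

-- Remove all rows of a single user, for example on an erasure request.
ALTER TABLE events DELETE WHERE user_id = 12345;

-- Mutations are asynchronous; list the ones that have not finished yet.
SELECT mutation_id, command, create_time
FROM system.mutations
WHERE database = currentDatabase()
  AND table = 'events'
  AND is_done = 0;
```
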
## DROP PARTITION {#drop-partition}
`ALTER TABLE ... DROP PARTITION` provides a cost-efficient way to drop a whole partition. It's not that flexible and needs a proper partitioning scheme configured on table creation, but it still covers most common cases. Like mutations, it needs to be executed from an external system for regular use.
More details on [manipulating partitions](../../sql-reference/statements/alter/partition.md#alter_drop-partition).
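
A minimal sketch, assuming a hypothetical `events` table partitioned by month. With `toYYYYMM` as the partitioning expression, each partition is identified by a number such as `202001`, and dropping it removes the whole month at once.

``` sql
-- Hypothetical table partitioned by calendar month.
CREATE TABLE events
(
    event_time DateTime,
    user_id    UInt64,
    message    String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
ORDER BY (user_id, event_time);

-- List existing partitions and their sizes before deciding what to drop.
SELECT partition, sum(rows) AS row_count
FROM system.parts
WHERE database = currentDatabase()
  AND table = 'events'
  AND active
GROUP BY partition
ORDER BY partition;

-- Drop all data for January 2020.
ALTER TABLE events DROP PARTITION 202001;
```
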
## TRUNCATE {#truncate}
It's rather radical to drop all data from a table, but in some cases it might be exactly what you need.
More details on [table truncation](../../sql-reference/statements/truncate.md).

View File

@ -0,0 +1,19 @@
---
title: Questions about operating ClickHouse servers and clusters
toc_hidden_folder: true
toc_priority: 3
toc_title: Operations
---
# Questions About Operating ClickHouse Servers and Clusters {#question-about-operating-clickhouse-servers-and-clusters}
Questions:
- [Which ClickHouse version to use in production?](../../faq/operations/production.md)
- [Is it possible to delete old records from a ClickHouse table?](../../faq/operations/delete-old-data.md)
- [Does ClickHouse support multi-region replication?](../../faq/operations/multi-region-replication.md)
!!! info "Don't see what you were looking for?"
    Check out [other F.A.Q. categories](../../faq/index.md) or browse around main documentation articles found in the left sidebar.
{## [Original article](https://clickhouse.com/docs/en/faq/production/) ##}

View File

@ -0,0 +1,13 @@
---
title: Does ClickHouse support multi-region replication?
toc_hidden: true
toc_priority: 30
---
# Does ClickHouse support multi-region replication? {#does-clickhouse-support-multi-region-replication}
The short answer is "yes". However, we recommend keeping latency between all regions/datacenters in the two-digit millisecond range; otherwise, write performance will suffer as it goes through the distributed consensus protocol. For example, replication between US coasts will likely work fine, but replication between the US and Europe won't.
Configuration-wise there's no difference compared to single-region replication; simply use hosts that are located in different locations for replicas.
For more information, see [full article on data replication](../../engines/table-engines/mergetree-family/replication.md).
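
As a sketch of what this looks like in practice, the same `CREATE TABLE` statement is run on every replica host, whatever region it is in. The table name, the coordination path, and the `{shard}` and `{replica}` macros below are assumptions for illustration: the macros have to be defined in each server's configuration, and the servers need a configured coordination service.

``` sql
-- Run the same statement on each replica host.
-- Path and macro names are illustrative and depend on your server configuration.
CREATE TABLE events
(
    event_time DateTime,
    user_id    UInt64,
    message    String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
ORDER BY (user_id, event_time);

-- Check how far this replica lags behind, in seconds, and how long its queue is.
SELECT database, table, absolute_delay, queue_size
FROM system.replicas
WHERE table = 'events';
```
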

View File

@ -0,0 +1,70 @@
---
title: Which ClickHouse version to use in production?
toc_hidden: true
toc_priority: 10
---
# Which ClickHouse Version to Use in Production? {#which-clickhouse-version-to-use-in-production}
First of all, let's discuss why people ask this question in the first place. There are two key reasons:
1. ClickHouse is developed at a pretty high velocity, and there are usually 10+ stable releases per year. That gives a wide range of releases to choose from, which is not a trivial choice.
2. Some users want to avoid spending time figuring out which version works best for their use case and just follow someone else's advice.
The second reason is more fundamental, so we'll start with it and then get back to navigating through various ClickHouse releases.
## Which ClickHouse Version Do You Recommend? {#which-clickhouse-version-do-you-recommend}
It's tempting to hire consultants or trust some known experts to get rid of responsibility for your production environment. You install some specific ClickHouse version that someone else recommended; now if there's some issue with it, it's not your fault, it's someone else's. This line of reasoning is a big trap. No external person knows better than you what's going on in your company's production environment.
So how do you properly choose which ClickHouse version to upgrade to? Or how do you choose your first ClickHouse version? First of all, you need to invest in setting up a **realistic pre-production environment**. In an ideal world, it could be a completely identical shadow copy, but that's usually expensive.
Here are some key points to get reasonable fidelity in a pre-production environment without such high costs:
- The pre-production environment needs to run a set of queries as close as possible to what you intend to run in production:
    - Don't make it read-only with some frozen data.
    - Don't make it write-only with just copying data without building some typical reports.
    - Don't wipe it clean instead of applying schema migrations.
- Use a sample of real production data and queries. Try to choose a sample that's still representative and makes `SELECT` queries return reasonable results. Use obfuscation if your data is sensitive and internal policies do not allow it to leave the production environment.
- Make sure that pre-production is covered by your monitoring and alerting software the same way as your production environment is.
- If your production spans across multiple datacenters or regions, make sure your pre-production does the same.
- If your production uses complex features like replication, distributed tables, or cascading materialized views, make sure they are configured similarly in pre-production.
- There's a trade-off between using roughly the same number of servers or VMs in pre-production as in production but of a smaller size, and using far fewer of them but of the same size. The first option might catch extra network-related issues, while the latter is easier to manage.
The second area to invest in is **automated testing infrastructure**. Don't assume that if some kind of query has executed successfully once, it'll continue to do so forever. It's OK to have some unit tests where ClickHouse is mocked, but make sure your product has a reasonable set of automated tests that are run against real ClickHouse and check that all important use cases are still working as expected.
An extra step forward could be contributing those automated tests to [ClickHouse's open-source test infrastructure](https://github.com/ClickHouse/ClickHouse/tree/master/tests) that's continuously used in its day-to-day development. It definitely will take some additional time and effort to learn [how to run it](../../development/tests.md) and then how to adapt your tests to this framework, but it'll pay off by ensuring that ClickHouse releases are already tested against them when they are announced stable, instead of repeatedly losing time on reporting the issue after the fact and then waiting for a bugfix to be implemented, backported and released. Some companies even make contributing such tests to the infrastructure they use an internal policy; most notably, it's called [Beyonce's Rule](https://www.oreilly.com/library/view/software-engineering-at/9781492082781/ch01.html#policies_that_scale_well) at Google.
When you have your pre-production environment and testing infrastructure in place, choosing the best version is straightforward:
1. Routinely run your automated tests against new ClickHouse releases. You can do it even for ClickHouse releases that are marked as `testing`, but going forward to the next steps with them is not recommended.
2. Deploy the ClickHouse release that passed the tests to pre-production and check that all processes are running as expected.
3. Report any issues you discovered to [ClickHouse GitHub Issues](https://github.com/ClickHouse/ClickHouse/issues).
4. If there were no major issues, it should be safe to start deploying the ClickHouse release to your production environment. Investing in gradual release automation that implements an approach similar to [canary releases](https://martinfowler.com/bliki/CanaryRelease.html) or [blue-green deployments](https://martinfowler.com/bliki/BlueGreenDeployment.html) might further reduce the risk of issues in production.
As you might have noticed, there's nothing specific to ClickHouse in the approach described above. People do that for any piece of infrastructure they rely on if they take their production environment seriously.
## How to Choose Between ClickHouse Releases? {#how-to-choose-between-clickhouse-releases}
If you look into the contents of the ClickHouse package repository, you'll see four kinds of packages:
1. `testing`
2. `prestable`
3. `stable`
4. `lts` (long-term support)
As mentioned earlier, `testing` releases are mostly good for noticing issues early. Running them in production is not recommended because they are not tested as thoroughly as other kinds of packages.
`prestable` is a release candidate that generally looks promising and is likely to be announced as `stable` soon. You can try these out in pre-production and report issues if you see any.
For production use, there are two key options: `stable` and `lts`. Here is some guidance on how to choose between them:
- `stable` is the kind of package we recommend by default. They are released roughly monthly (and thus provide new features with reasonable delay), and the three latest stable releases are supported in terms of diagnostics and backporting of bugfixes.
- `lts` are released twice a year and are supported for a year after their initial release. You might prefer them over `stable` in the following cases:
    - Your company has some internal policies that do not allow for frequent upgrades or using non-LTS software.
    - You are using ClickHouse in some secondary products that either do not require any complex ClickHouse features or do not have enough resources to keep it updated.
Many teams that initially thought `lts` was the way to go often switch to `stable` anyway because of some recent feature that's important for their product.
!!! warning "Important"
    One more thing to keep in mind when upgrading ClickHouse: we're always keeping an eye on compatibility across releases, but sometimes it's not reasonable to maintain it and some minor details might change. So make sure you check the [changelog](../../whats-new/changelog/index.md) before upgrading to see if there are any notes about backward-incompatible changes.

View File

@ -0,0 +1,18 @@
---
title: Questions about ClickHouse use cases
toc_hidden_folder: true
toc_priority: 2
toc_title: Use Cases
---
# Questions About ClickHouse Use Cases {#questions-about-clickhouse-use-cases}
Questions:
- [Can I use ClickHouse as a time-series database?](../../faq/use-cases/time-series.md)
- [Can I use ClickHouse as a key-value storage?](../../faq/use-cases/key-value.md)
!!! info "Don't see what you were looking for?"
    Check out [other F.A.Q. categories](../../faq/index.md) or browse around main documentation articles found in the left sidebar.
{## [Original article](https://clickhouse.com/docs/en/faq/use-cases/) ##}

View File

@ -0,0 +1,17 @@
---
title: Can I use ClickHouse as a key-value storage?
toc_hidden: true
toc_priority: 101
---
# Can I Use ClickHouse As a Key-Value Storage? {#can-i-use-clickhouse-as-a-key-value-storage}
The short answer is **“no”**. The key-value workload is among the top positions in the list of cases when **NOT**{.text-danger} to use ClickHouse. It's an [OLAP](../../faq/general/olap.md) system after all, while there are many excellent key-value storage systems out there.
However, there might be situations where it still makes sense to use ClickHouse for key-value-like queries. Usually, these are low-budget products where the main workload is analytical in nature and fits ClickHouse well, but there's also some secondary process that needs a key-value pattern with not so high request throughput and without strict latency requirements. If you had an unlimited budget, you would have installed a secondary key-value database for this secondary workload, but in reality, there's an additional cost of maintaining one more storage system (monitoring, backups, etc.) which might be desirable to avoid.
If you decide to go against recommendations and run some key-value-like queries against ClickHouse, here are some tips:
- The key reason why point queries are expensive in ClickHouse is the sparse primary index of the main [MergeTree table engine family](../../engines/table-engines/mergetree-family/mergetree.md). This index can't point to each specific row of data; instead, it points to each N-th row, and the system has to scan from the neighboring N-th row to the desired one, reading excessive data along the way. In a key-value scenario, it might be useful to reduce the value of N with the `index_granularity` setting.
- ClickHouse keeps each column in a separate set of files, so to assemble one complete row it needs to go through each of those files. Their count increases linearly with the number of columns, so in the key-value scenario, it might be worth avoiding many columns and putting all your payload in a single `String` column encoded in some serialization format like JSON, Protobuf or whatever makes sense.
- There's an alternative approach that uses the [Join](../../engines/table-engines/special/join.md) table engine instead of normal `MergeTree` tables and the [joinGet](../../sql-reference/functions/other-functions.md#joinget) function to retrieve the data. It can provide better query performance but might have some usability and reliability issues. Here's a [usage example](https://github.com/ClickHouse/ClickHouse/blob/master/tests/queries/0_stateless/00800_versatile_storage_join.sql#L49-L51).
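
The tips above can be sketched as follows. The table names `kv` and `kv_join`, the key format, and the granularity value of 256 are all hypothetical choices for illustration, not recommendations. The first table combines a smaller `index_granularity` with a single `String` payload column; the second shows the `Join` engine variant queried with `joinGet`.

``` sql
-- Variant 1: MergeTree with a finer-grained sparse index and a single payload column.
-- The default index_granularity is 8192; 256 here is an arbitrary smaller value.
CREATE TABLE kv
(
    key   String,
    value String  -- payload serialized as JSON, Protobuf, etc.
)
ENGINE = MergeTree
ORDER BY key
SETTINGS index_granularity = 256;

INSERT INTO kv VALUES ('user:42', '{"name":"Alice"}');

-- Point lookup by key.
SELECT value FROM kv WHERE key = 'user:42';

-- Variant 2: Join table engine queried with joinGet.
CREATE TABLE kv_join
(
    key   String,
    value String
)
ENGINE = Join(ANY, LEFT, key);

INSERT INTO kv_join VALUES ('user:42', '{"name":"Alice"}');

SELECT joinGet('kv_join', 'value', 'user:42') AS value;
```
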

View File

@ -0,0 +1,15 @@
---
title: Can I use ClickHouse as a time-series database?
toc_hidden: true
toc_priority: 101
---
# Can I Use ClickHouse As a Time-Series Database? {#can-i-use-clickhouse-as-a-time-series-database}
ClickHouse is a generic data storage solution for [OLAP](../../faq/general/olap.md) workloads, while there are many specialized time-series database management systems. Nevertheless, ClickHouse's [focus on query execution speed](../../faq/general/why-clickhouse-is-so-fast.md) allows it to outperform specialized systems in many cases. There are many independent benchmarks on this topic out there, so we're not going to conduct one here. Instead, let's focus on ClickHouse features that are important to use if that's your use case.
First of all, there are **[specialized codecs](../../sql-reference/statements/create/table.md#create-query-specialized-codecs)** which make typical time-series data more compact: either common algorithms like `DoubleDelta` and `Gorilla`, or ones specific to ClickHouse like `T64`.
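
A minimal sketch of per-column codecs, assuming a hypothetical `metrics` table. The pairing of codecs to columns below is one plausible choice rather than a tuned recommendation: `DoubleDelta` for a regularly increasing timestamp, `Gorilla` for a slowly changing floating-point value, and `T64` for an integer counter, each followed by general-purpose `LZ4` compression.

``` sql
-- Hypothetical time-series table; codec choices are illustrative, not tuned.
CREATE TABLE metrics
(
    name  LowCardinality(String),
    ts    DateTime CODEC(DoubleDelta, LZ4),
    value Float64  CODEC(Gorilla, LZ4),
    seq   UInt64   CODEC(T64, LZ4)
)
ENGINE = MergeTree
ORDER BY (name, ts);
```

The best combination depends on the actual data, so it is worth comparing compressed sizes on a real sample before settling on one.
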
Second, time-series queries often hit only recent data, like one day or one week old. It makes sense to use servers that have both fast NVMe/SSD drives and high-capacity HDD drives. The ClickHouse [TTL](../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-multiple-volumes) feature allows you to keep fresh hot data on fast drives and gradually move it to slower drives as it ages. Rollup or removal of even older data is also possible if your requirements demand it.
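
A sketch of such tiering, under the assumption that the server configuration already defines a storage policy named `hot_cold` containing a volume named `cold`. Both names, like the table itself, are hypothetical. Data older than a week is moved to the slower volume, and data older than a year is deleted.

``` sql
-- Assumes a storage policy 'hot_cold' with a volume 'cold' exists in the server config.
CREATE TABLE metrics_tiered
(
    name  String,
    ts    DateTime,
    value Float64
)
ENGINE = MergeTree
ORDER BY (name, ts)
TTL ts + INTERVAL 7 DAY TO VOLUME 'cold',
    ts + INTERVAL 1 YEAR DELETE
SETTINGS storage_policy = 'hot_cold';
```
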
Even though it's against the ClickHouse philosophy of storing and processing raw data, you can use [materialized views](../../sql-reference/statements/create/view.md) to fit into even tighter latency or cost requirements.
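
A minimal sketch of a pre-aggregating materialized view, with hypothetical table and column names. Rows inserted into `metrics_raw` are rolled up per name and hour into `metrics_hourly`. Because `SummingMergeTree` merges rows in the background, the read query still aggregates to get complete totals.

``` sql
-- Hypothetical raw table.
CREATE TABLE metrics_raw
(
    name  String,
    ts    DateTime,
    value Float64
)
ENGINE = MergeTree
ORDER BY (name, ts);

-- Hourly rollup maintained automatically on every insert into metrics_raw.
CREATE MATERIALIZED VIEW metrics_hourly
ENGINE = SummingMergeTree
ORDER BY (name, hour)
AS
SELECT
    name,
    toStartOfHour(ts) AS hour,
    sum(value) AS total,
    count() AS samples
FROM metrics_raw
GROUP BY name, hour;

-- Read from the rollup; aggregate again because background merges may be pending.
SELECT
    name,
    hour,
    sum(total) AS total_value,
    sum(samples) AS sample_count
FROM metrics_hourly
GROUP BY name, hour
ORDER BY name, hour;
```
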

View File

@ -1,6 +1,6 @@
---
sidebar_label: AMPLab Big Data Benchmark
description: A benchmark dataset used for comparing the performance of data warehousing solutions.
toc_priority: 19
toc_title: AMPLab Big Data Benchmark
---
# AMPLab Big Data Benchmark {#amplab-big-data-benchmark}

View File

@ -1,6 +1,6 @@
---
sidebar_label: Brown University Benchmark
description: A new analytical benchmark for machine-generated log data
toc_priority: 20
toc_title: Brown University Benchmark
---
# Brown University Benchmark

View File

@ -1,8 +1,9 @@
---
sidebar_label: Cell Towers
toc_priority: 21
toc_title: Cell Towers
---
# Cell Towers
# Cell Towers {#cell-towers}
This dataset is from [OpenCellid](https://www.opencellid.org/) - The world's largest Open Database of Cell Towers.
@ -95,7 +96,7 @@ SELECT mcc, count() FROM cell_towers GROUP BY mcc ORDER BY count() DESC LIMIT 10
So, the top countries are: the USA, Germany, and Russia.
You may want to create an [External Dictionary](../sql-reference/dictionaries/external-dictionaries/external-dicts.md) in ClickHouse to decode these values.
You may want to create an [External Dictionary](../../sql-reference/dictionaries/external-dictionaries/external-dicts.md) in ClickHouse to decode these values.
## Use case {#use-case}

View File

@ -1,8 +1,9 @@
---
sidebar_label: Terabyte Click Logs from Criteo
toc_priority: 18
toc_title: Terabyte Click Logs from Criteo
---
# Terabyte of Click Logs from Criteo
# Terabyte of Click Logs from Criteo {#terabyte-of-click-logs-from-criteo}
Download the data from http://labs.criteo.com/downloads/download-terabyte-click-logs/

View File

@ -1,5 +1,6 @@
---
sidebar_label: GitHub Events
toc_priority: 11
toc_title: GitHub Events
---
# GitHub Events Dataset

Some files were not shown because too many files have changed in this diff