diff --git a/docs/en/commercial/cloud.md b/docs/en/commercial/cloud.md
index 714f51d0b79..c6ed80d4fdb 100644
--- a/docs/en/commercial/cloud.md
+++ b/docs/en/commercial/cloud.md
@@ -3,60 +3,7 @@ toc_priority: 1
 toc_title: Cloud
 ---
 
-# ClickHouse Cloud Service Providers {#clickhouse-cloud-service-providers}
+# ClickHouse Cloud Service {#clickhouse-cloud-service}
 
 !!! info "Info"
-    If you have launched a public cloud with managed ClickHouse service, feel free to [open a pull-request](https://github.com/ClickHouse/ClickHouse/edit/master/docs/en/commercial/cloud.md) adding it to the following list.
-
-## Yandex Cloud {#yandex-cloud}
-
-[Yandex Managed Service for ClickHouse](https://cloud.yandex.com/services/managed-clickhouse?utm_source=referrals&utm_medium=clickhouseofficialsite&utm_campaign=link3) provides the following key features:
-
-- Fully managed ZooKeeper service for [ClickHouse replication](../engines/table-engines/mergetree-family/replication.md)
-- Multiple storage type choices
-- Replicas in different availability zones
-- Encryption and isolation
-- Automated maintenance
-
-## Altinity.Cloud {#altinity.cloud}
-
-[Altinity.Cloud](https://altinity.com/cloud-database/) is a fully managed ClickHouse-as-a-Service for the Amazon public cloud.
-
-- Fast deployment of ClickHouse clusters on Amazon resources
-- Easy scale-out/scale-in as well as vertical scaling of nodes
-- Isolated per-tenant VPCs with public endpoint or VPC peering
-- Configurable storage types and volume configurations
-- Cross-AZ scaling for performance and high availability
-- Built-in monitoring and SQL query editor
-
-## Alibaba Cloud {#alibaba-cloud}
-
-[Alibaba Cloud Managed Service for ClickHouse](https://www.alibabacloud.com/product/clickhouse) provides the following key features:
-
-- Highly reliable cloud disk storage engine based on [Alibaba Cloud Apsara](https://www.alibabacloud.com/product/apsara-stack) distributed system
-- Expand capacity on demand without manual data migration
-- Support single-node, single-replica, multi-node, and multi-replica architectures, and support hot and cold data tiering
-- Support access allow-list, one-key recovery, multi-layer network security protection, cloud disk encryption
-- Seamless integration with cloud log systems, databases, and data application tools
-- Built-in monitoring and database management platform
-- Professional database expert technical support and service
-
-## SberCloud {#sbercloud}
-
-[SberCloud.Advanced](https://sbercloud.ru/en/advanced) provides [MapReduce Service (MRS)](https://docs.sbercloud.ru/mrs/ug/topics/ug__clickhouse.html), a reliable, secure, and easy-to-use enterprise-level platform for storing, processing, and analyzing big data. MRS allows you to quickly create and manage ClickHouse clusters.
-
-- A ClickHouse instance consists of three ZooKeeper nodes and multiple ClickHouse nodes. The Dedicated Replica mode is used to ensure high reliability of dual data copies.
-- MRS provides smooth and elastic scaling capabilities to quickly meet service growth requirements in scenarios where the cluster storage capacity or CPU computing resources are not enough. When you expand the capacity of ClickHouse nodes in a cluster, MRS provides a one-click data balancing tool and gives you the initiative to balance data. You can determine the data balancing mode and time based on service characteristics to ensure service availability, implementing smooth scaling.
-- MRS uses the Elastic Load Balance ensuring high availability deployment architecture to automatically distribute user access traffic to multiple backend nodes, expanding service capabilities to external systems and improving fault tolerance. With the ELB polling mechanism, data is written to local tables and read from distributed tables on different nodes. In this way, data read/write load and high availability of application access are guaranteed.
-
-## Tencent Cloud {#tencent-cloud}
-
-[Tencent Managed Service for ClickHouse](https://cloud.tencent.com/product/cdwch) provides the following key features:
-
-- Easy to deploy and manage on Tencent Cloud
-- Highly scalable and available
-- Integrated monitor and alert service
-- High security with isolated per cluster VPCs
-- On-demand pricing with no upfront costs or long-term commitments
-
-{## [Original article](https://clickhouse.com/docs/en/commercial/cloud/) ##}
+    A detailed public description of ClickHouse cloud services is not ready yet; please [contact us](/company/#contact) to learn more.
diff --git a/docs/en/commercial/index.md b/docs/en/commercial/index.md
index 90e74d88ea8..1f1911b8c4d 100644
--- a/docs/en/commercial/index.md
+++ b/docs/en/commercial/index.md
@@ -6,12 +6,8 @@ toc_title: Introduction
 
 # ClickHouse Commercial Services {#clickhouse-commercial-services}
 
-This section is a directory of commercial service providers specializing in ClickHouse. They are independent companies not necessarily affiliated with Yandex.
-
 Service categories:
 
 - [Cloud](../commercial/cloud.md)
 - [Support](../commercial/support.md)
-
-!!! note "For service providers"
-    If you happen to represent one of them, feel free to open a pull request adding your company to the respective section (or even adding a new section if the service does not fit into existing categories). The easiest way to open a pull-request for documentation page is by using a “pencil” edit button in the top-right corner. If your service available in some local market, make sure to mention it in a localized documentation page as well (or at least point it out in a pull-request description).
diff --git a/docs/en/commercial/support.md b/docs/en/commercial/support.md
index 27f3f0c6a22..8ee976c8d6f 100644
--- a/docs/en/commercial/support.md
+++ b/docs/en/commercial/support.md
@@ -3,23 +3,7 @@ toc_priority: 3
 toc_title: Support
 ---
 
-# ClickHouse Commercial Support Service Providers {#clickhouse-commercial-support-service-providers}
+# ClickHouse Commercial Support Service {#clickhouse-commercial-support-service}
 
 !!! info "Info"
-    If you have launched a ClickHouse commercial support service, feel free to [open a pull-request](https://github.com/ClickHouse/ClickHouse/edit/master/docs/en/commercial/support.md) adding it to the following list.
-
-## Yandex.Cloud
-
-ClickHouse worldwide support from the authors of ClickHouse. Supports on-premise and cloud deployments. Ask details on clickhouse-support@yandex-team.com
-
-## Altinity {#altinity}
-
-Altinity has offered enterprise ClickHouse support and services since 2017. Altinity customers range from Fortune 100 enterprises to startups. Visit [www.altinity.com](https://www.altinity.com/) for more information.
-
-## Mafiree {#mafiree}
-
-[Service description](http://mafiree.com/clickhouse-analytics-services.php)
-
-## MinervaDB {#minervadb}
-
-[Service description](https://minervadb.com/index.php/clickhouse-consulting-and-support-by-minervadb/)
+    A detailed public description of ClickHouse support services is not ready yet; please [contact us](/company/#contact) to learn more.
diff --git a/docs/en/faq/use-cases/time-series.md b/docs/en/faq/use-cases/time-series.md
index 4fc53c0bea4..bf97ac4b1e2 100644
--- a/docs/en/faq/use-cases/time-series.md
+++ b/docs/en/faq/use-cases/time-series.md
@@ -6,7 +6,7 @@ toc_priority: 101
 
 # Can I Use ClickHouse As a Time-Series Database? {#can-i-use-clickhouse-as-a-time-series-database}
 
-ClickHouse is a generic data storage solution for [OLAP](../../faq/general/olap.md) workloads, while there are many specialized time-series database management systems. Nevertheless, ClickHouse’s [focus on query execution speed](../../faq/general/why-clickhouse-is-so-fast.md) allows it to outperform specialized systems in many cases. There are many independent benchmarks on this topic out there ([example](https://medium.com/@AltinityDB/clickhouse-for-time-series-scalability-benchmarks-e181132a895b)), so we’re not going to conduct one here. Instead, let’s focus on ClickHouse features that are important to use if that’s your use case.
+ClickHouse is a generic data storage solution for [OLAP](../../faq/general/olap.md) workloads, while there are many specialized time-series database management systems. Nevertheless, ClickHouse’s [focus on query execution speed](../../faq/general/why-clickhouse-is-so-fast.md) allows it to outperform specialized systems in many cases. There are many independent benchmarks on this topic out there, so we’re not going to conduct one here. Instead, let’s focus on ClickHouse features that are important to use if that’s your use case.
 
 First of all, there are **[specialized codecs](../../sql-reference/statements/create/table.md#create-query-specialized-codecs)** which make typical time-series. Either common algorithms like `DoubleDelta` and `Gorilla` or specific to ClickHouse like `T64`.
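The `DoubleDelta` codec mentioned in the hunk above works because timestamps sampled at a near-constant interval have deltas-of-deltas that are almost all zero. A minimal hand-rolled Python sketch of that idea (an illustration of the principle only, not ClickHouse's actual codec or on-disk format):

```python
# Illustration: delta-of-delta encoding of a regularly spaced time series.
# The encoded stream keeps the first value, the first delta, and then only
# the (mostly zero) second differences, which compress very well.

def double_delta(values):
    """Return [first value, first delta, deltas-of-deltas...]."""
    deltas = [b - a for a, b in zip(values, values[1:])]
    dod = [b - a for a, b in zip(deltas, deltas[1:])]
    return values[:1] + deltas[:1] + dod

# Timestamps ticking once per second, with one small jitter in the middle.
ts = [1600000000, 1600000001, 1600000002, 1600000004, 1600000005]
print(double_delta(ts))  # [1600000000, 1, 0, 1, -1]
```

The residuals are tiny integers regardless of how large the raw timestamps are, which is why such a transform followed by compression beats compressing the raw column.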
diff --git a/docs/en/sql-reference/data-types/lowcardinality.md b/docs/en/sql-reference/data-types/lowcardinality.md
index b3ff26a943d..e4b496aaaab 100644
--- a/docs/en/sql-reference/data-types/lowcardinality.md
+++ b/docs/en/sql-reference/data-types/lowcardinality.md
@@ -55,6 +55,5 @@ Functions:
 
 ## See Also {#see-also}
 
-- [A Magical Mystery Tour of the LowCardinality Data Type](https://www.altinity.com/blog/2019/3/27/low-cardinality).
 - [Reducing ClickHouse Storage Cost with the Low Cardinality Type – Lessons from an Instana Engineer](https://www.instana.com/blog/reducing-clickhouse-storage-cost-with-the-low-cardinality-type-lessons-from-an-instana-engineer/).
 - [String Optimization (video presentation in Russian)](https://youtu.be/rqf-ILRgBdY?list=PL0Z2YDlm0b3iwXCpEFiOOYmwXzVmjJfEt). [Slides in English](https://github.com/yandex/clickhouse-presentations/raw/master/meetup19/string_optimization.pdf).
diff --git a/docs/en/sql-reference/statements/create/table.md b/docs/en/sql-reference/statements/create/table.md
index d09ff24efcd..5334532a54f 100644
--- a/docs/en/sql-reference/statements/create/table.md
+++ b/docs/en/sql-reference/statements/create/table.md
@@ -207,8 +207,6 @@ ALTER TABLE codec_example MODIFY COLUMN float_value CODEC(Default);
 
 Codecs can be combined in a pipeline, for example, `CODEC(Delta, Default)`.
 
-To select the best codec combination for you project, pass benchmarks similar to described in the Altinity [New Encodings to Improve ClickHouse Efficiency](https://www.altinity.com/blog/2019/7/new-encodings-to-improve-clickhouse) article. One thing to note is that codec can't be applied for ALIAS column type.
-
 !!! warning "Warning"
     You can’t decompress ClickHouse database files with external utilities like `lz4`. Instead, use the special [clickhouse-compressor](https://github.com/ClickHouse/ClickHouse/tree/master/programs/compressor) utility.
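The pipeline syntax in the hunk above (`CODEC(Delta, Default)`, or `CODEC(Delta, ZSTD)` elsewhere in the docs) chains a transform codec with a general-purpose compressor. A rough Python sketch of why that ordering helps on a sequentially growing column, with `zlib` standing in for ZSTD (an illustration under those assumptions, not ClickHouse's storage layer):

```python
# Sketch: delta-encoding first leaves small, repetitive residuals for the
# general-purpose compressor, so "delta then compress" beats plain compression
# on monotonically growing data such as row IDs or timestamps.
import struct
import zlib

def pack(nums):
    # Serialize as little-endian signed 64-bit integers.
    return b"".join(struct.pack("<q", n) for n in nums)

def delta(nums):
    # Keep the first value, then store only the differences.
    return nums[:1] + [b - a for a, b in zip(nums, nums[1:])]

values = list(range(1_000_000, 1_010_000))  # a monotonically increasing column
plain = len(zlib.compress(pack(values)))
piped = len(zlib.compress(pack(delta(values))))
assert piped < plain  # the delta-then-compress pipeline is much smaller
```

The delta stream here is the first value followed by thousands of identical `1`s, which any general-purpose compressor reduces to almost nothing, while the raw packed integers compress far less well.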
diff --git a/docs/ja/commercial/cloud.md b/docs/ja/commercial/cloud.md
index 2193a54da6f..dceffcd591f 100644
--- a/docs/ja/commercial/cloud.md
+++ b/docs/ja/commercial/cloud.md
@@ -5,31 +5,7 @@ toc_priority: 1
 toc_title: "\u30AF\u30E9\u30A6\u30C9"
 ---
 
-# ClickHouseの雲のサービス提供者 {#clickhouse-cloud-service-providers}
+# ClickHouse Cloud Service {#clickhouse-cloud-service}
 
-!!! info "情報"
-    Managed ClickHouse serviceを使用してパブリッククラウドを起動した場合は、以下をお気軽にご利用ください [プル要求を開く](https://github.com/ClickHouse/ClickHouse/edit/master/docs/en/commercial/cloud.md) それを次のリストに追加します。
-
-## Yandexクラウド {#yandex-cloud}
-
-[ClickHouseのためのYandexの管理サービス](https://cloud.yandex.com/services/managed-clickhouse?utm_source=referrals&utm_medium=clickhouseofficialsite&utm_campaign=link3) 次の主な機能を提供します:
-
-- 完全に管理された飼育係サービス [クリックハウス複製](../engines/table-engines/mergetree-family/replication.md)
-- 複数の記憶域タイプの選択肢
-- 異なる可用性ゾーンのレプリカ
-- 暗号化と分離
-- 自動メンテナンス
-
-## Alibaba Cloud {#alibaba-cloud}
-
-[ClickHouseのためのAlibaba Cloudの管理サービス](https://www.alibabacloud.com/product/clickhouse) 次の主な機能を提供します:
-
-- Alibaba Cloud Apsara分散システムをベースにした信頼性の高いクラウドディスクストレージエンジン
-- 手動でのデータ移行を必要とせずに、オン・デマンドで容量を拡張
-- シングル・ノード、シングル・レプリカ、マルチ・ノード、マルチ・レプリカ・アーキテクチャをサポートし、ホット・データとコールド・データの階層化をサポート
-- アクセスホワイトリスト、OneKey Recovery、マルチレイヤーネットワークセキュリティ保護、クラウドディスク暗号化をサポート
-- クラウドログシステム、データベース、およびデータアプリケーションツールとのシームレスな統合
-- 組み込み型の監視およびデータベース管理プラットフォーム
-- プロフェッショナルデータベースエキスパートによるテクニカル・サポートとサービス
-
-{## [元の記事](https://clickhouse.com/docs/en/commercial/cloud/) ##}
+!!! info "Info"
+    A detailed public description of ClickHouse cloud services is not ready yet; please [contact us](/company/#contact) to learn more.
diff --git a/docs/ja/commercial/index.md b/docs/ja/commercial/index.md
index 75e13112d9e..75bbe782750 100644
--- a/docs/ja/commercial/index.md
+++ b/docs/ja/commercial/index.md
@@ -6,4 +6,9 @@ toc_priority: 70
 toc_title: "\u5546\u696D"
 ---
 
+# ClickHouse Commercial Services {#clickhouse-commercial-services}
+Service categories:
+
+- [Cloud](../commercial/cloud.md)
+- [Support](../commercial/support.md)
diff --git a/docs/ja/sql-reference/statements/create.md b/docs/ja/sql-reference/statements/create.md
index a2e1b784060..0117b082faf 100644
--- a/docs/ja/sql-reference/statements/create.md
+++ b/docs/ja/sql-reference/statements/create.md
@@ -147,8 +147,6 @@ ENGINE = ...
 ```
 
-コーデックが指定されている場合、既定のコーデックは適用されません。 コーデックはパイプラインで結合できます。, `CODEC(Delta, ZSTD)`. の選定と大型ブリッジダイオードコーデックの組み合わせますプロジェクト、ベンチマークと同様に記載のAltinity [ClickHouseの効率を改善する新しい符号化](https://www.altinity.com/blog/2019/7/new-encodings-to-improve-clickhouse) 記事だ
-
 !!! warning "警告"
     できない解凍ClickHouseデータベースファイルを外部の事のように `lz4`. 代わりに、特別な [clickhouse-コンプレッサー](https://github.com/ClickHouse/ClickHouse/tree/master/programs/compressor) ユーティリティ
diff --git a/docs/ru/commercial/cloud.md b/docs/ru/commercial/cloud.md
index e00fc3be673..2bdb8d68da5 100644
--- a/docs/ru/commercial/cloud.md
+++ b/docs/ru/commercial/cloud.md
@@ -5,54 +5,5 @@ toc_title: "Поставщики облачных услуг ClickHouse"
 
 # Поставщики облачных услуг ClickHouse {#clickhouse-cloud-service-providers}
 
-!!! info "Инфо"
-    Если вы запустили публичный облачный сервис с управляемым ClickHouse, не стесняйтесь [открыть pull request](https://github.com/ClickHouse/ClickHouse/edit/master/docs/en/commercial/cloud.md) c добавлением его в последующий список.
-
-## Yandex Cloud {#yandex-cloud}
-
-[Yandex Managed Service for ClickHouse](https://cloud.yandex.ru/services/managed-clickhouse?utm_source=referrals&utm_medium=clickhouseofficialsite&utm_campaign=link3) предоставляет следующие ключевые возможности:
-
-- полностью управляемый сервис ZooKeeper для [репликации ClickHouse](../engines/table-engines/mergetree-family/replication.md)
-- выбор типа хранилища
-- реплики в разных зонах доступности
-- шифрование и изоляция
-- автоматизированное техническое обслуживание
-
-## Altinity.Cloud {#altinity.cloud}
-
-[Altinity.Cloud](https://altinity.com/cloud-database/) — это полностью управляемый ClickHouse-as-a-Service для публичного облака Amazon.
-
-- быстрое развертывание кластеров ClickHouse на ресурсах Amazon.
-- легкое горизонтальное масштабирование также, как и вертикальное масштабирование узлов.
-- изолированные виртуальные сети для каждого клиента с общедоступным эндпоинтом или пирингом VPC.
-- настраиваемые типы и объемы хранилищ
-- cross-az масштабирование для повышения производительности и обеспечения высокой доступности
-- встроенный мониторинг и редактор SQL-запросов
-
-## Alibaba Cloud {#alibaba-cloud}
-
-Управляемый облачный сервис Alibaba для ClickHouse: [китайская площадка](https://www.aliyun.com/product/clickhouse), будет доступен на международной площадке в мае 2021 года. Сервис предоставляет следующие возможности:
-
-- надежный сервер для облачного хранилища на основе распределенной системы [Alibaba Cloud Apsara](https://www.alibabacloud.com/product/apsara-stack);
-- расширяемая по запросу емкость, без переноса данных вручную;
-- поддержка одноузловой и многоузловой архитектуры, архитектуры с одной или несколькими репликами, а также многоуровневого хранения cold и hot data;
-- поддержка прав доступа, one-key восстановления, многоуровневая защита сети, шифрование облачного диска;
-- полная интеграция с облачными системами логирования, базами данных и инструментами обработки данных;
-- встроенная платформа для мониторинга и управления базами данных;
-- техническая поддержка от экспертов по работе с базами данных.
-
-## SberCloud {#sbercloud}
-
-[Облачная платформа SberCloud.Advanced](https://sbercloud.ru/ru/advanced):
-
-- предоставляет более 50 высокотехнологичных сервисов;
-- позволяет быстро создавать и эффективно управлять ИТ-инфраструктурой, приложениями и интернет-сервисами;
-- радикально минимизирует ресурсы, требуемые для работы корпоративных ИТ-систем;
-- в разы сокращает время вывода новых продуктов на рынок.
-
-SberCloud.Advanced предоставляет [MapReduce Service (MRS)](https://docs.sbercloud.ru/mrs/ug/topics/ug__clickhouse.html) — надежную, безопасную и простую в использовании платформу корпоративного уровня для хранения, обработки и анализа больших данных. MRS позволяет быстро создавать и управлять кластерами ClickHouse.
-
-- Инстанс ClickHouse состоит из трех узлов ZooKeeper и нескольких узлов ClickHouse. Выделенный режим реплики используется для обеспечения высокой надежности двойных копий данных.
-- MRS предлагает возможности гибкого масштабирования при быстром росте сервисов в сценариях, когда емкости кластерного хранилища или вычислительных ресурсов процессора недостаточно. MRS в один клик предоставляет инструмент для балансировки данных при расширении узлов ClickHouse в кластере. Вы можете определить режим и время балансировки данных на основе характеристик сервиса, чтобы обеспечить доступность сервиса.
-- MRS использует архитектуру развертывания высокой доступности на основе Elastic Load Balance (ELB) — сервиса для автоматического распределения трафика на несколько внутренних узлов. Благодаря ELB, данные записываются в локальные таблицы и считываются из распределенных таблиц на разных узлах. Такая архитектура повышает отказоустойчивость кластера и гарантирует высокую доступность приложений.
-
+!!! info "Info"
+    A detailed public description of ClickHouse cloud services is not ready yet; please [contact us](/company/#contact) to learn more.
diff --git a/docs/ru/commercial/index.md b/docs/ru/commercial/index.md
index 66b1b125823..00fb1be7625 100644
--- a/docs/ru/commercial/index.md
+++ b/docs/ru/commercial/index.md
@@ -10,8 +10,5 @@ toc_title: "Коммерческие услуги"
 
 Категории услуг:
 
-- Облачные услуги [Cloud](../commercial/cloud.md)
-- Поддержка [Support](../commercial/support.md)
-
-!!! note "Для поставщиков услуг"
-    Если вы — представитель компании-поставщика услуг, вы можете отправить запрос на добавление вашей компании и ваших услуг в соответствующий раздел данной документации (или на добавление нового раздела, если ваши услуги не соответствуют ни одной из существующих категорий). Чтобы отправить запрос (pull-request) на добавление описания в документацию, нажмите на значок "карандаша" в правом верхнем углу страницы. Если ваши услуги доступны в только отдельных регионах, не забудьте указать это на соответствующих локализованных страницах (и обязательно отметьте это при отправке заявки).
+- [Облачные услуги](../commercial/cloud.md)
+- [Поддержка](../commercial/support.md)
diff --git a/docs/ru/faq/use-cases/time-series.md b/docs/ru/faq/use-cases/time-series.md
index ea56660cc10..5b46de78274 100644
--- a/docs/ru/faq/use-cases/time-series.md
+++ b/docs/ru/faq/use-cases/time-series.md
@@ -6,7 +6,7 @@ toc_priority: 101
 
 # Можно ли использовать ClickHouse как базу данных временных рядов? {#can-i-use-clickhouse-as-a-time-series-database}
 
-ClickHouse — это универсальное решение для [OLAP](../../faq/general/olap.md) операций, в то время как существует много специализированных СУБД временных рядов. Однако [высокая скорость выполнения запросов](../../faq/general/why-clickhouse-is-so-fast.md) позволяет CLickHouse во многих случаях "побеждать" специализированные аналоги. В подтверждение этому есть много [примеров](https://medium.com/@AltinityDB/clickhouse-for-time-series-scalability-benchmarks-e181132a895b) с конкретными показателями производительности, так что мы не будем останавливаться на этом подробно. Лучше рассмотрим те возможности ClickHouse, которые стоит использовать.
+ClickHouse — это универсальное решение для [OLAP](../../faq/general/olap.md) операций, в то время как существует много специализированных СУБД временных рядов. Однако [высокая скорость выполнения запросов](../../faq/general/why-clickhouse-is-so-fast.md) позволяет ClickHouse во многих случаях "побеждать" специализированные аналоги. В подтверждение этому есть много примеров с конкретными показателями производительности, так что мы не будем останавливаться на этом подробно. Лучше рассмотрим те возможности ClickHouse, которые стоит использовать.
 
 Во-первых, есть **[специальные кодеки](../../sql-reference/statements/create/table.md#create-query-specialized-codecs)**, которые составляют типичные временные ряды. Это могут быть либо стандартные алгоритмы, такие как `DoubleDelta` или `Gorilla`, либо специфические для ClickHouse, например `T64`.
diff --git a/docs/ru/sql-reference/data-types/lowcardinality.md b/docs/ru/sql-reference/data-types/lowcardinality.md
index 49ba5db0169..dee201bce33 100644
--- a/docs/ru/sql-reference/data-types/lowcardinality.md
+++ b/docs/ru/sql-reference/data-types/lowcardinality.md
@@ -55,6 +55,5 @@ ORDER BY id
 
 ## Смотрите также
 
-- [A Magical Mystery Tour of the LowCardinality Data Type](https://www.altinity.com/blog/2019/3/27/low-cardinality).
 - [Reducing Clickhouse Storage Cost with the Low Cardinality Type – Lessons from an Instana Engineer](https://www.instana.com/blog/reducing-clickhouse-storage-cost-with-the-low-cardinality-type-lessons-from-an-instana-engineer/).
 - [String Optimization (video presentation in Russian)](https://youtu.be/rqf-ILRgBdY?list=PL0Z2YDlm0b3iwXCpEFiOOYmwXzVmjJfEt). [Slides in English](https://github.com/yandex/clickhouse-presentations/raw/master/meetup19/string_optimization.pdf).
diff --git a/docs/ru/sql-reference/statements/create/table.md b/docs/ru/sql-reference/statements/create/table.md
index 77c192b2b26..6601276d573 100644
--- a/docs/ru/sql-reference/statements/create/table.md
+++ b/docs/ru/sql-reference/statements/create/table.md
@@ -201,8 +201,6 @@ ALTER TABLE codec_example MODIFY COLUMN float_value CODEC(Default);
 
 Кодеки можно последовательно комбинировать, например, `CODEC(Delta, Default)`.
 
-Чтобы выбрать наиболее подходящую для вашего проекта комбинацию кодеков, необходимо провести сравнительные тесты, подобные тем, что описаны в статье Altinity [New Encodings to Improve ClickHouse Efficiency](https://www.altinity.com/blog/2019/7/new-encodings-to-improve-clickhouse). Для столбцов типа `ALIAS` кодеки не применяются.
-
 !!! warning "Предупреждение"
     Нельзя распаковать базу данных ClickHouse с помощью сторонних утилит наподобие `lz4`. Необходимо использовать специальную утилиту [clickhouse-compressor](https://github.com/ClickHouse/ClickHouse/tree/master/programs/compressor).
diff --git a/docs/zh/commercial/cloud.md b/docs/zh/commercial/cloud.md
index 2082e37de8e..e8c098db5be 100644
--- a/docs/zh/commercial/cloud.md
+++ b/docs/zh/commercial/cloud.md
@@ -5,58 +5,7 @@ toc_title: 云
 
 # ClickHouse 云服务提供商 {#clickhouse-cloud-service-providers}
 
-!!! info "注意"
-    如果您已经推出具有托管ClickHouse服务的公共云,请随时[提交一个 pull request](https://github.com/ClickHouse/ClickHouse/edit/master/docs/en/commercial/cloud.md)将其添加到以下列表。
+# ClickHouse Cloud Service {#clickhouse-cloud-service}
 
-## Yandex 云 {#yandex-cloud}
-
-[Yandex ClickHouse托管服务](https://cloud.yandex.com/services/managed-clickhouse?utm_source=referrals&utm_medium=clickhouseofficialsite&utm_campaign=link3)提供以下主要功能:
-
-- 用于[ClickHouse replication](../engines/table-engines/mergetree-family/replication.md)的完全托管的ZooKeeper服务
-- 多种存储类型选择
-- 不同可用区副本
-- 加密与隔离
-- 自动化维护
-
-## Altinity.Cloud {#altinity.cloud}
-
-[Altinity.Cloud](https://altinity.com/cloud-database/)是针对Amazon公共云的完全托管的ClickHouse-as-a-Service
-
-- 在Amazon资源上快速部署ClickHouse集群
-- 轻松进行横向扩展/纵向扩展以及节点的垂直扩展
-- 具有公共端点或VPC对等的租户隔离
-- 可配置存储类型以及卷配置
-- 跨可用区扩展以实现性能和高可用性
-- 内置监控和SQL查询编辑器
-
-## 阿里云{#alibaba-cloud}
-
-[阿里云ClickHouse托管服务](https://www.alibabacloud.com/zh/product/clickhouse)提供以下主要功能:
-
-- 基于阿里飞天分布式系统的高可靠云盘存储引擎
-- 按需扩容,无需手动进行数据搬迁
-- 支持单节点、单副本、多节点、多副本多种架构,支持冷热数据分层
-- 支持访问白名单和一键恢复,多层网络安全防护,云盘加密
-- 与云上日志系统、数据库、数据应用工具无缝集成
-- 内置监控和数据库管理平台
-- 专业的数据库专家技术支持和服务
-
-## SberCloud {#sbercloud}
-
-[SberCloud.Advanced](https://sbercloud.ru/en/advanced)提供[MapReduce Service (MRS)](https://docs.sbercloud.ru/mrs/ug/topics/ug__clickhouse.html), 一个可靠、安全且易于使用的企业级平台,用于存储、处理和分析大数据。MRS允许您快速创建和管理ClickHouse集群。
-
-- 一个ClickHouse实例由三个ZooKeeper节点和多个ClickHouse节点组成。 Dedicated Replica模式用于保证双数据副本的高可靠性。
-- MRS提供平滑弹性伸缩能力,快速满足集群存储容量或CPU计算资源不足场景下的业务增长需求。当您扩展集群中ClickHouse节点的容量时,MRS提供一键式数据平衡工具,让您主动进行数据平衡。 您可以根据业务特点确定数据均衡方式和时间,保证业务的可用性,实现平滑扩展。
-- MRS采用弹性负载均衡保障高可用部署架构,自动将用户访问流量分配到多个后端节点,将服务能力扩展到外部系统,提高容错能力。 通过ELB轮询机制,数据写入本地表,从不同节点的分布式表中读取。 这样就保证了数据读写负载和应用访问的高可用。
-
-## 腾讯云 {#tencent-cloud}
-
-[腾讯云ClickHouse托管服务](https://cloud.tencent.com/product/cdwch)提供以下主要功能:
-
-- 易于在腾讯云上部署和管理
-- 高度可扩展和可用
-- 集成监控和警报服务
-- 每个集群VPC隔离的高安全性
-- 按需定价,无前期成本或长期承诺
-
-{## [原始文章](https://clickhouse.com/docs/en/commercial/cloud/) ##}
+!!! info "Info"
+    A detailed public description of ClickHouse cloud services is not ready yet; please [contact us](/company/#contact) to learn more.
diff --git a/docs/zh/commercial/index.md b/docs/zh/commercial/index.md
index 34641b8f7fc..5f29e5f21c9 100644
--- a/docs/zh/commercial/index.md
+++ b/docs/zh/commercial/index.md
@@ -6,12 +6,7 @@ toc_title: 简介
 
 # ClickHouse 商业服务 {#clickhouse-commercial-services}
 
-此部分是专门从事ClickHouse的商业服务提供商的目录。 他们是独立的公司,不一定隶属于Yandex。
-
 服务类别:
 
 - [云](../commercial/cloud.md)
 - [支持](../commercial/support.md)
-
-!!! note "对于服务提供商"
-    如果您碰巧代表其中之一,请随时提交一个pull request,将您的公司添加到相应部分(如果服务不适合现有类别,甚至可以添加新部分)。 提交关于文档的pull request最简单的方式是点击右上角的“铅笔”编辑按钮。 如果您的服务在某些本地市场可用,请确保也在本地化文档页面中提及它(或至少在pull request请求描述中指出)。
diff --git a/docs/zh/commercial/support.md b/docs/zh/commercial/support.md
index 3139709e4b8..5e5d00a22b8 100644
--- a/docs/zh/commercial/support.md
+++ b/docs/zh/commercial/support.md
@@ -5,21 +5,5 @@ toc_title: 支持
 
 # ClickHouse 商业支持服务提供商 {#clickhouse-commercial-support-service-providers}
 
-!!! info "注意"
-    如果您已经推出ClickHouse商业支持服务,请随时[提交一个pull request](https://github.com/ClickHouse/ClickHouse/edit/master/docs/en/commercial/support.md)将其添加到以下列表。
-
-## Yandex.Cloud
-
-来自ClickHouse作者的ClickHouse全球支持。 支持内部部署和云部署。 在clickhouse-support@yandex-team.com上询问详细信息
-
-## Altinity {#altinity}
-
-Altinity自2017年以来一直为企业ClickHouse提供支持和服务。 Altinity的客户范围从财富100强企业到初创公司。访问 [www.altinity.com](https://www.altinity.com/)了解更多信息。
-
-## Mafiree {#mafiree}
-
-[Service description](http://mafiree.com/clickhouse-analytics-services.php)
-
-## MinervaDB {#minervadb}
-
-[Service description](https://minervadb.com/index.php/clickhouse-consulting-and-support-by-minervadb/)
+!!! info "Info"
+    A detailed public description of ClickHouse support services is not ready yet; please [contact us](/company/#contact) to learn more.
diff --git a/website/blog/en/2020/package-repository-behind-cdn.md b/website/blog/en/2020/package-repository-behind-cdn.md
index c5857bcd4a4..fb724d18cb3 100644
--- a/website/blog/en/2020/package-repository-behind-cdn.md
+++ b/website/blog/en/2020/package-repository-behind-cdn.md
@@ -52,7 +52,7 @@ Implementing it required a little bit of research, but the overall solution appe
 1. For a ClickHouse database, it was a no-brainer to use [Yandex Managed Service for ClickHouse](https://cloud.yandex.com/services/managed-clickhouse). With a few clicks in the admin interface, we got a running ClickHouse cluster with properly configured high-availability and automated backups. Ad-hoc SQL queries could be run from that same admin interface.
 2. Cloudflare allows customers to run custom code on CDN edge servers in a serverless fashion (so-called [workers](https://workers.cloudflare.com)). Those workers are executed in a tight sandbox which doesn't allow for anything complicated, but this feature fits perfectly to gather some data about download events and send it somewhere else. This is normally a paid feature, but special thanks to Connor Peshek from Cloudflare who arranged a lot of extra features for free on `clickhouse.tech` when we have applied to their [open-source support program](https://developers.cloudflare.com/sponsorships/).
-3. To avoid publicly exposing yet another ClickHouse instance (like we did with **[playground](https://clickhouse.tech/docs/en/getting-started/playground/)** regardless of being a 100% anti-pattern), the download event data is sent to [Yandex Cloud Functions](https://cloud.yandex.com/services/functions). It's a generic serverless computing framework at Yandex Cloud, which also allows running custom code without maintaining any servers, but with less strict sandbox limitations and direct access to other cloud services like Managed ClickHouse that was needed for this task.
+3. To avoid publicly exposing yet another ClickHouse instance (like we did with **[playground](/docs/en/getting-started/playground/)** regardless of being a 100% anti-pattern), the download event data is sent to [Yandex Cloud Functions](https://cloud.yandex.com/services/functions). It's a generic serverless computing framework at Yandex Cloud, which also allows running custom code without maintaining any servers, but with less strict sandbox limitations and direct access to other cloud services like Managed ClickHouse that was needed for this task.
 4. It didn't require much effort to choose a visualization tool either, as [DataLens BI](https://cloud.yandex.com/docs/datalens/) is tightly integrated with ClickHouse, capable to build what's required right from the UI, and satisfies the “no servers” requirement because it's a SaaS solution. Public access option for charts and dashboards have also appeared to be handy.
 
 There's not so much data collected yet, but here's a live example of how the resulting data visualization looks like. For example, here we can see that LTS releases of ClickHouse are not so popular yet *(yes, we have [LTS releases](https://clickhouse.tech/docs/en/faq/operations/production/)!)*:
diff --git a/website/blog/en/2020/pixel-benchmark.md b/website/blog/en/2020/pixel-benchmark.md
index b9be7638c38..57e9aec3857 100644
--- a/website/blog/en/2020/pixel-benchmark.md
+++ b/website/blog/en/2020/pixel-benchmark.md
@@ -66,7 +66,7 @@ What is this function, and why do we need it? In this particular stack trace, we
 
 ## Is your phone good enough?
 
-There is a beaten genre of using data sets and queries of a varying degree of syntheticity to prove that a particular DBMS you work on has performance superior to other, less advanced, DBMSes. We've moved past that, and instead use the DBMS we love as a benchmark of hardware. For this benchmark we use a small 100M rows obfuscated data set from Yandex.Metrica, about 12 GB compressed, and some queries representative of Metrica dashboards. There is [this page](https://clickhouse.tech/benchmark/hardware/) with crowdsourced results for various cloud and traditional servers and even some laptops, but how do the phones compare? Let's find out. Following [the manual](https://clickhouse.tech/docs/en/operations/performance-test/) to download the necessary data to the phone and run the benchmark was pretty straightforward. One problem was that some queries can't run because they use too much memory and the server gets killed by Android, so I had to script around that. Also, I'm not sure how to reset a file system cache on Android, so the 'cold run' data is not correct. The results look pretty good:
+There is a beaten genre of using data sets and queries of a varying degree of syntheticity to prove that a particular DBMS you work on has performance superior to other, less advanced, DBMSes. We've moved past that, and instead use the DBMS we love as a benchmark of hardware. For this benchmark we use a small 100M rows obfuscated data set from Yandex.Metrica, about 12 GB compressed, and some queries representative of Metrica dashboards. There is [this page](/benchmark/hardware/) with crowdsourced results for various cloud and traditional servers and even some laptops, but how do the phones compare? Let's find out. Following [the manual](/docs/en/operations/performance-test/) to download the necessary data to the phone and run the benchmark was pretty straightforward. One problem was that some queries can't run because they use too much memory and the server gets killed by Android, so I had to script around that. Also, I'm not sure how to reset a file system cache on Android, so the 'cold run' data is not correct. The results look pretty good:
diff --git a/website/blog/en/2020/the-clickhouse-community.md b/website/blog/en/2020/the-clickhouse-community.md
index 7080fed6479..9a76a5a526c 100644
--- a/website/blog/en/2020/the-clickhouse-community.md
+++ b/website/blog/en/2020/the-clickhouse-community.md
@@ -16,7 +16,7 @@ ClickHouse began as a solution for web analytics in [Yandex Metrica](https://met
 
 This is a classic problem for data warehouses. However, Alexey could not find one that met Yandex requirements, specifically large datasets, linear scaling, high efficiency, and compatibility with SQL tools. In a nutshell: like MySQL but for analytic applications. So Alexey wrote one. It started as a prototype to do GROUP BY operations.
 
-The prototype evolved into a full solution with a name, ClickHouse, short for “Clickstream Data Warehouse”. Alexey added additional features including SQL support and the MergeTree engine. The SQL dialect was superficially similar to MySQL, [which was also used in Metrica](https://clickhouse.tech/blog/en/2016/evolution-of-data-structures-in-yandex-metrica/) but could not handle query workloads without complex pre-aggregation. By 2011 ClickHouse was in production for Metrica.
+The prototype evolved into a full solution with a name, ClickHouse, short for “Clickstream Data Warehouse”. Alexey added additional features including SQL support and the MergeTree engine. The SQL dialect was superficially similar to MySQL, [which was also used in Metrica](/blog/en/2016/evolution-of-data-structures-in-yandex-metrica/) but could not handle query workloads without complex pre-aggregation. By 2011 ClickHouse was in production for Metrica.
Over the next 5 years Alexey and a growing team of developers extended ClickHouse to cover new use cases. By 2016 ClickHouse was a core Metrica backend service. It was also becoming entrenched as a data warehouse within Yandex, extending to use cases like service monitoring, network flow logs, and event management. ClickHouse had evolved from the original one-person project to business critical software with a full team of a dozen engineers led by Alexey. @@ -38,13 +38,13 @@ Alexey and the development team moved ClickHouse code to a Github repo under the ClickHouse quickly picked up steam in Eastern Europe. The first ClickHouse meetups started in 2016 and have grown to include 200 participants for in-person meetings and up to 400 for online meetings. ClickHouse is now widely used in start-ups in Russia as well as other Eastern European countries. Developers located in Eastern Europe continue to supply more contributions to ClickHouse than any other region. -ClickHouse also started to gain recognition in the US and Western Europe. [CloudFlare](https://www.cloudflare.com/) published a widely read blog article about [their success using ClickHouse for DNS analytics](https://blog.cloudflare.com/how-cloudflare-analyzes-1m-dns-queries-per-second/). Alexander Zaitsev successfully migrated an ad tech analytics system from a commercial DBMS to a ClickHouse cluster. This success prompted him to found [Altinity](https://altinity.com) in 2017 with help from friends at [Percona](https://www.percona.com). US meetups started in the same year. With support from Altinity these have grown to over 100 attendees for online meetings. +ClickHouse also started to gain recognition in the US and Western Europe. [CloudFlare](https://www.cloudflare.com/) published a widely read blog article about [their success using ClickHouse for DNS analytics](https://blog.cloudflare.com/how-cloudflare-analyzes-1m-dns-queries-per-second/). ClickHouse also took off in China. 
The first meetup in China took place in 2018 and attracted enormous interest. In-person meetups included over 400 participants. Online meetings have reached up to 1000 viewers. In 2019 a further step occurred as ClickHouse moved out from under the Yandex Github organization into a separate [ClickHouse organization](https://github.com/ClickHouse). The new organization includes ClickHouse server code plus core ecosystem projects like the cpp and ODBC drivers. -ClickHouse community events shifted online following world-wide disruptions due to COVID-19, but growth in usage continued. One interesting development has been the increasing number of startups using ClickHouse as a backend. Many of these are listed on the [ClickHouse Adopters](https://clickhouse.tech/docs/en/introduction/adopters/) page. Also, additional prominent companies like eBay, Uber, and Flipkart went public in 2020 with stories of successful ClickHouse usage. +ClickHouse community events shifted online following world-wide disruptions due to COVID-19, but growth in usage continued. One interesting development has been the increasing number of startups using ClickHouse as a backend. Many of these are listed on the [ClickHouse Adopters](/docs/en/introduction/adopters/) page. Also, additional prominent companies like eBay, Uber, and Flipkart went public in 2020 with stories of successful ClickHouse usage. ## The ClickHouse community today @@ -87,17 +87,16 @@ ClickHouse ecosystem projects are also growing rapidly.
Here is a selected list: * [mymarilyn/clickhouse-driver](https://github.com/mymarilyn/clickhouse-driver) — ClickHouse Python driver with native interface support * [Vertamedia/clickhouse-grafana](https://github.com/Vertamedia/clickhouse-grafana) — Grafana datasource for ClickHouse * [smi2/phpClickHouse](https://github.com/smi2/phpClickHouse) — PHP ClickHouse client -* [Altinity/clickhouse-operator](https://github.com/Altinity/clickhouse-operator) — Kubernetes operator for ClickHouse * [AlexAkulov/clickhouse-backup](https://github.com/AlexAkulov/clickhouse-backup) — ClickHouse backup and restore using cloud storage * [And almost 1200 more...](https://github.com/search?o=desc&p=1&q=clickhouse&s=stars&type=Repositories) ## Resources -With the community's growth, numerous resources are available to users. At the center is the [ClickHouse org on Github](https://github.com/ClickHouse), which hosts [ClickHouse server code](https://github.com/ClickHouse/ClickHouse). ClickHouse server documentation is available at the [clickhouse.tech](https://clickhouse.tech/) website. It has [installation instructions](https://clickhouse.tech/docs/en/getting-started/install/) and links to ClickHouse community builds for major Linux distributions as well as Mac, FreeBSD, and Docker. +With the community's growth, numerous resources are available to users. At the center is the [ClickHouse org on Github](https://github.com/ClickHouse), which hosts [ClickHouse server code](https://github.com/ClickHouse/ClickHouse). ClickHouse server documentation is available at the [clickhouse.tech](/) website. It has [installation instructions](/docs/en/getting-started/install/) and links to ClickHouse community builds for major Linux distributions as well as Mac, FreeBSD, and Docker. In addition, ClickHouse users have a wide range of ways to engage with the community and get help on applications. These include both chat applications as well as meetups. Here are some links to get started.
-* Yandex Meetups — Yandex has regular in-person and online international and Russian-language meetups. Video recordings and online translations are available at the official [YouTube channel](https://www.youtube.com/c/ClickHouseDB/videos). Watch for announcements on the [clickhouse.tech](https://clickhouse.tech/) site and [Telegram](https://t.me/clickhouse_ru). +* Yandex Meetups — Yandex has regular in-person and online international and Russian-language meetups. Video recordings and online translations are available at the official [YouTube channel](https://www.youtube.com/c/ClickHouseDB/videos). Watch for announcements on the [clickhouse.tech](/) site and [Telegram](https://t.me/clickhouse_ru). * [SF Bay Area ClickHouse Meetup](https://www.meetup.com/San-Francisco-Bay-Area-ClickHouse-Meetup/) — The largest US ClickHouse meetup, with meetings approximately every 2 months. * Chinese meetups occur at regular intervals with different sponsors. Watch for announcements on clickhouse.tech. * Telegram - By far the largest forum for ClickHouse. It is the best place to talk to ClickHouse devs. There are two groups. @@ -118,26 +117,18 @@ We welcome users to join the ClickHouse community in every capacity. There are f Start with the documentation. Download ClickHouse and try it out. Join the chat channels. If you encounter bugs, [log issues](https://github.com/ClickHouse/ClickHouse/issues) so we can get them fixed. Also, it’s easy to make contributions to the documentation if you have basic Github and markdown skills. Press the pencil icon on any page of the clickhouse.tech website to edit pages and automatically generate pull requests to merge your changes. -If your company has deployed ClickHouse and is comfortable talking about it, please don't be shy. Add them to the [ClickHouse Adopters](https://clickhouse.tech/docs/en/introduction/adopters/) page so that others can learn from your experience. 
+If your company has deployed ClickHouse and is comfortable talking about it, please don't be shy. Add it to the [ClickHouse Adopters](/docs/en/introduction/adopters/) page so that others can learn from your experience. ### Become a ClickHouse developer Write code to make ClickHouse better. Here are your choices. -* ClickHouse server — Start with the [“For Beginners” documentation](https://clickhouse.tech/docs/en/development/developer-instruction/) to learn how to build ClickHouse and submit PRs. Check out the current ClickHouse issues if you are looking for work. PRs that follow the development standards will be merged faster. +* ClickHouse server — Start with the [“For Beginners” documentation](/docs/en/development/developer-instruction/) to learn how to build ClickHouse and submit PRs. Check out the current ClickHouse issues if you are looking for work. PRs that follow the development standards will be merged faster. * Ecosystem projects — Most projects in the ClickHouse ecosystem accept PRs. Check with each project for specific practices. ClickHouse is also a great target for research problems. Over the years many dozens of university CS students have worked on ClickHouse features. Alexey Milovidov maintains an especially rich set of [project suggestions for students](https://github.com/ClickHouse/ClickHouse/issues/15065). Join Telegram and ask for help if you are interested. Both Yandex and Altinity also offer internships. -### Write ClickHouse applications - -ClickHouse enables a host of new applications that depend on low latency access to large datasets. If you write something interesting, blog about it and present at local meetups. Altinity has a program to highlight startups who are developing ClickHouse applications and help with marketing as well as resources for development. Send email to [info@altinity.com](mailto:info@altinity.com) for more information.
- -### Become a corporate sponsor - -The ClickHouse community has been assisted by many corporate users who have helped organize meetups, funded development, and guided growth of ClickHouse. Contact community members directly at [clickhouse-feedback@yandex-team.ru](mailto:clickhouse-feedback@yandex-team.ru), [info@altinity.com](mailto:info@altinity.com), or via Telegram to find out more about how to chip in as a corporate sponsor. - ## Where we go from here ClickHouse has grown enormously from its origins as a basic prototype in 2008 to the popular SQL data warehouse users see today. Our community is the rock that will enable ClickHouse to become the default data warehouse worldwide. We are working together to create an inclusive environment where everyone feels welcome and has an opportunity to contribute. We welcome you to join! diff --git a/website/blog/en/2021/fuzzing-clickhouse.md b/website/blog/en/2021/fuzzing-clickhouse.md index b6852dcce15..68a394bfe0c 100644 --- a/website/blog/en/2021/fuzzing-clickhouse.md +++ b/website/blog/en/2021/fuzzing-clickhouse.md @@ -51,7 +51,7 @@ After fixing the majority of pre-existing errors, this fuzzer became efficient fo A major factor that makes fuzzing really efficient for us is that we have a lot of assertions and other checks of program logic in our code. For debug-only checks, we use the plain `assert` macro from `<cassert>`. For checks that are needed even in release mode, we use an exception with a special code `LOGICAL_ERROR` that signifies an internal program error. We did some work to ensure that these errors are distinct from errors caused by wrong user actions. A user error reported for a randomly generated query is normal (e.g. it references some non-existent columns), but when we see an internal program error, we know that it's definitely a bug, same as an assertion. Of course, even without assertions, you get some checks for memory errors provided by the OS (segfaults).
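The triage rule above can be sketched in a few lines of Python. This is purely an illustration of the classification logic, not ClickHouse's actual C++ code; the error-code value and function name here are hypothetical stand-ins:

```python
# Illustrative sketch: classifying fuzzer outcomes per the rule above.
# An error caused by bad user input is expected for random queries, while
# an internal logical error or a crash always indicates a real bug.
LOGICAL_ERROR = 49  # hypothetical stand-in for the dedicated internal-error code

def is_definitely_a_bug(error_code, crashed):
    if crashed:  # segfault, failed assertion, sanitizer report
        return True
    return error_code == LOGICAL_ERROR  # internal invariant violated

# A random query hitting an ordinary user error (e.g. unknown column) is fine:
assert not is_definitely_a_bug(error_code=47, crashed=False)
# An internal program error or a crash is always a bug:
assert is_definitely_a_bug(error_code=LOGICAL_ERROR, crashed=False)
assert is_definitely_a_bug(error_code=None, crashed=True)
```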
Another way to add runtime checks to your program is to use some kind of sanitizer. We already run most of our tests under clang's Address, Memory, UndefinedBehavior and Thread sanitizers. Using them in conjunction with this fuzzer also proved to be very efficient. -To see for yourself how the fuzzer works, you only need the normal ClickHouse client. Start `clickhouse-client --query-fuzzer-runs=100`, enter any query, and enjoy the client going crazy and running a hundred random queries instead. All queries from the current session become a source for expressions for fuzzing, so try entering several different queries to get more interesting results. Be careful not to do this in production! When you do this experiment, you'll soon notice that the fuzzer tends to generate queries that take very long to run. This is why for the CI fuzzer runs we have to configure the server to limit query execution time, memory usage and so on using the corresponding [server settings](https://clickhouse.tech/docs/en/operations/settings/query-complexity/#:~:text=In%20the%20default%20configuration%20file,query%20within%20a%20single%20server.). We had a hilarious situation after that: the fuzzer figured out how to remove the limits by generating a `SET max_execution_time = 0` query, and then generated a never-ending query and failed. Thankfully we were able to defeat its cleverness by using [settings constraints](https://clickhouse.tech/docs/en/operations/settings/constraints-on-settings/). +To see for yourself how the fuzzer works, you only need the normal ClickHouse client. Start `clickhouse-client --query-fuzzer-runs=100`, enter any query, and enjoy the client going crazy and running a hundred random queries instead. All queries from the current session become a source for expressions for fuzzing, so try entering several different queries to get more interesting results. Be careful not to do this in production!
When you do this experiment, you'll soon notice that the fuzzer tends to generate queries that take very long to run. This is why for the CI fuzzer runs we have to configure the server to limit query execution time, memory usage and so on using the corresponding [server settings](/docs/en/operations/settings/query-complexity/#:~:text=In%20the%20default%20configuration%20file,query%20within%20a%20single%20server.). We had a hilarious situation after that: the fuzzer figured out how to remove the limits by generating a `SET max_execution_time = 0` query, and then generated a never-ending query and failed. Thankfully we were able to defeat its cleverness by using [settings constraints](/docs/en/operations/settings/constraints-on-settings/). ## Other Fuzzers diff --git a/website/blog/en/2021/performance-test-1.md b/website/blog/en/2021/performance-test-1.md index 3d15a7ea3ec..0e6de5fb049 100644 --- a/website/blog/en/2021/performance-test-1.md +++ b/website/blog/en/2021/performance-test-1.md @@ -6,11 +6,11 @@ author: '[Alexander Kuzmenkov](https://github.com/akuzm)' tags: ['testing', 'performance'] --- -One of the main selling points of ClickHouse is that it's very fast, in many cases utilizing the hardware up to the theoretical limits. This was noted by many independent benchmark such as [this one](http://brandonharris.io/redshift-clickhouse-time-series/). This speed boils down to a right combination of architectural choices and algorithmic optimizations, sprinkled with a dash of pixie dust. There is an [overview of these factors](https://clickhouse.tech/docs/en/faq/general/why-clickhouse-is-so-fast) on our website, or a talk by the ClickHouse lead developer Alexey Milovidov ["The secrets of ClickHouse performance optimizations"](https://www.youtube.com/watch?v=ZOZQCQEtrz8). But this is a static picture of "how the things are". 
Software is a living and changing organism, and ClickHouse is changing very fast — to give you a sense of scale, in July 2021 we merged 319 pull requests made by 60 different authors ([live statistics here](https://gh-api.clickhouse.tech/play?user=play#c2VsZWN0IGRhdGVfdHJ1bmMoJ21vbnRoJywgY3JlYXRlZF9hdCkgbW9udGgsIHVuaXEoY3JlYXRvcl91c2VyX2xvZ2luKSBhdXRob3JzLCB1bmlxKG51bWJlcikgcHJzIGZyb20gZ2l0aHViX2V2ZW50cyB3aGVyZSByZXBvX25hbWUgPSAnQ2xpY2tIb3VzZS9DbGlja0hvdXNlJyBhbmQgbm90IGhhc0FueShsYWJlbHMsIFsncHItYmFja3BvcnQnLCAncHItZG9jdW1lbnRhdGlvbicsICdwci1jaGVycnlwaWNrJ10pIGFuZCBtZXJnZWQgYW5kIGNyZWF0ZWRfYXQgYmV0d2VlbiAnMjAyMC0wOS0wMScgYW5kICcyMDIxLTA5LTAxJyBncm91cCBieSBtb250aA==)). Any quality that is not actively selected for is going to be lost in this endless stream of changes, and the performance is no exception. For this reason, we have to have some process that allows us to ensure that ClickHouse always stays fast. +One of the main selling points of ClickHouse is that it's very fast, in many cases utilizing the hardware up to the theoretical limits. This was noted by many independent benchmarks such as [this one](http://brandonharris.io/redshift-clickhouse-time-series/). This speed boils down to the right combination of architectural choices and algorithmic optimizations, sprinkled with a dash of pixie dust. There is an [overview of these factors](/docs/en/faq/general/why-clickhouse-is-so-fast) on our website, or a talk by the ClickHouse lead developer Alexey Milovidov ["The secrets of ClickHouse performance optimizations"](https://www.youtube.com/watch?v=ZOZQCQEtrz8). But this is a static picture of "how the things are".
Software is a living and changing organism, and ClickHouse is changing very fast — to give you a sense of scale, in July 2021 we merged 319 pull requests made by 60 different authors ([live statistics here](https://gh-api.clickhouse.tech/play?user=play#c2VsZWN0IGRhdGVfdHJ1bmMoJ21vbnRoJywgY3JlYXRlZF9hdCkgbW9udGgsIHVuaXEoY3JlYXRvcl91c2VyX2xvZ2luKSBhdXRob3JzLCB1bmlxKG51bWJlcikgcHJzIGZyb20gZ2l0aHViX2V2ZW50cyB3aGVyZSByZXBvX25hbWUgPSAnQ2xpY2tIb3VzZS9DbGlja0hvdXNlJyBhbmQgbm90IGhhc0FueShsYWJlbHMsIFsncHItYmFja3BvcnQnLCAncHItZG9jdW1lbnRhdGlvbicsICdwci1jaGVycnlwaWNrJ10pIGFuZCBtZXJnZWQgYW5kIGNyZWF0ZWRfYXQgYmV0d2VlbiAnMjAyMC0wOS0wMScgYW5kICcyMDIxLTA5LTAxJyBncm91cCBieSBtb250aA==)). Any quality that is not actively selected for is going to be lost in this endless stream of changes, and the performance is no exception. For this reason, we have to have some process that allows us to ensure that ClickHouse always stays fast. # Measuring and Comparing the Performance -How do we know it is fast, in the first place? We do a lot of benchmarks, many kinds of them. The most basic kind of a benchmark is a micro-benchmark, which doesn't use the full code of the server and tests a particular algorithm in isolation. We use them to choose a better inner loop for some aggregate function, or to test various layouts of hash tables, and so on. For example, when we discovered that a competing database engine completes a query with a `sum` aggregate function twice as fast, we tested a couple of dozen implementations of `sum` to ultimately find the one that gives the best performance (see [a talk](https://www.youtube.com/watch?v=MJJfWoWJq0o) about this, in Russian). But testing a particular algorithm by itself is not enough to say how the entire query is going to work. We also have to make end-to-end measurements of entire queries, often using the real production data, because the particulars of the data (e.g. the cardinality and the distribution of values) heavily influence the performance.
Currently we have about 3000 end-to-end test queries organized into about 200 [tests](https://github.com/ClickHouse/ClickHouse/tree/6c4c3df96e41425185beb0c471a8dde0ce6f25a7/tests/performance). Many of them use real data sets, such as the [production data of Yandex.Metrica](https://clickhouse.tech/docs/en/getting-started/example-datasets/metrica/), obfuscated with `clickhouse-obfuscator` as described [here](https://habr.com/ru/company/yandex/blog/485096/). +How do we know it is fast, in the first place? We do a lot of benchmarks, many kinds of them. The most basic kind of a benchmark is a micro-benchmark, which doesn't use the full code of the server and tests a particular algorithm in isolation. We use them to choose a better inner loop for some aggregate function, or to test various layouts of hash tables, and so on. For example, when we discovered that a competing database engine completes a query with a `sum` aggregate function twice as fast, we tested a couple of dozen implementations of `sum` to ultimately find the one that gives the best performance (see [a talk](https://www.youtube.com/watch?v=MJJfWoWJq0o) about this, in Russian). But testing a particular algorithm by itself is not enough to say how the entire query is going to work. We also have to make end-to-end measurements of entire queries, often using the real production data, because the particulars of the data (e.g. the cardinality and the distribution of values) heavily influence the performance. Currently we have about 3000 end-to-end test queries organized into about 200 [tests](https://github.com/ClickHouse/ClickHouse/tree/6c4c3df96e41425185beb0c471a8dde0ce6f25a7/tests/performance). Many of them use real data sets, such as the [production data of Yandex.Metrica](/docs/en/getting-started/example-datasets/metrica/), obfuscated with `clickhouse-obfuscator` as described [here](https://habr.com/ru/company/yandex/blog/485096/).
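The micro-benchmark idea can be illustrated with a toy Python sketch (an analogy only, not ClickHouse's actual C++ harness): time two implementations of `sum` in isolation on the same data, after checking that they agree.

```python
import timeit

def sum_loop(xs):
    # naive scalar loop: the baseline implementation being compared
    total = 0
    for x in xs:
        total += x
    return total

data = list(range(100_000))

# both implementations must agree before their speed is compared
assert sum_loop(data) == sum(data) == 4_999_950_000

t_loop = timeit.timeit(lambda: sum_loop(data), number=20)
t_builtin = timeit.timeit(lambda: sum(data), number=20)
print(f"loop: {t_loop:.3f}s  builtin: {t_builtin:.3f}s")
```

In practice one would repeat such measurements across many candidate implementations and data distributions, exactly because the winner can change with the shape of the input.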
Micro-benchmarks are normally run by a developer while working on the code, but it is not practical to manually run the entire battery of the end-to-end tests for each change. We use an automated system that does this for each pull request as part of continuous integration checks. It measures whether the code changes introduced by a pull request influenced the performance, for which kinds of queries and by how much, and alerts the developer if there is a regression. Here is how a typical report looks. @@ -27,6 +27,7 @@ For complex processes which resist modeling, a practical option is to use the hi We run the reference version of the server process and the tested version, simultaneously on the same machine, and run the test queries on each of them in turn, one by one. This way we eliminate most systematic errors, because both servers are equally influenced by them. We can then compare the set of results we got from the reference server process, and the set from the test server process, to see whether they look the same. Comparing the distributions using two samples is a very interesting problem in itself. We use a non-parametric bootstrap method to build a randomization distribution for the observed difference of median query run times. This method is described in detail in [[1]](#ref1), where they apply it to see how changing a fertilizer mixture changes the yield of tomato plants. ClickHouse is not much different from tomatoes, only we have to check how the changes in code influence the performance. This method ultimately gives a single threshold number _T_: what is the largest difference in median query run times between the old and new servers that we can observe even if nothing has changed. Then we have a simple decision protocol given this threshold _T_ and the measured difference of medians _D_: + 1. _abs(D) <= T_ — the changes are not statistically significant, 2. _abs(D) <= 5%_ — the changes are too small to be important, 3.
_abs(T) >= 10%_ — the test query has excessive run time variance that leads to poor sensitivity, @@ -36,7 +37,7 @@ The most interesting case are the unstable queries _(3)_. When the elapsed time # Understanding the Reasons Behind the Changes -An investigation of code performance often starts with applying a profiler. On Linux, you would use `perf`, a sampling profiler that periodically collects the stack trace of the process, so that you can then see an aggregate picture of where your program spends the most time. In ClickHouse, we actually have a built-in sampling profiler that saves results into a system table, so no external tools are needed. It can be enabled for all queries or for a particular one, by passing the settings [as described in the docs](https://clickhouse.tech/docs/en/operations/optimizing-performance/sampling-query-profiler/). It is on by default, so if you use a recent version of ClickHouse, you already have a combined profile of your production server load. To visualize it, we can use a well-known script for building [flamegraphs](https://github.com/brendangregg/FlameGraph): +An investigation of code performance often starts with applying a profiler. On Linux, you would use `perf`, a sampling profiler that periodically collects the stack trace of the process, so that you can then see an aggregate picture of where your program spends the most time. In ClickHouse, we actually have a built-in sampling profiler that saves results into a system table, so no external tools are needed. It can be enabled for all queries or for a particular one, by passing the settings [as described in the docs](/docs/en/operations/optimizing-performance/sampling-query-profiler/). It is on by default, so if you use a recent version of ClickHouse, you already have a combined profile of your production server load. 
To visualize it, we can use a well-known script for building [flamegraphs](https://github.com/brendangregg/FlameGraph): ``` clickhouse-client -q "SELECT arrayStringConcat( @@ -72,7 +73,7 @@ Regardless of how it works inside, a test system must be actually usable as a pa Organizationally, it is hard to prevent devolving into a system that does a lot of busywork to just show a green check without giving any insight. I like to call this process "mining the green check", by analogy to cryptocurrencies. Our previous system did just that. It used increasingly complex heuristics tailored to each test query to prevent false positives, restarted itself many times if the results didn't look good, and so on. Ultimately, it wasted a lot of processing power without giving the real picture of the server performance. If you wanted to be sure that the performance did or did not change, you had to recheck by hand. This sorry state is the result of how the incentives are aligned around development — most of the time, the developers just want to merge their pull requests and not be bothered by some obscure test failures. Writing a good performance test query is also not always simple. Just any other query won't do — it has to give predictable performance, be not too fast and not too slow, actually measure something, and so on. After gathering more precise statistics, we discovered that several hundred of our test queries don't measure anything meaningful, e.g. they give a result that varies by 100% between runs. Another problem is that the performance often changes in statistically significant ways (true positive) with no relevant code changes (due to e.g. random differences in layout of the executable). Given all these difficulties, a working performance test system is bound to add noticeable friction to the development process. Most of the "obvious" ways to remove this friction ultimately boil down to "mining the green check". 
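The threshold-and-decision protocol described earlier can be sketched in Python. This is a simplified illustration, not the actual CI implementation: here _T_ is estimated by bootstrapping the reference sample alone, and the final "performance changed" branch is an assumed catch-all for when none of the three listed conditions hold.

```python
import random
import statistics

def estimate_threshold(reference, n_resamples=1000, quantile=0.99):
    """Estimate T: the largest relative difference of medians we can
    observe when nothing has changed, via bootstrap resampling of the
    reference measurements (a simplification of the real procedure)."""
    base = statistics.median(reference)
    diffs = []
    for _ in range(n_resamples):
        a = random.choices(reference, k=len(reference))
        b = random.choices(reference, k=len(reference))
        diffs.append(abs(statistics.median(a) - statistics.median(b)) / base)
    diffs.sort()
    return diffs[int(quantile * (len(diffs) - 1))]

def decide(old, new, t):
    # D: relative difference of median run times between the two servers
    d = (statistics.median(new) - statistics.median(old)) / statistics.median(old)
    if abs(d) <= t:
        return "not significant", d       # rule 1
    if abs(d) <= 0.05:
        return "too small to be important", d  # rule 2
    if t >= 0.10:
        return "unstable query", d        # rule 3: variance too high
    return "performance changed", d       # assumed fourth branch

random.seed(42)
old = [random.gauss(1.00, 0.01) for _ in range(50)]  # reference server, seconds
new = [random.gauss(1.30, 0.01) for _ in range(50)]  # tested server: ~30% slower
t = estimate_threshold(old)
verdict, d = decide(old, new, t)
print(verdict, round(d, 3))
```

With the tight, clearly shifted samples above, the sketch reports a genuine performance change; a noisy query with a large estimated _T_ would instead land in the "unstable" branch.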
-Implementation-wise, our system is peculiar in that it doesn't rely on well-known statistical packages, but instead heavily uses `clickhouse-local`, a tool that turns the ClickHouse SQL query processor into a [command line utility](https://altinity.com/blog/2019/6/11/clickhouse-local-the-power-of-clickhouse-sql-in-a-single-command). Doing all the computations in ClickHouse SQL helped us find bugs and usability problems with `clickhouse-local`. The performance test continues to serve a dual purpose as a heavy SQL test, and sometimes catches newly introduced bugs in complex joins and the like. The query profiler is always on in the performance tests, and this finds bugs in our fork of `libunwind`. To run the test queries, we use a third-party [Python driver](https://github.com/mymarilyn/clickhouse-driver). This is the only use of this driver in our CI, and it also helped us find some bugs in native protocol handling. A not so honorable fact is that the scaffolding consists of an unreasonable amount of bash, but this at least served to convince us that running [shellcheck](https://github.com/koalaman/shellcheck) in CI is very helpful. +Implementation-wise, our system is peculiar in that it doesn't rely on well-known statistical packages, but instead heavily uses `clickhouse-local`, a tool that turns the ClickHouse SQL query processor into a command line utility. Doing all the computations in ClickHouse SQL helped us find bugs and usability problems with `clickhouse-local`. The performance test continues to serve a dual purpose as a heavy SQL test, and sometimes catches newly introduced bugs in complex joins and the like. The query profiler is always on in the performance tests, and this finds bugs in our fork of `libunwind`. To run the test queries, we use a third-party [Python driver](https://github.com/mymarilyn/clickhouse-driver). This is the only use of this driver in our CI, and it also helped us find some bugs in native protocol handling.
A not so honorable fact is that the scaffolding consists of an unreasonable amount of bash, but this at least served to convince us that running [shellcheck](https://github.com/koalaman/shellcheck) in CI is very helpful. This concludes the overview of the ClickHouse performance test system. Stay tuned for the next article where we will discuss the most problematic kind of a performance test failure — the unstable query run time. @@ -80,4 +81,4 @@ _2021-08-20 [Alexander Kuzmenkov](https://github.com/akuzm). Title photo by [Ale References: -1. Box, Hunter, Hunter, 2005. Statistics for experimenters, p. 78: A Randomized Design Used in the Comparison of Standard and Modified Fertilizer Mixtures for Tomato Plants. \ No newline at end of file +1. Box, Hunter, Hunter, 2005. Statistics for experimenters, p. 78: A Randomized Design Used in the Comparison of Standard and Modified Fertilizer Mixtures for Tomato Plants. diff --git a/website/blog/en/2021/reading-from-external-memory.md b/website/blog/en/2021/reading-from-external-memory.md index f431bde0625..0b19d209c68 100644 --- a/website/blog/en/2021/reading-from-external-memory.md +++ b/website/blog/en/2021/reading-from-external-memory.md @@ -49,6 +49,7 @@ This figure shows results for Intel Optane SSD. Minimal latency is 12 microseconds which is 10 times lower than those of NVMe SSD. Average latency is 1000 times lower than those of HDD. There is quite a large variation for small block read latency: even though the average time is quite low and close to minimal latency the maximum latency and even the 99th percentile are significantly worse. If somebody looks at these results and wishes to create an Intel Optane-based service with 12 microsecond latency for reads they would have to install a larger number of Intel Optane drives or consider providing more realistic timings. When latency is so small, the overheads of context switching and interrupt handling become noticeable. One can use polling mode to gain some improvement.
In this mode the Linux kernel monitors the completion queue instead of switching to some other job and relying on a hardware interrupt and its handler to signal completion. Clearly, it makes sense to use the polling mode only when the hardware response is expected to arrive quickly enough. + ![Intel Optane single read latency in polling mode](https://blog-images.clickhouse.tech/en/2021/reading-from-external-memory/optane-single-hipri-read.png) The figure above shows results for reading from Intel Optane in polling mode. The polling mode is used when an application calls the preadv2(2) system call with the RWF\_HIPRI flag. Compared to the usual pread(2), the polling mode lowers the maximum latency by a factor of two for block sizes up to 256 kilobytes.
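On Linux, a polled positional read can be tried from Python via `os.preadv`, which exposes preadv2(2); `os.RWF_HIPRI` corresponds to the flag mentioned above. Whether polling is actually honoured depends on the kernel, filesystem, and device (it is primarily meaningful for O_DIRECT reads on fast block devices), so this sketch includes a fallback to a plain positional read.

```python
import os
import tempfile

# Create a small file to read back (a stand-in for the benchmark target).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 8192)
    path = f.name

fd = os.open(path, os.O_RDONLY)
buf = bytearray(4096)
try:
    # RWF_HIPRI asks the kernel to poll for completion instead of
    # sleeping until the device interrupt arrives (Linux >= 4.6).
    n = os.preadv(fd, [buf], 0, os.RWF_HIPRI)
except (OSError, AttributeError):
    # Polling reads are only honoured for some devices/filesystems, and
    # os.RWF_HIPRI is Linux-only; fall back to an ordinary positional read.
    n = os.preadv(fd, [buf], 0)
os.close(fd)
os.unlink(path)
print(n)  # 4096 bytes read
```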