Merge pull request #37790 from vdimir/doc-aspell

Spellcheck for the docs
Mikhail f. Shiryaev 2022-06-09 12:52:31 +02:00 committed by GitHub
commit 203de0c352
29 changed files with 616 additions and 78 deletions

View File

@ -8,6 +8,7 @@ ARG apt_archive="http://archive.ubuntu.com"
RUN sed -i "s|http://archive.ubuntu.com|$apt_archive|g" /etc/apt/sources.list
RUN apt-get update && env DEBIAN_FRONTEND=noninteractive apt-get install --yes \
aspell \
curl \
git \
libxml2-utils \

View File

@ -18,6 +18,7 @@ def process_result(result_folder):
("typos", "typos_output.txt"),
("whitespaces", "whitespaces_output.txt"),
("workflows", "workflows_output.txt"),
("doc typos", "doc_spell_output.txt"),
)
for name, out_file in checks:

View File

@ -11,6 +11,8 @@ echo "Check python formatting with black" | ts
./check-black -n |& tee /test_output/black_output.txt
echo "Check typos" | ts
./check-typos |& tee /test_output/typos_output.txt
echo "Check docs spelling" | ts
./check-doc-aspell |& tee /test_output/doc_spell_output.txt
echo "Check whitespaces" | ts
./check-whitespaces -n |& tee /test_output/whitespaces_output.txt
echo "Check workflows" | ts

View File

@ -138,7 +138,7 @@ It's important to name tests correctly, so one could turn some tests subset off
| Tester flag| What should be in test name | When flag should be added |
|---|---|---|
| `--[no-]zookeeper`| "zookeeper" or "replica" | Test uses tables from ReplicatedMergeTree family |
| `--[no-]zookeeper`| "zookeeper" or "replica" | Test uses tables from `ReplicatedMergeTree` family |
| `--[no-]shard` | "shard" or "distributed" or "global"| Test using connections to 127.0.0.2 or similar |
| `--[no-]long` | "long" or "deadlock" or "race" | Test runs longer than 60 seconds |
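Assuming the standard runner script at `tests/clickhouse-test` (an assumption based on the repository layout), a sketch of skipping those subsets locally:

```bash
# Skip tests that need ZooKeeper/replication, multiple shards, or long runtimes
tests/clickhouse-test --no-zookeeper --no-shard --no-long
```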

View File

@ -5,7 +5,7 @@ sidebar_position: 62
# Overview of ClickHouse Architecture
ClickHouse is a true column-oriented DBMS. Data is stored by columns and, during query execution, processed by arrays (vectors or chunks of columns).
Whenever possible, operations are dispatched on arrays, rather than on individual values. It is called “vectorized query execution” and it helps lower the cost of actual data processing.
> This idea is nothing new. It dates back to the `APL` (A programming language, 1957) and its descendants: `A +` (APL dialect), `J` (1990), `K` (1993), and `Q` (programming language from Kx Systems, 2003). Array programming is used in scientific data processing. Neither is this idea something new in relational databases: for example, it is used in the `VectorWise` system (also known as Actian Vector Analytic Database by Actian Corporation).
@ -149,13 +149,13 @@ The server implements several different interfaces:
- A TCP interface for the native ClickHouse client and for cross-server communication during distributed query execution.
- An interface for transferring data for replication.
Internally, it is just a primitive multithreaded server without coroutines or fibers. Since the server is not designed to process a high rate of simple queries but to process a relatively low rate of complex queries, each of them can process a vast amount of data for analytics.
Internally, it is just a primitive multithread server without coroutines or fibers. Since the server is not designed to process a high rate of simple queries but to process a relatively low rate of complex queries, each of them can process a vast amount of data for analytics.
The server initializes the `Context` class with the necessary environment for query execution: the list of available databases, users and access rights, settings, clusters, the process list, the query log, and so on. Interpreters use this environment.
We maintain full backward and forward compatibility for the server TCP protocol: old clients can talk to new servers, and new clients can talk to old servers. But we do not want to maintain it eternally, and we are removing support for old versions after about one year.
:::note
For most external applications, we recommend using the HTTP interface because it is simple and easy to use. The TCP protocol is more tightly linked to internal data structures: it uses an internal format for passing blocks of data, and it uses custom framing for compressed data. We haven't released a C library for that protocol because it requires linking most of the ClickHouse codebase, which is not practical.
:::
@ -178,7 +178,7 @@ To execute queries and do side activities ClickHouse allocates threads from one
Server pool is a `Poco::ThreadPool` class instance defined in `Server::main()` method. It can have at most `max_connection` threads. Every thread is dedicated to a single active connection.
Global thread pool is `GlobalThreadPool` singleton class. To allocate thread from it `ThreadFromGlobalPool` is used. It has an interface similar to `std::thread`, but pulls thread from the global pool and does all necessary initializations. It is configured with the following settings:
Global thread pool is `GlobalThreadPool` singleton class. To allocate thread from it `ThreadFromGlobalPool` is used. It has an interface similar to `std::thread`, but pulls thread from the global pool and does all necessary initialization. It is configured with the following settings:
* `max_thread_pool_size` - limit on thread count in pool.
* `max_thread_pool_free_size` - limit on idle thread count waiting for new jobs.
* `thread_pool_queue_size` - limit on scheduled job count.
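A rough sketch of raising those limits via a `config.d` override (file name and values are illustrative, not defaults):

```bash
cat > /etc/clickhouse-server/config.d/thread_pool.xml <<'EOF'
<clickhouse>
    <max_thread_pool_size>12000</max_thread_pool_size>
    <max_thread_pool_free_size>1000</max_thread_pool_free_size>
    <thread_pool_queue_size>12000</thread_pool_queue_size>
</clickhouse>
EOF
```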
@ -189,7 +189,7 @@ IO thread pool is implemented as a plain `ThreadPool` accessible via `IOThreadPo
For periodic task execution there is `BackgroundSchedulePool` class. You can register tasks using `BackgroundSchedulePool::TaskHolder` objects and the pool ensures that no task runs two jobs at the same time. It also allows you to postpone task execution to a specific instant in the future or temporarily deactivate task. Global `Context` provides a few instances of this class for different purposes. For general purpose tasks `Context::getSchedulePool()` is used.
There are also specialized thread pools for preemptable tasks. Such `IExecutableTask` task can be split into ordered sequence of jobs, called steps. To schedule these tasks in a manner allowing short tasks to be prioritied over long ones `MergeTreeBackgroundExecutor` is used. As name suggests it is used for background MergeTree related operations such as merges, mutations, fetches and moves. Pool instances are available using `Context::getCommonExecutor()` and other similar methods.
There are also specialized thread pools for preemptable tasks. Such `IExecutableTask` task can be split into ordered sequence of jobs, called steps. To schedule these tasks in a manner allowing short tasks to be prioritized over long ones `MergeTreeBackgroundExecutor` is used. As name suggests it is used for background MergeTree related operations such as merges, mutations, fetches and moves. Pool instances are available using `Context::getCommonExecutor()` and other similar methods.
No matter what pool is used for a job, at start `ThreadStatus` instance is created for this job. It encapsulates all per-thread information: thread id, query id, performance counters, resource consumption and many other useful data. Job can access it via thread local pointer by `CurrentThread::get()` call, so we do not need to pass it to every function.
@ -201,7 +201,7 @@ Servers in a cluster setup are mostly independent. You can create a `Distributed
Things become more complicated when you have subqueries in IN or JOIN clauses, and each of them uses a `Distributed` table. We have different strategies for the execution of these queries.
There is no global query plan for distributed query execution. Each node has its local query plan for its part of the job. We only have simple one-pass distributed query execution: we send queries for remote nodes and then merge the results. But this is not feasible for complicated queries with high cardinality GROUP BYs or with a large amount of temporary data for JOIN. In such cases, we need to “reshuffle” data between servers, which requires additional coordination. ClickHouse does not support that kind of query execution, and we need to work on it.
There is no global query plan for distributed query execution. Each node has its local query plan for its part of the job. We only have simple one-pass distributed query execution: we send queries for remote nodes and then merge the results. But this is not feasible for complicated queries with high cardinality `GROUP BY`s or with a large amount of temporary data for JOIN. In such cases, we need to “reshuffle” data between servers, which requires additional coordination. ClickHouse does not support that kind of query execution, and we need to work on it.
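For the `IN` case, the `GLOBAL` keyword illustrates one of those strategies: the subquery is evaluated once on the initiator and its result is shipped to every shard instead of being re-executed remotely. A sketch, with hypothetical `Distributed` tables:

```bash
clickhouse-client --query "
    SELECT count()
    FROM distributed_hits
    WHERE user_id GLOBAL IN (SELECT user_id FROM distributed_users WHERE age < 18)"
```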
## Merge Tree {#merge-tree}
@ -231,7 +231,7 @@ Replication is physical: only compressed parts are transferred between nodes, no
Besides, each replica stores its state in ZooKeeper as the set of parts and its checksums. When the state on the local filesystem diverges from the reference state in ZooKeeper, the replica restores its consistency by downloading missing and broken parts from other replicas. When there is some unexpected or broken data in the local filesystem, ClickHouse does not remove it, but moves it to a separate directory and forgets it.
:::note
The ClickHouse cluster consists of independent shards, and each shard consists of replicas. The cluster is **not elastic**, so after adding a new shard, data is not rebalanced between shards automatically. Instead, the cluster load is supposed to be adjusted to be uneven. This implementation gives you more control, and it is ok for relatively small clusters, such as tens of nodes. But for clusters with hundreds of nodes that we are using in production, this approach becomes a significant drawback. We should implement a table engine that spans across the cluster with dynamically replicated regions that could be split and balanced between clusters automatically.
:::

View File

@ -4,7 +4,7 @@ sidebar_label: Build on Mac OS X
description: How to build ClickHouse on Mac OS X
---
# How to Build ClickHouse on Mac OS X
:::info You don't have to build ClickHouse yourself!
You can install pre-built ClickHouse as described in [Quick Start](https://clickhouse.com/#quick-start). Follow **macOS (Intel)** or **macOS (Apple silicon)** installation instructions.
@ -20,9 +20,9 @@ It is also possible to compile with Apple's XCode `apple-clang` or Homebrew's `g
First install [Homebrew](https://brew.sh/)
## For Apple's Clang (discouraged): Install Xcode and Command Line Tools {#install-xcode-and-command-line-tools}
## For Apple's Clang (discouraged): Install XCode and Command Line Tools {#install-xcode-and-command-line-tools}
Install the latest [Xcode](https://apps.apple.com/am/app/xcode/id497799835?mt=12) from App Store.
Install the latest [XCode](https://apps.apple.com/am/app/xcode/id497799835?mt=12) from App Store.
Open it at least once to accept the end-user license agreement and automatically install the required components.
@ -62,7 +62,7 @@ cmake --build build
# The resulting binary will be created at: build/programs/clickhouse
```
To build using Xcode's native AppleClang compiler in Xcode IDE (this option is only for development builds and workflows, and is **not recommended** unless you know what you are doing):
To build using XCode native AppleClang compiler in XCode IDE (this option is only for development builds and workflows, and is **not recommended** unless you know what you are doing):
``` bash
cd ClickHouse
@ -71,7 +71,7 @@ mkdir build
cd build
XCODE_IDE=1 ALLOW_APPLECLANG=1 cmake -G Xcode -DCMAKE_BUILD_TYPE=Debug -DENABLE_JEMALLOC=OFF ..
cmake --open .
# ...then, in Xcode IDE select ALL_BUILD scheme and start the building process.
# ...then, in XCode IDE select ALL_BUILD scheme and start the building process.
# The resulting binary will be created at: ./programs/Debug/clickhouse
```
@ -91,9 +91,9 @@ cmake --build build
## Caveats {#caveats}
If you intend to run `clickhouse-server`, make sure to increase the system's maxfiles variable.
If you intend to run `clickhouse-server`, make sure to increase the system's `maxfiles` variable.
:::note
You'll need to use sudo.
:::

View File

@ -130,7 +130,7 @@ Here is an example of how to install the new `cmake` from the official website:
```
wget https://github.com/Kitware/CMake/releases/download/v3.22.2/cmake-3.22.2-linux-x86_64.sh
chmod +x cmake-3.22.2-linux-x86_64.sh
./cmake-3.22.2-linux-x86_64.sh
export PATH=/home/milovidov/work/cmake-3.22.2-linux-x86_64/bin/:${PATH}
hash cmake
```
@ -163,7 +163,7 @@ ClickHouse is available in pre-built binaries and packages. Binaries are portabl
They are built for stable, prestable and testing releases as well as for every commit to master and for every pull request.
To find the freshest build from `master`, go to [commits page](https://github.com/ClickHouse/ClickHouse/commits/master), click on the first green checkmark or red cross near commit, and click to the “Details” link right after “ClickHouse Build Check”.
To find the freshest build from `master`, go to [commits page](https://github.com/ClickHouse/ClickHouse/commits/master), click on the first green check mark or red cross near commit, and click to the “Details” link right after “ClickHouse Build Check”.
## Faster builds for development: Split build configuration {#split-build}

View File

@ -19,7 +19,7 @@ cmake .. \
## CMake files types
1. ClickHouse's source CMake files (located in the root directory and in /src).
1. ClickHouse source CMake files (located in the root directory and in /src).
2. Arch-dependent CMake files (located in /cmake/*os_name*).
3. Libraries finders (search for contrib libraries, located in /contrib/*/CMakeLists.txt).
4. Contrib build CMake files (used instead of libraries' own CMake files, located in /cmake/modules)
@ -456,7 +456,7 @@ option(ENABLE_TESTS "Provide unit_test_dbms target with Google.test unit tests"
#### If the option's state could produce an unwanted (or unusual) result, explicitly warn the user.
Suppose you have an option that may strip debug symbols from the ClickHouse's part.
Suppose you have an option that may strip debug symbols from the ClickHouse part.
This can speed up the linking process, but produces a binary that cannot be debugged.
In that case, prefer explicitly raising a warning telling the developer that he may be doing something wrong.
Also, such options should be disabled where applicable.

View File

@ -31,7 +31,7 @@ If you are not sure what to do, ask a maintainer for help.
## Merge With Master
Verifies that the PR can be merged to master. If not, it will fail with the
message 'Cannot fetch mergecommit'. To fix this check, resolve the conflict as
message `Cannot fetch mergecommit`. To fix this check, resolve the conflict as
described in the [GitHub
documentation](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/resolving-a-merge-conflict-on-github),
or merge the `master` branch to your pull request branch using git.
@ -57,7 +57,7 @@ You have to specify a changelog category for your change (e.g., Bug Fix), and
write a user-readable message describing the change for [CHANGELOG.md](../whats-new/changelog/)
## Push To Dockerhub
## Push To DockerHub
Builds docker images used for build and tests, then pushes them to DockerHub.
@ -118,7 +118,7 @@ Builds ClickHouse in various configurations for use in further steps. You have t
- **Compiler**: `gcc-9` or `clang-10` (or `clang-10-xx` for other architectures e.g. `clang-10-freebsd`).
- **Build type**: `Debug` or `RelWithDebInfo` (cmake).
- **Sanitizer**: `none` (without sanitizers), `address` (ASan), `memory` (MSan), `undefined` (UBSan), or `thread` (TSan).
- **Splitted** `splitted` is a [split build](../development/build.md#split-build)
- **Split** `splitted` is a [split build](../development/build.md#split-build)
- **Status**: `success` or `fail`
- **Build log**: link to the building and files copying log, useful when build failed.
- **Build time**.

View File

@ -96,9 +96,9 @@ SELECT library_name, license_type, license_path FROM system.licenses ORDER BY li
## Adding new third-party libraries and maintaining patches in third-party libraries {#adding-third-party-libraries}
1. Each third-party libary must reside in a dedicated directory under the `contrib/` directory of the ClickHouse repository. Avoid dumps/copies of external code, instead use Git's submodule feature to pull third-party code from an external upstream repository.
2. Submodules are listed in `.gitmodule`. If the external library can be used as-is, you may reference the upstream repository directly. Otherwise, i.e. the external libary requires patching/customization, create a fork of the official repository in the [Clickhouse organization in GitHub](https://github.com/ClickHouse).
1. Each third-party library must reside in a dedicated directory under the `contrib/` directory of the ClickHouse repository. Avoid dumps/copies of external code, instead use Git submodule feature to pull third-party code from an external upstream repository.
2. Submodules are listed in `.gitmodule`. If the external library can be used as-is, you may reference the upstream repository directly. Otherwise, i.e. if the external library requires patching/customization, create a fork of the official repository in the [ClickHouse organization on GitHub](https://github.com/ClickHouse).
3. In the latter case, create a branch with `clickhouse/` prefix from the branch you want to integrate, e.g. `clickhouse/master` (for `master`) or `clickhouse/release/vX.Y.Z` (for a `release/vX.Y.Z` tag). The purpose of this branch is to isolate customization of the library from upstream work. For example, pulls from the upstream repository into the fork will leave all `clickhouse/` branches unaffected. Submodules in `contrib/` must only track `clickhouse/` branches of forked third-party repositories.
4. To patch a fork of a third-party library, create a dedicated branch with `clickhouse/` prefix in the fork, e.g. `clickhouse/fix-some-disaster`. Finally, merge the patch branch into the custom tracking branch (e.g. `clickhouse/master` or `clickhouse/release/vX.Y.Z`) using a PR.
5. Always create patches of third-party libraries with the official repository in mind. Once a PR of a patch branch to the `clickhouse/` branch in the fork repository is done and the submodule version in ClickHouse's official repository is bumped, consider opening another PR from the patch branch to the upstream library repository. This ensures, that 1) the contribution has more than a single use case and importance, 2) others will also benefit from it, 3) the change will not remain a maintenance burden solely on ClickHouse developers.
5. Always create patches of third-party libraries with the official repository in mind. Once a PR of a patch branch to the `clickhouse/` branch in the fork repository is done and the submodule version in ClickHouse official repository is bumped, consider opening another PR from the patch branch to the upstream library repository. This ensures, that 1) the contribution has more than a single use case and importance, 2) others will also benefit from it, 3) the change will not remain a maintenance burden solely on ClickHouse developers.
9. To update a submodule with changes in the upstream repository, first merge upstream `master` (or a new `versionX.Y.Z` tag) into the `clickhouse`-tracking branch in the fork repository. Conflicts with patches/customization will need to be resolved in this merge (see Step 4.). Once the merge is done, bump the submodule in ClickHouse to point to the new hash in the fork.

View File

@ -70,7 +70,7 @@ You can also clone the repository via https protocol:
This, however, will not let you send your changes to the server. You can still use it temporarily and add the SSH keys later replacing the remote address of the repository with `git remote` command.
You can also add original ClickHouse repos address to your local repository to pull updates from there:
You can also add original ClickHouse repo address to your local repository to pull updates from there:
git remote add upstream git@github.com:ClickHouse/ClickHouse.git
@ -177,7 +177,7 @@ If you require to build all the binaries (utilities and tests), you should run n
Full build requires about 30GB of free disk space or 15GB to build the main binaries.
When a large amount of RAM is available on build machine you should limit the number of build tasks run in parallel with `-j` param:
When a large amount of RAM is available on build machine you should limit the number of build tasks run in parallel with `-j` parameter:
ninja -j 1 clickhouse-server clickhouse-client
@ -269,7 +269,7 @@ Developing ClickHouse often requires loading realistic datasets. It is particula
Navigate to your fork repository in GitHub's UI. If you have been developing in a branch, you need to select that branch. There will be a “Pull request” button located on the screen. In essence, this means “create a request for accepting my changes into the main repository”.
A pull request can be created even if the work is not completed yet. In this case please put the word “WIP” (work in progress) at the beginning of the title, it can be changed later. This is useful for cooperative reviewing and discussion of changes as well as for running all of the available tests. It is important that you provide a brief description of your changes, it will later be used for generating release changelogs.
A pull request can be created even if the work is not completed yet. In this case please put the word “WIP” (work in progress) at the beginning of the title, it can be changed later. This is useful for cooperative reviewing and discussion of changes as well as for running all of the available tests. It is important that you provide a brief description of your changes, it will later be used for generating release changelog.
Testing will commence as soon as ClickHouse employees label your PR with a tag “can be tested”. The results of some first checks (e.g. code style) will come in within several minutes. Build check results will arrive within half an hour. And the main set of tests will report itself within an hour.

View File

@ -2,7 +2,7 @@
Rust library integration will be described based on BLAKE3 hash-function integration.
The first step is forking a library and making neccessary changes for Rust and C/C++ compatibility.
The first step is forking a library and making necessary changes for Rust and C/C++ compatibility.
After forking the library repository you need to change target settings in the Cargo.toml file. Firstly, you need to switch the build to a static library. Secondly, you need to add the cbindgen crate to the crate list. We will use it later to generate the C header automatically.
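A sketch of those two `Cargo.toml` edits (section placement and version number are assumptions):

```bash
cat >> Cargo.toml <<'EOF'
[lib]
crate-type = ["staticlib"]

[build-dependencies]
cbindgen = "0.24"
EOF
```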
@ -51,9 +51,9 @@ pub unsafe extern "C" fn blake3_apply_shim(
}
```
This method gets C-compatible string, its size and output string pointer as input. Then, it converts C-compatible inputs into types that are used by actual library methods and calls them. After that, it should convert library methods' outputs back into C-compatible type. In that particular case library supported direct writing into pointer by method fill(), so the convertion was not needed. The main advice here is to create less methods, so you will need to do less convertions on each method call and won't create much overhead.
This method gets C-compatible string, its size and output string pointer as input. Then, it converts C-compatible inputs into types that are used by actual library methods and calls them. After that, it should convert library methods' outputs back into C-compatible type. In that particular case library supported direct writing into pointer by method fill(), so the conversion was not needed. The main advice here is to create fewer methods, so you will need to do fewer conversions on each method call and won't create much overhead.
Also, you should use attribute #[no_mangle] and extern "C" for every C-compatible attribute. Without it library can compile incorrectly and cbindgen won't launch header autogeneration.
Also, you should use attribute #[no_mangle] and `extern "C"` for every C-compatible attribute. Without it library can compile incorrectly and cbindgen won't launch header autogeneration.
After all these steps you can test your library in a small project to find all problems with compatibility or header generation. If any problems occur during header generation, you can try to configure it with cbindgen.toml file (you can find an example of it in BLAKE3 directory or a template here: [https://github.com/eqrion/cbindgen/blob/master/template.toml](https://github.com/eqrion/cbindgen/blob/master/template.toml)). If everything works correctly, you can finally integrate its methods into ClickHouse.
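If you generate the header from the command line rather than from a build script, the cbindgen CLI covers it; a sketch with an assumed crate name and output path:

```bash
cargo install cbindgen
cbindgen --config cbindgen.toml --crate blake3 --output include/blake3.h
```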

View File

@ -4,7 +4,7 @@ sidebar_label: C++ Guide
description: A list of recommendations regarding coding style, naming convention, formatting and more
---
# How to Write C++ Code
## General Recommendations {#general-recommendations}
@ -196,7 +196,7 @@ std::cerr << static_cast<int>(c) << std::endl;
The same is true for small methods in any classes or structs.
For templated classes and structs, do not separate the method declarations from the implementation (because otherwise they must be defined in the same translation unit).
For template classes and structs, do not separate the method declarations from the implementation (because otherwise they must be defined in the same translation unit).
**31.** You can wrap lines at 140 characters, instead of 80.
@ -285,7 +285,7 @@ Note: You can use Doxygen to generate documentation from these comments. But Dox
/// WHAT THE FAIL???
```
**14.** Do not use comments to make delimeters.
**14.** Do not use comments to make delimiters.
``` cpp
///******************************************************
@ -491,7 +491,7 @@ if (0 != close(fd))
throwFromErrno("Cannot close file " + file_name, ErrorCodes::CANNOT_CLOSE_FILE);
```
You can use assert to check invariants in code.
You can use assert to check invariant in code.
**4.** Exception types.
@ -552,9 +552,9 @@ Do not try to implement lock-free data structures unless it is your primary area
In most cases, prefer references.
**10.** const.
**10.** `const`.
Use constant references, pointers to constants, `const_iterator`, and const methods.
Use constant references, pointers to constants, `const_iterator`, and `const` methods.
Consider `const` to be default and use non-`const` only when necessary.
@ -596,7 +596,7 @@ public:
AggregateFunctionPtr get(const String & name, const DataTypes & argument_types) const;
```
**15.** namespace.
**15.** `namespace`.
There is no need to use a separate `namespace` for application code.
@ -606,7 +606,7 @@ For medium to large libraries, put everything in a `namespace`.
In the library's `.h` file, you can use `namespace detail` to hide implementation details not needed for the application code.
In a `.cpp` file, you can use a `static` or anonymous namespace to hide symbols.
In a `.cpp` file, you can use a `static` or anonymous `namespace` to hide symbols.
Also, a `namespace` can be used for an `enum` to prevent the corresponding names from falling into an external `namespace` (but it's better to use an `enum class`).

View File

@ -4,7 +4,7 @@ sidebar_label: Testing
description: Most of ClickHouse features can be tested with functional tests and they are mandatory to use for every change in ClickHouse code that can be tested that way.
---
# ClickHouse Testing
## Functional Tests
@ -85,7 +85,7 @@ Performance tests allow to measure and compare performance of some isolated part
Each test runs one or multiple queries (possibly with combinations of parameters) in a loop.
If you want to improve performance of ClickHouse in some scenario, and if improvements can be observed on simple queries, it is highly recommended to write a performance test. It always makes sense to use `perf top` or other perf tools during your tests.
If you want to improve performance of ClickHouse in some scenario, and if improvements can be observed on simple queries, it is highly recommended to write a performance test. It always makes sense to use `perf top` or other `perf` tools during your tests.
## Test Tools and Scripts {#test-tools-and-scripts}
@ -228,7 +228,7 @@ Our Security Team did some basic overview of ClickHouse capabilities from the se
We run `clang-tidy` on a per-commit basis. `clang-static-analyzer` checks are also enabled. `clang-tidy` is also used for some style checks.
We have evaluated `clang-tidy`, `Coverity`, `cppcheck`, `PVS-Studio`, `tscancode`, `CodeQL`. You will find instructions for usage in `tests/instructions/` directory.
If you use `CLion` as an IDE, you can leverage some `clang-tidy` checks out of the box.
@ -244,7 +244,7 @@ In debug build we also involve a customization of libc that ensures that no "har
Debug assertions are used extensively.
In debug build, if exception with "logical error" code (implies a bug) is being thrown, the program is terminated prematurally. It allows to use exceptions in release build but make it an assertion in debug build.
In debug build, if exception with "logical error" code (implies a bug) is being thrown, the program is terminated prematurely. It allows to use exceptions in release build but make it an assertion in debug build.
Debug version of jemalloc is used for debug builds.
Debug version of libc++ is used for debug builds.
@ -253,7 +253,7 @@ Debug version of libc++ is used for debug builds.
Data stored on disk is checksummed. Data in MergeTree tables is checksummed in three ways simultaneously* (compressed data blocks, uncompressed data blocks, the total checksum across blocks). Data transferred over network between client and server or between servers is also checksummed. Replication ensures bit-identical data on replicas.
It is required to protect from faulty hardware (bit rot on storage media, bit flips in RAM on server, bit flips in RAM of network controller, bit flips in RAM of network switch, bit flips in RAM of client, bit flips on the wire). Note that bit flips are common and likely to occur even for ECC RAM and in presense of TCP checksums (if you manage to run thousands of servers processing petabytes of data each day). [See the video (russian)](https://www.youtube.com/watch?v=ooBAQIe0KlQ).
It is required to protect from faulty hardware (bit rot on storage media, bit flips in RAM on server, bit flips in RAM of network controller, bit flips in RAM of network switch, bit flips in RAM of client, bit flips on the wire). Note that bit flips are common and likely to occur even for ECC RAM and in presence of TCP checksums (if you manage to run thousands of servers processing petabytes of data each day). [See the video (russian)](https://www.youtube.com/watch?v=ooBAQIe0KlQ).
ClickHouse provides diagnostics that will help ops engineers to find faulty hardware.

View File

@ -12,7 +12,7 @@ The table engine (type of table) determines:
- Which queries are supported, and how.
- Concurrent data access.
- Use of indexes, if present.
- Whether multithreaded request execution is possible.
- Whether multithread request execution is possible.
- Data replication parameters.
## Engine Families {#engine-families}

View File

@ -190,8 +190,7 @@ sudo ./clickhouse install
### From Precompiled Binaries for Non-Standard Environments {#from-binaries-non-linux}
For non-Linux operating systems and for AArch64 CPU arhitecture, ClickHouse builds are provided as a cross-compiled binary from the latest commit of the `master` branch (with a few hours delay).
For non-Linux operating systems and for AArch64 CPU architecture, ClickHouse builds are provided as a cross-compiled binary from the latest commit of the `master` branch (with a few hours delay).
- [MacOS x86_64](https://builds.clickhouse.com/master/macos/clickhouse)
```bash

View File

@ -119,7 +119,7 @@ Dates with times are written in the format `YYYY-MM-DD hh:mm:ss` and parsed in t
This all occurs in the system time zone at the time the client or server starts (depending on which of them formats data). For dates with times, daylight saving time is not specified. So if a dump has times during daylight saving time, the dump does not unequivocally match the data, and parsing will select one of the two times.
During a read operation, incorrect dates and dates with times can be parsed with natural overflow or as null dates and times, without an error message.
As an exception, parsing dates with times is also supported in Unix timestamp format, if it consists of exactly 10 decimal digits. The result is not time zone-dependent. The formats YYYY-MM-DD hh:mm:ss and NNNNNNNNNN are differentiated automatically.
As an exception, parsing dates with times is also supported in Unix timestamp format, if it consists of exactly 10 decimal digits. The result is not time zone-dependent. The formats `YYYY-MM-DD hh:mm:ss` and `NNNNNNNNNN` are differentiated automatically.
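A quick way to see both encodings land in the same `DateTime` column is `clickhouse-local` (a sketch; `1640995200` is `2022-01-01 00:00:00` UTC):

```bash
printf '2022-01-01 00:00:00\n1640995200\n' \
    | clickhouse-local --structure 'd DateTime' --input-format TabSeparated \
        --query 'SELECT d FROM table'
```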
Strings are output with backslash-escaped special characters. The following escape sequences are used for output: `\b`, `\f`, `\r`, `\n`, `\t`, `\0`, `\'`, `\\`. Parsing also supports the sequences `\a`, `\v`, and `\xHH` (hex escape sequences) and any `\c` sequences, where `c` is any character (these sequences are converted to `c`). Thus, reading data supports formats where a line feed can be written as `\n` or `\`, or as a line feed. For example, the string `Hello world` with a line feed between the words instead of space can be parsed in any of the following variations:
@ -816,7 +816,7 @@ Columns that are not present in the block will be filled with default values (yo
## JSONEachRow {#jsoneachrow}
In this format, CliskHouse outputs each row as a separated, newline-delimited JSON Object.
In this format, ClickHouse outputs each row as a separate, newline-delimited JSON object.
Example:
@ -1363,9 +1363,9 @@ Columns `name` ([String](../sql-reference/data-types/string.md)) and `value` (nu
Rows may optionally contain `help` ([String](../sql-reference/data-types/string.md)) and `timestamp` (number).
Column `type` ([String](../sql-reference/data-types/string.md)) is either `counter`, `gauge`, `histogram`, `summary`, `untyped` or empty.
Each metric value may also have some `labels` ([Map(String, String)](../sql-reference/data-types/map.md)).
Several consequent rows may refer to the one metric with different lables. The table should be sorted by metric name (e.g., with `ORDER BY name`).
Several consequent rows may refer to the one metric with different labels. The table should be sorted by metric name (e.g., with `ORDER BY name`).
There's special requirements for labels for `histogram` and `summary`, see [Prometheus doc](https://prometheus.io/docs/instrumenting/exposition_formats/#histograms-and-summaries) for the details. Special rules applied to row with labels `{'count':''}` and `{'sum':''}`, they'll be convered to `<metric_name>_count` and `<metric_name>_sum` respectively.
There are special requirements for labels for `histogram` and `summary`, see [Prometheus doc](https://prometheus.io/docs/instrumenting/exposition_formats/#histograms-and-summaries) for the details. Special rules apply to rows with labels `{'count':''}` and `{'sum':''}`: they'll be converted to `<metric_name>_count` and `<metric_name>_sum` respectively.
**Example:**
@ -1665,7 +1665,7 @@ To exchange data with Hadoop, you can use [HDFS table engine](../engines/table-e
### Parquet format settings {#parquet-format-settings}
- [output_format_parquet_row_group_size](../operations/settings/settings.md#output_format_parquet_row_group_size) - row group size in rows while data output. Default value - `1000000`.
- [output_format_parquet_string_as_string](../operations/settings/settings.md#output_format_parquet_string_as_string) - use Parquet String type instead of Binary for String columns. Default value - `false`.
- [input_format_parquet_import_nested](../operations/settings/settings.md#input_format_parquet_import_nested) - allow inserting array of structs into [Nested](../sql-reference/data-types/nested-data-structures/nested.md) table in Parquet input format. Default value - `false`.
- [input_format_parquet_case_insensitive_column_matching](../operations/settings/settings.md#input_format_parquet_case_insensitive_column_matching) - ignore case when matching Parquet columns with ClickHouse columns. Default value - `false`.
@ -1845,7 +1845,7 @@ When working with the `Regexp` format, you can use the following settings:
- Quoted (similarly to [Values](#data-format-values))
- Raw (extracts subpatterns as a whole, no escaping rules, similarly to [TSVRaw](#tabseparatedraw))
- `format_regexp_skip_unmatched` — [UInt8](../sql-reference/data-types/int-uint.md). Defines the need to throw an exeption in case the `format_regexp` expression does not match the imported data. Can be set to `0` or `1`.
- `format_regexp_skip_unmatched` — [UInt8](../sql-reference/data-types/int-uint.md). Defines the need to throw an exception in case the `format_regexp` expression does not match the imported data. Can be set to `0` or `1`.
**Usage**

View File

@ -422,7 +422,7 @@ Now `rule` can configure `method`, `headers`, `url`, `handler`:
- `query` — use with `predefined_query_handler` type, executes query when the handler is called.
- `query_param_name` — use with `dynamic_query_handler` type, extracts and executes the value corresponding to the `query_param_name` value in HTTP request params.
- `query_param_name` — use with `dynamic_query_handler` type, extracts and executes the value corresponding to the `query_param_name` value in HTTP request parameters.
- `status` — use with `static` type, response status code.
@ -477,9 +477,9 @@ In one `predefined_query_handler` only supports one `query` of an insert type.
### dynamic_query_handler {#dynamic_query_handler}
In `dynamic_query_handler`, the query is written in the form of param of the HTTP request. The difference is that in `predefined_query_handler`, the query is written in the configuration file. You can configure `query_param_name` in `dynamic_query_handler`.
In `dynamic_query_handler`, the query is written in the form of a parameter of the HTTP request. The difference is that in `predefined_query_handler`, the query is written in the configuration file. You can configure `query_param_name` in `dynamic_query_handler`.
ClickHouse extracts and executes the value corresponding to the `query_param_name` value in the URL of the HTTP request. The default value of `query_param_name` is `/query` . It is an optional configuration. If there is no definition in the configuration file, the param is not passed in.
ClickHouse extracts and executes the value corresponding to the `query_param_name` value in the URL of the HTTP request. The default value of `query_param_name` is `/query`. It is an optional configuration. If there is no definition in the configuration file, the parameter is not passed in.
To experiment with this functionality, the example defines the values of [max_threads](../operations/settings/settings.md#settings-max_threads) and `max_final_threads` and queries whether the settings were set successfully.

View File

@ -5,7 +5,7 @@ sidebar_label: PostgreSQL Interface
# PostgreSQL Interface
ClickHouse supports the PostgreSQL wire protocol, which allows you to use Postgres clients to connect to ClickHouse. In a sense, ClickHouse can pretend to be a PostgreSQL instance - allowing you to connect a PostgreSQL client application to ClickHouse that is not already directy supported by ClickHouse (for example, Amazon Redshift).
ClickHouse supports the PostgreSQL wire protocol, which allows you to use Postgres clients to connect to ClickHouse. In a sense, ClickHouse can pretend to be a PostgreSQL instance - allowing you to connect a PostgreSQL client application to ClickHouse that is not already directly supported by ClickHouse (for example, Amazon Redshift).
To enable the PostgreSQL wire protocol, add the [postgresql_port](../operations/server-configuration-parameters/settings#server_configuration_parameters-postgresql_port) setting to your server's configuration file. For example, you could define the port in a new XML file in your `config.d` folder:
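A minimal sketch of such a file (the port number is illustrative):

```bash
cat > /etc/clickhouse-server/config.d/postgresql.xml <<'EOF'
<clickhouse>
    <postgresql_port>9005</postgresql_port>
</clickhouse>
EOF
```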
@ -59,7 +59,7 @@ The PostgreSQL protocol currently only supports plain-text passwords.
## Using SSL
If you have SSL/TLS configured on your ClickHouse instance, then `postgresql_port` will use the same settings (the port is shared for both secure and unsecure clients).
If you have SSL/TLS configured on your ClickHouse instance, then `postgresql_port` will use the same settings (the port is shared for both secure and insecure clients).
Each client has their own method of how to connect using SSL. The following command demonstrates how to pass in the certificates and key to securely connect `psql` to ClickHouse:

View File

@ -53,7 +53,7 @@ Internal coordination settings are located in the `<keeper_server>.<coordination
- `auto_forwarding` — Allow to forward write requests from followers to the leader (default: true).
- `shutdown_timeout` — Wait to finish internal connections and shutdown (ms) (default: 5000).
- `startup_timeout` — If the server doesn't connect to other quorum participants in the specified timeout it will terminate (ms) (default: 30000).
- `four_letter_word_white_list` — White list of 4lw commands (default: "conf,cons,crst,envi,ruok,srst,srvr,stat,wchc,wchs,dirs,mntr,isro").
- `four_letter_word_white_list` — White list of 4lw commands (default: `conf,cons,crst,envi,ruok,srst,srvr,stat,wchc,wchs,dirs,mntr,isro`).
Quorum configuration is located in the `<keeper_server>.<raft_configuration>` section and contains a description of the servers.
@ -122,7 +122,7 @@ clickhouse keeper --config /etc/your_path_to_config/config.xml
ClickHouse Keeper also provides 4lw commands which are almost the same as in ZooKeeper. Each command is composed of four letters such as `mntr`, `stat` etc. There are some more interesting commands: `stat` gives some general information about the server and connected clients, while `srvr` and `cons` give extended details on server and connections respectively.
The 4lw commands has a white list configuration `four_letter_word_white_list` which has default value "conf,cons,crst,envi,ruok,srst,srvr,stat,wchc,wchs,dirs,mntr,isro".
The 4lw commands have a white list configuration `four_letter_word_white_list` which has default value `conf,cons,crst,envi,ruok,srst,srvr,stat,wchc,wchs,dirs,mntr,isro`.
You can issue the commands to ClickHouse Keeper via telnet or nc, at the client port.
@ -132,7 +132,7 @@ echo mntr | nc localhost 9181
Below are the detailed 4lw commands:
- `ruok`: Tests if server is running in a non-error state. The server will respond with imok if it is running. Otherwise it will not respond at all. A response of "imok" does not necessarily indicate that the server has joined the quorum, just that the server process is active and bound to the specified client port. Use "stat" for details on state wrt quorum and client connection information.
- `ruok`: Tests if server is running in a non-error state. The server will respond with `imok` if it is running. Otherwise it will not respond at all. A response of `imok` does not necessarily indicate that the server has joined the quorum, just that the server process is active and bound to the specified client port. Use "stat" for details on state wrt quorum and client connection information.
```
imok
@ -330,9 +330,9 @@ E.g. for a 3-node cluster, it will continue working correctly if only 1 node cra
Cluster configuration can be dynamically configured but there are some limitations. Reconfiguration relies on Raft also
so to add/remove a node from the cluster you need to have a quorum. If you lose too many nodes in your cluster at the same time without any chance
of starting them again, Raft will stop working and not allow you to reconfigure your cluster using the convenvtional way.
of starting them again, Raft will stop working and not allow you to reconfigure your cluster using the conventional way.
Nevertheless, Clickhouse Keeper has a recovery mode which allows you to forcfully reconfigure your cluster with only 1 node.
Nevertheless, ClickHouse Keeper has a recovery mode which allows you to forcefully reconfigure your cluster with only 1 node.
This should be done only as your last resort if you cannot start your nodes again, or start a new instance on the same endpoint.
Important things to note before continuing:

View File

@ -57,7 +57,7 @@ Substitutions can also be performed from ZooKeeper. To do this, specify the attr
The `config.xml` file can specify a separate config with user settings, profiles, and quotas. The relative path to this config is set in the `users_config` element. By default, it is `users.xml`. If `users_config` is omitted, the user settings, profiles, and quotas are specified directly in `config.xml`.
Users configuration can be splitted into separate files similar to `config.xml` and `config.d/`.
Users configuration can be split into separate files similar to `config.xml` and `config.d/`.
Directory name is defined as `users_config` setting without `.xml` postfix concatenated with `.d`.
Directory `users.d` is used by default, as `users_config` defaults to `users.xml`.
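For example, a single profile setting could be overridden from its own file instead of editing `users.xml` (a sketch; the setting and value are illustrative):

```bash
cat > /etc/clickhouse-server/users.d/memory_limit.xml <<'EOF'
<clickhouse>
    <profiles>
        <default>
            <max_memory_usage>10000000000</max_memory_usage>
        </default>
    </profiles>
</clickhouse>
EOF
```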

View File

@ -70,7 +70,7 @@ Regardless of RAID use, always use replication for data security.
Enable NCQ with a long queue. For HDD, choose the CFQ scheduler, and for SSD, choose noop. Don't reduce the readahead setting.
For HDD, enable the write cache.
Make sure that [fstrim](https://en.wikipedia.org/wiki/Trim_(computing)) is enabled for NVME and SSD disks in your OS (usually it's implemented using a cronjob or systemd service).
Make sure that [`fstrim`](https://en.wikipedia.org/wiki/Trim_(computing)) is enabled for NVME and SSD disks in your OS (usually it's implemented using a cronjob or systemd service).
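On systemd-based distributions this is typically a single command (a sketch):

```bash
# Enable a periodic TRIM instead of a hand-rolled cron job
sudo systemctl enable --now fstrim.timer
```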
## File System {#file-system}
@ -94,7 +94,7 @@ Use at least a 10 GB network, if possible. 1 Gb will also work, but it will be m
## Huge Pages {#huge-pages}
If you are using old Linux kernel, disable transparent huge pages. It interferes with memory allocators, which leads to significant performance degradation.
If you are using an old Linux kernel, disable transparent huge pages. It interferes with the memory allocator, which leads to significant performance degradation.
On newer Linux kernels transparent huge pages are alright.
``` bash
@ -107,7 +107,7 @@ If you are using OpenStack, set
```
cpu_mode=host-passthrough
```
in nova.conf.
in `nova.conf`.
If you are using libvirt, set
```
@ -136,7 +136,7 @@ Do not change `minSessionTimeout` setting, large values may affect ClickHouse re
With the default settings, ZooKeeper is a time bomb:
> The ZooKeeper server won't delete files from old snapshots and logs when using the default configuration (see autopurge), and this is the responsibility of the operator.
> The ZooKeeper server won't delete files from old snapshots and logs when using the default configuration (see `autopurge`), and this is the responsibility of the operator.
This bomb must be defused.
@ -241,7 +241,7 @@ JAVA_OPTS="-Xms{{ '{{' }} cluster.get('xms','128M') {{ '}}' }} \
-XX:MaxGCPauseMillis=50"
```
Salt init:
Salt initialization:
``` text
description "zookeeper-{{ '{{' }} cluster['name'] {{ '}}' }} centralized coordination service"

View File

@ -3,7 +3,7 @@ sidebar_position: 46
sidebar_label: Troubleshooting
---
# Troubleshooting
- [Installation](#troubleshooting-installation-errors)
- [Connecting to the server](#troubleshooting-accepts-no-connections)
@ -26,7 +26,7 @@ Possible issues:
### Server Is Not Running {#server-is-not-running}
**Check if server is runnnig**
**Check if server is running**
Command:

View File

@ -4,7 +4,7 @@ sidebar_label: H3 Indexes
# Functions for Working with H3 Indexes
[H3](https://eng.uber.com/h3/) is a geographical indexing system where Earth's surface divided into a grid of even hexagonal cells. This system is hierarchical, i. e. each hexagon on the top level ("parent") can be splitted into seven even but smaller ones ("children"), and so on.
[H3](https://eng.uber.com/h3/) is a geographical indexing system where Earth's surface is divided into a grid of even hexagonal cells. This system is hierarchical, i.e. each hexagon on the top level ("parent") can be split into seven even but smaller ones ("children"), and so on.
The level of the hierarchy is called `resolution` and can receive a value from `0` to `15`, where `0` is the `base` level with the largest and coarsest cells.
@ -1398,4 +1398,4 @@ Result:
│ [(37.42012867767779,-122.03773496427027),(37.33755608435299,-122.090428929044)] │
└─────────────────────────────────────────────────────────────────────────────────┘
```
[Original article](https://clickhouse.com/docs/en/sql-reference/functions/geo/h3) <!--hide-->

View File

@ -32,7 +32,7 @@ Integer value in the `Int8`, `Int16`, `Int32`, `Int64`, `Int128` or `Int256` dat
Functions use [rounding towards zero](https://en.wikipedia.org/wiki/Rounding#Rounding_towards_zero), meaning they truncate fractional digits of numbers.
The behavior of functions for the [NaN and Inf](../../sql-reference/data-types/float.md#data_type-float-nan-inf) arguments is undefined. Remember about [numeric convertions issues](#numeric-conversion-issues), when using the functions.
The behavior of functions for the [NaN and Inf](../../sql-reference/data-types/float.md#data_type-float-nan-inf) arguments is undefined. Remember about [numeric conversions issues](#numeric-conversion-issues), when using the functions.
**Example**
@ -131,7 +131,7 @@ Integer value in the `UInt8`, `UInt16`, `UInt32`, `UInt64` or `UInt256` data typ
Functions use [rounding towards zero](https://en.wikipedia.org/wiki/Rounding#Rounding_towards_zero), meaning they truncate fractional digits of numbers.
The behavior of functions for negative agruments and for the [NaN and Inf](../../sql-reference/data-types/float.md#data_type-float-nan-inf) arguments is undefined. If you pass a string with a negative number, for example `'-32'`, ClickHouse raises an exception. Remember about [numeric convertions issues](#numeric-conversion-issues), when using the functions.
The behavior of functions for negative arguments and for the [NaN and Inf](../../sql-reference/data-types/float.md#data_type-float-nan-inf) arguments is undefined. If you pass a string with a negative number, for example `'-32'`, ClickHouse raises an exception. Remember about [numeric conversions issues](#numeric-conversion-issues), when using the functions.
**Example**
@ -689,7 +689,7 @@ x::t
- Converted value.
:::note
If the input value does not fit the bounds of the target type, the result overflows. For example, `CAST(-1, 'UInt8')` returns `255`.
:::
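The wrap-around is easy to confirm with `clickhouse-local` (a sketch):

```bash
clickhouse-local --query "SELECT CAST(-1, 'UInt8')"   # prints 255
```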
@ -1433,7 +1433,7 @@ Result:
Converts a `DateTime64` to an `Int64` value with fixed sub-second precision. Input value is scaled up or down appropriately depending on its precision.
:::note
The output value is a timestamp in UTC, not in the timezone of `DateTime64`.
:::

View File

@ -0,0 +1,485 @@
personal_ws-1.1 en 484
AArch
ACLs
AMQP
ASLR
ASan
Actian
AddressSanitizer
AppleClang
ArrowStream
AvroConfluent
CCTOOLS
CLion
CMake
CMakeLists
CPUs
CSVWithNames
CSVWithNamesAndTypes
CamelCase
CapnProto
CentOS
ClickHouse
Config
Contrib
Ctrl
CustomSeparated
CustomSeparatedWithNames
CustomSeparatedWithNamesAndTypes
DBMSs
DateTime
DockerHub
Doxygen
Encodings
Enum
Eoan
FixedString
FreeBSD
Fuzzer
Fuzzers
GTest
Gb
Gcc
GoogleTest
HDDs
Heredoc
Homebrew
Homebrew's
Hostname
IPv
IntN
Integrations
JSONAsString
JSONColumns
JSONColumnsWithMetadata
JSONCompact
JSONCompactColumns
JSONCompactEachRow
JSONCompactEachRowWithNames
JSONCompactEachRowWithNamesAndTypes
JSONCompactStrings
JSONCompactStringsEachRow
JSONCompactStringsEachRowWithNames
JSONCompactStringsEachRowWithNamesAndTypes
JSONEachRow
JSONEachRowWithProgress
JSONStrings
JSONStringsEachRow
JSONStringsEachRowWithProgress
JSONs
Jaeger
Jemalloc
Jepsen
KDevelop
LGPL
LOCALTIME
LOCALTIMESTAMP
LibFuzzer
LineAsString
LowCardinality
MEMTABLE
MSan
MacOS
Memcheck
MemorySanitizer
MergeTree
MessagePack
MiB
MsgPack
Multiline
Multithreading
MySQLDump
NEKUDOTAYIM
NULLIF
NVME
NuRaft
Ok
OpenSUSE
OpenStack
OpenTelemetry
PAAMAYIM
Parsers
Postgres
Precompiled
PrettyCompact
PrettyCompactMonoBlock
PrettyCompactNoEscapes
PrettyNoEscapes
PrettySpace
PrettySpaceNoEscapes
Protobuf
ProtobufSingle
QTCreator
RBAC
RawBLOB
RedHat
RowBinary
RowBinaryWithNames
RowBinaryWithNamesAndTypes
Runtime
SATA
SERIALIZABLE
SIMD
SMALLINT
SQLSTATE
SSSE
Schemas
Stateful
Submodules
Subqueries
TSVRaw
TSan
TabSeparated
TabSeparatedRaw
TabSeparatedRawWithNames
TabSeparatedRawWithNamesAndTypes
TabSeparatedWithNames
TabSeparatedWithNamesAndTypes
TargetSpecific
TemplateIgnoreSpaces
Testflows
Tgz
Toolset
Tradeoff
UBSan
UInt
UIntN
UPDATEs
Uint
Updatable
Util
Valgrind
Vectorized
VirtualBox
Werror
Woboq
WriteBuffer
WriteBuffers
XCode
YAML
YYYY
Zipkin
ZooKeeper
ZooKeeper's
aarch
allocator
analytics
anonymized
ansi
async
autogeneration
autostart
avro
avx
aws
backoff
backticks
benchmarking
blake
blockSize
boolean
bools
boringssl
brotli
buildable
camelCase
capn
capnproto
cardinality
cassandra
cbindgen
ccache
cctz
cfg
changelog
charset
charsets
checkouting
checksummed
checksumming
checksums
cityhash
cli
clickhouse
clickstream
cmake
codebase
codec
comparising
config
configs
contrib
coroutines
cpp
cppkafka
cpu
crlf
croaring
cronjob
csv
csvwithnames
csvwithnamesandtypes
customseparated
customseparatedwithnames
customseparatedwithnamesandtypes
cyrus
datacenter
datafiles
dataset
datasets
datetime
dbms
ddl
deallocation
debian
decompressor
denormals
deserialization
deserialized
destructor
destructors
dmesg
dont
dragonbox
durations
endian
enum
fastops
fcoverage
filesystem
filesystems
flatbuffers
fmtlib
formatschema
formatter
fuzzer
fuzzers
gRPC
gcem
github
glibc
googletest
grpc
grpcio
gtest
hardlinks
hdfs
heredoc
heredocs
homebrew
http
https
hyperscan
icudata
instantiation
integrational
integrations
interserver
jdbc
jemalloc
json
jsonasstring
jsoncolumns
jsoncolumnsmonoblock
jsoncompact
jsoncompactcolumns
jsoncompacteachrow
jsoncompacteachrowwithnames
jsoncompacteachrowwithnamesandtypes
jsoncompactstrings
jsoncompactstringseachrow
jsoncompactstringseachrowwithnames
jsoncompactstringseachrowwithnamesandtypes
jsoneachrow
jsoneachrowwithprogress
jsonstrings
jsonstringseachrow
jsonstringseachrowwithprogress
kafka
kafkacat
konsole
latencies
lexicographically
libFuzzer
libc
libcpuid
libcxx
libcxxabi
libdivide
libfarmhash
libfuzzer
libgsasl
libhdfs
libmetrohash
libpq
libpqxx
librdkafka
libs
libunwind
libuv
libvirt
linearizability
linearizable
lineasstring
linefeeds
linux
llvm
localhost
macOS
mariadb
miniselect
msgpack
msgpk
multiline
multithread
murmurhash
mutex
mysql
mysqldump
mysqljs
noop
nullable
num
obfuscator
odbc
ok
openldap
opentelemetry
overcommit
parallelization
parallelize
parallelized
parsers
pclmulqdq
performant
poco
popcnt
postfix
postfixes
postgresql
pre
prebuild
prebuilt
preemptable
preloaded
preprocessed
preprocessor
presentational
prestable
prettycompact
prettycompactmonoblock
prettycompactnoescapes
prettynoescapes
prettyspace
prettyspacenoescapes
prlimit
prometheus
proto
protobuf
protobufsingle
psql
ptrs
py
rapidjson
rawblob
readahead
readline
readme
readonly
rebalanced
replxx
repo
representable
requestor
resultset
rethrow
risc
ro
rocksdb
rowNumberInBlock
rowbinary
rowbinarywithnames
rowbinarywithnamesandtypes
rsync
runningAccumulate
runtime
russian
rw
sasl
schemas
simdjson
skippingerrors
sparsehash
sql
src
stacktraces
statbox
stateful
stderr
stdin
stdout
strtod
strtoll
strtoull
structs
subdirectories
subexpressions
submodule
submodules
subpattern
subpatterns
subqueries
subquery
subseconds
substring
subtree
sudo
symlink
symlinks
syntaxes
systemd
tabseparated
tabseparatedraw
tabseparatedrawwithnames
tabseparatedrawwithnamesandtypes
tabseparatedwithnames
tabseparatedwithnamesandtypes
tcp
templateignorespaces
tgz
th
tmp
tokenization
toml
toolset
tskv
tsv
tui
turbostat
txt
unary
unencrypted
unixodbc
url
userspace
utils
uuid
variadic
varint
vectorized
wchc
wchs
webpage
webserver
wget
whitespace
whitespaces
wrt
xcode
xml
xz
zLib
zkcopy
zlib
znodes
zstd

View File

@ -0,0 +1,49 @@
#!/usr/bin/env bash
# Perform spell checking on the docs
if [[ ${1:-} == "--help" ]] || [[ ${1:-} == "-h" ]]; then
    echo "Usage $0 [--help|-h] [-i]"
    echo " --help|-h: print this help"
    echo " -i: interactive mode"
    exit 0
fi

ROOT_PATH=$(git rev-parse --show-toplevel)
CHECK_LANG=en
ASPELL_IGNORE_PATH="${ROOT_PATH}/utils/check-style/aspell-ignore/${CHECK_LANG}"
STATUS=0

for fname in ${ROOT_PATH}/docs/${CHECK_LANG}/**/*.md; do
    if [[ ${1:-} == "-i" ]]; then
        echo "Checking $fname"
        aspell --personal=aspell-dict.txt --add-sgml-skip=code --encoding=utf-8 --mode=markdown -W 3 --lang=${CHECK_LANG} --home-dir=${ASPELL_IGNORE_PATH} -c "$fname"
        continue
    fi
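    # Non-interactive mode: list misspellings per file. -W 3 tells aspell to
    # ignore words of three characters or fewer, and --home-dir points it at
    # the repo-local personal dictionary (aspell-dict.txt).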
    errors=$(cat "$fname" \
        | aspell list \
            -W 3 \
            --personal=aspell-dict.txt \
            --add-sgml-skip=code \
            --encoding=utf-8 \
            --mode=markdown \
            --lang=${CHECK_LANG} \
            --home-dir=${ASPELL_IGNORE_PATH} \
        | sort | uniq)
    if [ ! -z "$errors" ]; then
        STATUS=1
        echo "====== $fname ======"
        echo "$errors"
    fi
done

if (( STATUS != 0 )); then
    echo "====== Errors found ======"
    echo "To exclude some words add them to the dictionary file \"${ASPELL_IGNORE_PATH}/aspell-dict.txt\""
    echo "You can also run ${0} -i to see the errors interactively and fix them or add to the dictionary file"
fi

exit ${STATUS}

View File

@ -6,3 +6,4 @@ $dir/check-typos
$dir/check-whitespaces -n
$dir/check-duplicate-includes.sh
$dir/shellcheck-run.sh
$dir/check-doc-aspell
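A sketch of running the new check locally (paths assumed relative to the repository root):

```bash
utils/check-style/check-doc-aspell       # list misspellings and exit non-zero on any
utils/check-style/check-doc-aspell -i    # step through them interactively in aspell
```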

View File

@ -5,7 +5,7 @@
ROOT_PATH=$(git rev-parse --show-toplevel)
codespell \
--skip '*generated*,*gperf*,*.bin,*.mrk*,*.idx,checksums.txt,*.dat,*.pyc,*.kate-swp,*obfuscateQueries.cpp' \
--skip "*generated*,*gperf*,*.bin,*.mrk*,*.idx,checksums.txt,*.dat,*.pyc,*.kate-swp,*obfuscateQueries.cpp,${ROOT_PATH}/utils/check-style/aspell-ignore" \
--ignore-words "${ROOT_PATH}/utils/check-style/codespell-ignore-words.list" \
--exclude-file "${ROOT_PATH}/utils/check-style/codespell-ignore-lines.list" \
--quiet-level 2 \