Merge pull request #63414 from rschu1ze/docs-update

Docs: Various minor docs updates
Robert Schulze 2024-05-06 14:40:14 +00:00 committed by GitHub
commit a65e208892
GPG Key ID: B5690EEEBB952194
4 changed files with 37 additions and 78 deletions


@@ -5,22 +5,13 @@ title: How to Build, Run and Debug ClickHouse on Linux for s390x (zLinux)
 sidebar_label: Build on Linux for s390x (zLinux)
 ---
-As of writing (2023/3/10) building for s390x considered to be experimental. Not all features can be enabled, has broken features and is currently under active development.
+At the time of writing (2024 May), support for the s390x platform is considered experimental, i.e. some features are disabled or broken on s390x.
-## Building
+## Building ClickHouse for s390x
-s390x has two OpenSSL-related build options.
-- By default, the s390x build will dynamically link to OpenSSL libraries. It will build OpenSSL shared objects, so it's not necessary to install OpenSSL beforehand. (This option is recommended in all cases.)
-- Another option is to build OpenSSL in-tree. In this case two build flags need to be supplied to cmake
-```bash
--DENABLE_OPENSSL_DYNAMIC=0
-```
-:::note
-s390x builds are temporarily disabled in CI.
-:::
+s390x has two OpenSSL-related build options:
+- By default, OpenSSL is build on s390x as a shared library. This is different from all other platforms, where OpenSSL is build as static library.
+- To build OpenSSL as a static library regardless, pass `-DENABLE_OPENSSL_DYNAMIC=0` to CMake.
 These instructions assume that the host machine is x86_64 and has all the tooling required to build natively based on the [build instructions](../development/build.md). It also assumes that the host is Ubuntu 22.04 but the following instructions should also work on Ubuntu 20.04.
@@ -31,11 +22,16 @@ apt-get install binutils-s390x-linux-gnu libc6-dev-s390x-cross gcc-s390x-linux-g
 ```
 If you wish to cross compile rust code install the rust cross compile target for s390x:
 ```bash
 rustup target add s390x-unknown-linux-gnu
 ```
+The s390x build uses the mold linker, download it from https://github.com/rui314/mold/releases/download/v2.0.0/mold-2.0.0-x86_64-linux.tar.gz
+and place it into your `$PATH`.
 To build for s390x:
 ```bash
 cmake -DCMAKE_TOOLCHAIN_FILE=cmake/linux/toolchain-s390x.cmake ..
 ninja


@@ -22,9 +22,8 @@ ORDER BY Distance(vectors, Point)
 LIMIT N
 ```
-`vectors` contains N-dimensional values of type [Array](../../../sql-reference/data-types/array.md) or
-[Tuple](../../../sql-reference/data-types/tuple.md), for example embeddings. Function `Distance` computes the distance between two vectors.
-Often, the Euclidean (L2) distance is chosen as distance function but [other
+`vectors` contains N-dimensional values of type [Array(Float32)](../../../sql-reference/data-types/array.md), for example embeddings.
+Function `Distance` computes the distance between two vectors. Often, the Euclidean (L2) distance is chosen as distance function but [other
 distance functions](/docs/en/sql-reference/functions/distance-functions.md) are also possible. `Point` is the reference point, e.g. `(0.17,
 0.33, ...)`, and `N` limits the number of search results.
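Without an index, the `ORDER BY Distance(vectors, Point) LIMIT N` pattern in this hunk is an exact, brute-force nearest-neighbour search. Every row is scored, and the N smallest distances are kept. ANN indexes exist to avoid that full scan. The Python sketch below shows only the exact baseline, using Euclidean (L2) distance:

- `l2_distance` and `nearest_neighbors` are hypothetical helper names, not ClickHouse functions.
- The sample vectors are made up for illustration.

```python
import math
from typing import List, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(vectors: Sequence[Sequence[float]],
                      point: Sequence[float], n: int) -> List[int]:
    """Exact top-n search: score every row, sort by distance, keep the first n.

    Conceptually equivalent to `ORDER BY L2Distance(vectors, Point) LIMIT N`
    evaluated without an ANN index; returns row positions, nearest first.
    """
    order = sorted(range(len(vectors)), key=lambda i: l2_distance(vectors[i], point))
    return order[:n]


# Made-up 2-dimensional "embeddings"; real embeddings have many more dimensions.
rows = [(0.0, 0.0), (1.0, 1.0), (0.2, 0.3)]
print(nearest_neighbors(rows, (0.17, 0.33), 2))  # [2, 0]
```

The cost grows linearly with the number of rows. An ANN index trades some accuracy for skipping most of these distance computations.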
@@ -47,7 +46,7 @@ of the search space (using clustering, search trees, etc.) which allows to compu
 # Creating and Using ANN Indexes {#creating_using_ann_indexes}
-Syntax to create an ANN index over an [Array](../../../sql-reference/data-types/array.md) column:
+Syntax to create an ANN index over an [Array(Float32)](../../../sql-reference/data-types/array.md) column:
 ```sql
 CREATE TABLE table_with_ann_index
@@ -60,19 +59,6 @@ ENGINE = MergeTree
 ORDER BY id;
 ```
-Syntax to create an ANN index over a [Tuple](../../../sql-reference/data-types/tuple.md) column:
-```sql
-CREATE TABLE table_with_ann_index
-(
-`id` Int64,
-`vectors` Tuple(Float32[, Float32[, ...]]),
-INDEX [ann_index_name] vectors TYPE [ann_index_type]([ann_index_parameters]) [GRANULARITY [N]]
-)
-ENGINE = MergeTree
-ORDER BY id;
-```
 ANN indexes are built during column insertion and merge. As a result, `INSERT` and `OPTIMIZE` statements will be slower than for ordinary
 tables. ANNIndexes are ideally used only with immutable or rarely changed data, respectively when are far more read requests than write
 requests.
@@ -164,7 +150,7 @@ linear surfaces (lines in 2D, planes in 3D etc.).
 </iframe>
 </div>
-Syntax to create an Annoy index over an [Array](../../../sql-reference/data-types/array.md) column:
+Syntax to create an Annoy index over an [Array(Float32)](../../../sql-reference/data-types/array.md) column:
 ```sql
 CREATE TABLE table_with_annoy_index
@@ -177,19 +163,6 @@ ENGINE = MergeTree
 ORDER BY id;
 ```
-Syntax to create an ANN index over a [Tuple](../../../sql-reference/data-types/tuple.md) column:
-```sql
-CREATE TABLE table_with_annoy_index
-(
-id Int64,
-vectors Tuple(Float32[, Float32[, ...]]),
-INDEX [ann_index_name] vectors TYPE annoy([Distance[, NumTrees]]) [GRANULARITY N]
-)
-ENGINE = MergeTree
-ORDER BY id;
-```
 Annoy currently supports two distance functions:
 - `L2Distance`, also called Euclidean distance, is the length of a line segment between two points in Euclidean space
 ([Wikipedia](https://en.wikipedia.org/wiki/Euclidean_distance)).
@@ -203,10 +176,9 @@ Parameter `NumTrees` is the number of trees which the algorithm creates (default
 more accurate search results but slower index creation / query times (approximately linearly) as well as larger index sizes.
 :::note
-Indexes over columns of type `Array` will generally work faster than indexes on `Tuple` columns. All arrays must have same length. To avoid
-errors, you can use a [CONSTRAINT](/docs/en/sql-reference/statements/create/table.md#constraints), for example, `CONSTRAINT
-constraint_name_1 CHECK length(vectors) = 256`. Also, empty `Arrays` and unspecified `Array` values in INSERT statements (i.e. default
-values) are not supported.
+All arrays must have same length. To avoid errors, you can use a
+[CONSTRAINT](/docs/en/sql-reference/statements/create/table.md#constraints), for example, `CONSTRAINT constraint_name_1 CHECK
+length(vectors) = 256`. Also, empty `Arrays` and unspecified `Array` values in INSERT statements (i.e. default values) are not supported.
 :::
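The note in this hunk sets three requirements for `vectors`:

- All arrays must have the same length.
- Empty arrays are rejected.
- Default (unspecified) `Array` values are rejected.

A minimal client-side pre-check could mirror the example constraint `CHECK length(vectors) = 256`, as in the Python sketch below. It makes the following assumptions:

- `check_vectors` is a hypothetical helper.
- An unspecified value is represented as `None`.

It is meant to catch bad rows before an INSERT and does not replace a `CONSTRAINT` on the table.

```python
from typing import List, Optional, Sequence

EXPECTED_DIM = 256  # same length as in the example `CHECK length(vectors) = 256`


def check_vectors(rows: Sequence[Optional[Sequence[float]]],
                  dim: int = EXPECTED_DIM) -> List[Sequence[float]]:
    """Validate rows before an INSERT; raise ValueError on the first bad row."""
    checked = []
    for i, vec in enumerate(rows):
        if vec is None:  # stand-in for an unspecified (default) Array value
            raise ValueError(f"row {i}: unspecified array value")
        if len(vec) == 0:
            raise ValueError(f"row {i}: empty array")
        if len(vec) != dim:
            raise ValueError(f"row {i}: expected length {dim}, got {len(vec)}")
        checked.append(vec)
    return checked


good = [[0.1] * 256, [0.2] * 256]
print(len(check_vectors(good)))  # 2
```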
 The creation of Annoy indexes (whenever a new part is build, e.g. at the end of a merge) is a relatively slow process. You can increase
@@ -264,19 +236,6 @@ ENGINE = MergeTree
 ORDER BY id;
 ```
-Syntax to create an ANN index over a [Tuple](../../../sql-reference/data-types/tuple.md) column:
-```sql
-CREATE TABLE table_with_usearch_index
-(
-id Int64,
-vectors Tuple(Float32[, Float32[, ...]]),
-INDEX [ann_index_name] vectors TYPE usearch([Distance[, ScalarKind]]) [GRANULARITY N]
-)
-ENGINE = MergeTree
-ORDER BY id;
-```
 USearch currently supports two distance functions:
 - `L2Distance`, also called Euclidean distance, is the length of a line segment between two points in Euclidean space
 ([Wikipedia](https://en.wikipedia.org/wiki/Euclidean_distance)).


@@ -53,6 +53,10 @@ ENGINE = MergeTree
 ORDER BY key
 ```
+:::note
+In earlier versions of ClickHouse, the corresponding index type name was `inverted`.
+:::
 where `N` specifies the tokenizer:
 - `full_text(0)` (or shorter: `full_text()`) set the tokenizer to "tokens", i.e. split strings along spaces,


@@ -1417,31 +1417,31 @@ toStartOfFifteenMinutes(toDateTime('2023-04-21 10:23:00')): 2023-04-21 10:15:00
 This function generalizes other `toStartOf*()` functions with `toStartOfInterval(date_or_date_with_time, INTERVAL x unit [, time_zone])` syntax.
 For example,
-- `toStartOfInterval(t, INTERVAL 1 year)` returns the same as `toStartOfYear(t)`,
-- `toStartOfInterval(t, INTERVAL 1 month)` returns the same as `toStartOfMonth(t)`,
-- `toStartOfInterval(t, INTERVAL 1 day)` returns the same as `toStartOfDay(t)`,
-- `toStartOfInterval(t, INTERVAL 15 minute)` returns the same as `toStartOfFifteenMinutes(t)`.
+- `toStartOfInterval(t, INTERVAL 1 YEAR)` returns the same as `toStartOfYear(t)`,
+- `toStartOfInterval(t, INTERVAL 1 MONTH)` returns the same as `toStartOfMonth(t)`,
+- `toStartOfInterval(t, INTERVAL 1 DAY)` returns the same as `toStartOfDay(t)`,
+- `toStartOfInterval(t, INTERVAL 15 MINUTE)` returns the same as `toStartOfFifteenMinutes(t)`.
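The `INTERVAL 15 MINUTE` equivalence can be made concrete with a short Python sketch. It assumes, as the table of reference points for this function states, that minute intervals are counted from 1970-01-01 00:00:00. Under that assumption, rounding down reduces to floor division on the elapsed time.

- Time zones are ignored: naive datetimes are treated as UTC.
- `start_of_minute_interval` is a hypothetical helper name.

The sketch reproduces the `toStartOfFifteenMinutes(toDateTime('2023-04-21 10:23:00'))` result of `2023-04-21 10:15:00` quoted in the hunk header.

```python
from datetime import datetime, timedelta

# Reference point for MINUTE intervals; naive datetime, treated as UTC here.
EPOCH = datetime(1970, 1, 1)


def start_of_minute_interval(t: datetime, minutes: int) -> datetime:
    """Round t down to a multiple of `minutes` counted from EPOCH."""
    step = timedelta(minutes=minutes)
    # timedelta // timedelta yields an int number of whole steps.
    return EPOCH + ((t - EPOCH) // step) * step


print(start_of_minute_interval(datetime(2023, 4, 21, 10, 23), 15))  # 2023-04-21 10:15:00
```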
 The calculation is performed relative to specific points in time:
 | Interval    | Start                  |
 |-------------|------------------------|
-| year        | year 0                 |
-| quarter     | 1900 Q1                |
-| month       | 1900 January           |
-| week        | 1970, 1st week (01-05) |
-| day         | 1970-01-01             |
-| hour        | (*)                    |
-| minute      | 1970-01-01 00:00:00    |
-| second      | 1970-01-01 00:00:00    |
-| millisecond | 1970-01-01 00:00:00    |
-| microsecond | 1970-01-01 00:00:00    |
-| nanosecond  | 1970-01-01 00:00:00    |
+| YEAR        | year 0                 |
+| QUARTER     | 1900 Q1                |
+| MONTH       | 1900 January           |
+| WEEK        | 1970, 1st week (01-05) |
+| DAY         | 1970-01-01             |
+| HOUR        | (*)                    |
+| MINUTE      | 1970-01-01 00:00:00    |
+| SECOND      | 1970-01-01 00:00:00    |
+| MILLISECOND | 1970-01-01 00:00:00    |
+| MICROSECOND | 1970-01-01 00:00:00    |
+| NANOSECOND  | 1970-01-01 00:00:00    |
 (*) hour intervals are special: the calculation is always performed relative to 00:00:00 (midnight) of the current day. As a result, only
 hour values between 1 and 23 are useful.
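The footnote for `HOUR` means the rounding origin is midnight of the day being rounded, not a fixed epoch. The Python sketch below illustrates that rule. `start_of_hour_interval` is a hypothetical helper, and time zones are ignored.

It also shows why only values between 1 and 23 are useful. The hour of day is at most 23, so any step of 24 hours or more always rounds down to midnight.

```python
from datetime import datetime, timedelta


def start_of_hour_interval(t: datetime, hours: int) -> datetime:
    """Round t down to a multiple of `hours` counted from midnight of t's day."""
    midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(hours=(t.hour // hours) * hours)


t = datetime(2023, 4, 21, 10, 23)
print(start_of_hour_interval(t, 4))   # 2023-04-21 08:00:00
print(start_of_hour_interval(t, 24))  # 2023-04-21 00:00:00
print(start_of_hour_interval(t, 48))  # 2023-04-21 00:00:00, same as 24
```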
-If unit `week` was specified, `toStartOfInterval` assumes that weeks start on Monday. Note that this behavior is different from that of function `toStartOfWeek` in which weeks start by default on Sunday.
+If unit `WEEK` was specified, `toStartOfInterval` assumes that weeks start on Monday. Note that this behavior is different from that of function `toStartOfWeek` in which weeks start by default on Sunday.
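The Monday rule follows from the table's reference point for `WEEK`, assuming the `1970, 1st week (01-05)` entry means the week starting Monday 1970-01-05. Counting whole weeks from a Monday always lands on a Monday.

The Python sketch below rounds Friday 2023-04-21 down to Monday 2023-04-17. It works on dates only, and `start_of_week_interval` is a hypothetical helper. With Sunday-based weeks, which `toStartOfWeek` uses by default, the same date would instead round to Sunday 2023-04-16.

```python
from datetime import date, timedelta

# Assumed reference point for WEEK intervals: Monday, 1970-01-05.
WEEK_ORIGIN = date(1970, 1, 5)


def start_of_week_interval(d: date, weeks: int) -> date:
    """Round d down to a Monday, in steps of `weeks` counted from WEEK_ORIGIN."""
    step_days = 7 * weeks
    elapsed = (d - WEEK_ORIGIN).days
    # Python's // floors, so dates before the origin also round down correctly.
    return WEEK_ORIGIN + timedelta(days=(elapsed // step_days) * step_days)


print(start_of_week_interval(date(2023, 4, 21), 1))  # 2023-04-17
```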
 **See Also**