JackyWoo 2024-08-06 09:56:38 +08:00
parent a36424fc8c
commit 4fa30da118

@@ -994,30 +994,36 @@ They can be used for prewhere optimization only if we enable `set allow_statisti
- `MinMax`
The minimum and maximum column values, which allow estimating the selectivity of range filters on numeric columns.
Supported data types: (U)Int*, Float*, Decimal(*), Boolean and Date*.
- `TDigest`
[TDigest](https://github.com/tdunning/t-digest) sketches, which allow computing approximate percentiles (e.g. the 90th percentile) for numeric columns.
Supported data types: (U)Int*, Float*, Decimal(*), Boolean and Date*.
- `Uniq`
[HyperLogLog](https://en.wikipedia.org/wiki/HyperLogLog) sketches, which provide an estimate of how many distinct values a column contains.
Supported data types: (U)Int*, Float*, Decimal(*), Boolean, Date* and (Fixed)String.
- `count_min`
[Count-min](https://en.wikipedia.org/wiki/Count%E2%80%93min_sketch) sketches, which provide an approximate frequency count for each value in a column.
Supported data types: (U)Int*, Float*, Decimal(*), Boolean, Date* and (Fixed)String.
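To illustrate how a `MinMax` statistic can translate into a selectivity estimate for a range filter, here is a minimal Python sketch. It assumes values are spread uniformly between the stored minimum and maximum. The function name `estimate_range_selectivity` and its parameters are hypothetical and exist only for this illustration; this is not ClickHouse's implementation.

```python
def estimate_range_selectivity(col_min, col_max, lower=None, upper=None):
    """Estimate the fraction of rows with lower <= value <= upper.

    Assumption (not taken from the source): values are uniformly
    distributed between col_min and col_max.
    """
    if col_max < col_min:
        raise ValueError("col_max must not be smaller than col_min")

    # Clamp the filter bounds to the range the column actually covers.
    lo = col_min if lower is None else max(lower, col_min)
    hi = col_max if upper is None else min(upper, col_max)

    if lo > hi:
        # The filter range lies entirely outside [col_min, col_max].
        return 0.0
    if col_max == col_min:
        # Single-valued column: the filter either matches everything or nothing.
        return 1.0
    return (hi - lo) / (col_max - col_min)


# A column with values in [0, 100] and the filter `value >= 75`.
print(estimate_range_selectivity(0, 100, lower=75))  # 0.25
```

Under this assumption, a filter that lies outside the stored range is estimated to match no rows, which is the case where a min/max summary is most useful.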
#### Supported Data Types {#supported-data-types}
All statistics types also support data types wrapped in the `LowCardinality` and `Nullable` modifiers.
| Data Type / Statistics | count_min | minmax | tdigest | uniq |
|-------------------------|-----------|--------|---------|------|
| Integer types | ✔ | ✔ | ✔ | ✔ |
| Floating-point numbers | ✔ | ✔ | ✔ | ✔ |
| Boolean | ✔ | ✔ | ✔ | ✔ |
| Strings | ✔ | ✗ | ✗ | ✔ |
| Dates | ✔ | ✔ | ✔ | ✔ |
#### Supported operations of Column Statistics {#supported-operations-of-column-statistics}
| Statistics | Equals | Range |
|-----------|---------|-------|
| count_min | ✔ | ✔ |
| MinMax | ✔ | ✔ |
| TDigest | ✔ | ✔ |
| Uniq | ✔ | ✗ |
Note that the Range operation covers the comparison operators `>`, `>=`, `<`, and `<=`.
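As an illustration of the Equals column, the following Python sketch shows how a count-min sketch can answer "roughly how many rows have this value". The class `CountMinSketch`, its `width` and `depth` parameters, and the hashing scheme are assumptions made for this example and do not reflect ClickHouse internals.

```python
import hashlib


class CountMinSketch:
    """Toy count-min sketch: a depth x width grid of counters."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, value):
        # One independent hash per row, derived by salting with the row number.
        digest = hashlib.blake2b(
            repr(value).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, value, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.rows[row][self._index(row, value)] += count

    def estimate(self, value):
        # Collisions can only inflate counters, so the minimum across rows
        # is the tightest available upper bound on the true frequency.
        return min(
            self.rows[row][self._index(row, value)]
            for row in range(self.depth)
        )


sketch = CountMinSketch()
for v in ["a", "a", "a", "b", "c", "a", "b"]:
    sketch.add(v)

# Estimated number of rows matching `col = 'a'`; never below the true count of 4.
print(sketch.estimate("a"))
```

Because the estimate can overshoot but never undershoot the true frequency, an optimizer using such a sketch would tend to overestimate rather than underestimate how many rows an equality filter matches.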
You can also add `LowCardinality` and `Nullable` modifiers to the data types.
## Column-level Settings {#column-level-settings}