Fix docs and some fixups

JackyWoo 2024-09-06 18:50:02 +08:00
parent 431a917cda
commit 3974e9060a
3 changed files with 20 additions and 18 deletions

@@ -989,44 +989,43 @@ ALTER TABLE tab DROP STATISTICS a;
These lightweight statistics aggregate information about distribution of values in columns. Statistics are stored in every part and updated when every insert comes.
They can be used for prewhere optimization only if we enable `set allow_statistics_optimize = 1`.
#### Available Types of Column Statistics {#available-types-of-column-statistics}
### Available Types of Column Statistics {#available-types-of-column-statistics}
- `MinMax`
The minimum and maximum column value which allows to estimate the selectivity of range filters on numeric columns.
Supported data types: (U)Int*, Float*, Decimal(*), Boolean and Date*.
- `TDigest`
[TDigest](https://github.com/tdunning/t-digest) sketches which allow to compute approximate percentiles (e.g. the 90th percentile) for numeric columns.
Supported data types: (U)Int*, Float*, Decimal(*), Boolean and Date*.
- `Uniq`
[HyperLogLog](https://en.wikipedia.org/wiki/HyperLogLog) sketches which provide an estimation how many distinct values a column contains.
Supported data types: (U)Int*, Float*, Decimal*, Boolean, Date* and (Fixed)String.
- `count_min`
[Count-min](https://en.wikipedia.org/wiki/Count%E2%80%93min_sketch) sketches which provide an approximate count of the frequency of each value in a column.
Supported data types: (U)Int*, Float*, Decimal*, Boolean, Date* and (Fixed)String.
Note that all statistics types support `LowCardinality` and `Nullable` modifiers to data types.
### Supported Data Types {#supported-data-types}
#### Supported operations of Column Statistics {#supported-operations-of-column-statistics}
| | (U)Int* | Float* | Decimal(*) | Date* | Boolean | Enum* | (Fixed)String |
|-----------|---------|--------|------------|-------|---------|-------|------------------|
| count_min | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| MinMax | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✗ |
| TDigest | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✗ |
| Uniq | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| | Equals | Range |
|-----------|---------|-------|
| count_min | ✔ | ✗ |
| MinMax | ✗ | ✔ |
| TDigest | ✗ | ✔ |
| Uniq | ✔ | ✗ |
Please note that operation `Range` represents >, >=, < or <=.
### Supported Operations {#supported-operations}
| | Equality filters (==) | Range filters (>, >=, <, <=) |
|-----------|-----------------------|------------------------------|
| count_min | ✔ | ✗ |
| MinMax | ✗ | ✔ |
| TDigest | ✗ | ✔ |
| Uniq | ✔ | ✗ |
## Column-level Settings {#column-level-settings}

@@ -127,7 +127,7 @@ Float64 ColumnStatistics::estimateEqual(const Field & val) const
if (stats.contains(StatisticsType::Uniq))
{
UInt64 cardinality = stats.at(StatisticsType::Uniq)->estimateCardinality();
if (cardinality == 0)
if (cardinality == 0 || rows == 0)
return 0;
return 1.0 / cardinality * rows; /// assume uniform distribution
}
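
For readers following the `rows == 0` fixup, the uniform-distribution estimate in this hunk can be sketched as a free function. This is a minimal illustration, not the ClickHouse implementation: the function name `estimateEqualUniform` and the parameter types are assumptions made for the example, and only the guard and the formula are taken from the hunk.

```cpp
#include <cstdint>

// Hypothetical stand-alone version of the Uniq-based equality estimate.
// `cardinality` is the estimated number of distinct values in the column,
// `rows` is the number of rows; both names and types are assumptions.
double estimateEqualUniform(std::uint64_t cardinality, std::uint64_t rows)
{
    // Guard both degenerate cases: no distinct values, or no rows at all.
    if (cardinality == 0 || rows == 0)
        return 0.0;

    // Under a uniform distribution, each distinct value is expected to
    // match rows / cardinality rows.
    return 1.0 / static_cast<double>(cardinality) * static_cast<double>(rows);
}
```

With 4 distinct values over 100 rows, the sketch estimates 25 matching rows for an equality filter. With zero rows it returns 0 immediately, whatever cardinality the sketch reports.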

@@ -51,6 +51,9 @@ void StatisticsMinMax::deserialize(ReadBuffer & buf)
Float64 StatisticsMinMax::estimateLess(const Field & val) const
{
if (row_count == 0)
return 0;
auto val_as_float = StatisticsUtils::tryConvertToFloat64(val, data_type);
if (!val_as_float.has_value())
return 0;
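
The hunk shows only the early-return guards of `estimateLess`; the rest of the function is outside the diff context. As a rough sketch of how a min/max statistic could turn those guards into a range estimate, the example below assumes linear interpolation between the stored minimum and maximum. The struct `MinMaxSketch`, its fields, and the interpolation step are all assumptions made for illustration and are not confirmed by the lines shown.

```cpp
#include <optional>

// Hypothetical simplified MinMax statistic; all names are illustrative.
struct MinMaxSketch
{
    double min = 0.0;
    double max = 0.0;
    double row_count = 0.0;

    // Estimates how many rows satisfy `column < val`.
    // `val` is empty when the literal could not be converted to a float.
    double estimateLess(std::optional<double> val) const
    {
        // No rows: nothing can match (the guard added in the hunk above).
        if (row_count == 0)
            return 0.0;

        // Unconvertible value: fall back to 0, mirroring the hunk.
        if (!val.has_value())
            return 0.0;

        // Values at or below the minimum match no rows; values above the
        // maximum match every row.
        if (*val <= min)
            return 0.0;
        if (*val > max)
            return row_count;

        // Here min < *val <= max, so max - min is strictly positive.
        // Assumed step: interpolate linearly across [min, max].
        return (*val - min) / (max - min) * row_count;
    }
};
```

Checking `row_count == 0` first keeps an empty statistic from reaching the interpolation step, where uninitialized or equal bounds would make the result meaningless.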