mirror of https://github.com/ClickHouse/ClickHouse.git (synced 2024-11-21 23:21:59 +00:00)
Fix docs and some fixups
parent 431a917cda
commit 3974e9060a
@@ -989,44 +989,43 @@ ALTER TABLE tab DROP STATISTICS a;
These lightweight statistics aggregate information about distribution of values in columns. Statistics are stored in every part and updated when every insert comes.
They can be used for prewhere optimization only if we enable `set allow_statistics_optimize = 1`.

#### Available Types of Column Statistics {#available-types-of-column-statistics}
### Available Types of Column Statistics {#available-types-of-column-statistics}

- `MinMax`

The minimum and maximum column value which allows to estimate the selectivity of range filters on numeric columns.

Supported data types: (U)Int*, Float*, Decimal(*), Boolean and Date*.

- `TDigest`

[TDigest](https://github.com/tdunning/t-digest) sketches which allow to compute approximate percentiles (e.g. the 90th percentile) for numeric columns.

Supported data types: (U)Int*, Float*, Decimal(*), Boolean and Date*.

- `Uniq`

[HyperLogLog](https://en.wikipedia.org/wiki/HyperLogLog) sketches which provide an estimation how many distinct values a column contains.

Supported data types: (U)Int*, Float*, Decimal*, Boolean, Date* and (Fixed)String.

- `count_min`

[Count-min](https://en.wikipedia.org/wiki/Count%E2%80%93min_sketch) sketches which provide an approximate count of the frequency of each value in a column.

Supported data types: (U)Int*, Float*, Decimal*, Boolean, Date* and (Fixed)String.

Note that all statistics types support `LowCardinality` and `Nullable` modifiers to data types.
### Supported Data Types {#supported-data-types}

#### Supported operations of Column Statistics {#supported-operations-of-column-statistics}
| | (U)Int* | Float* | Decimal(*) | Date* | Boolean | Enum* | (Fixed)String |
|-----------|---------|--------|------------|-------|---------|-------|------------------|
| count_min | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| MinMax | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✗ |
| TDigest | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✗ |
| Uniq | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |

| | Equals | Range |
|-----------|---------|-------|
| count_min | ✔ | ✗ |
| MinMax | ✗ | ✔ |
| TDigest | ✗ | ✔ |
| Uniq | ✔ | ✗ |

Please note that operation `Range` represents >, >=, < or <=.
### Supported Operations {#supported-operations}

| | Equality filters (==) | Range filters (>, >=, <, <=) |
|-----------|-----------------------|------------------------------|
| count_min | ✔ | ✗ |
| MinMax | ✗ | ✔ |
| TDigest | ✗ | ✔ |
| Uniq | ✔ | ✗ |


## Column-level Settings {#column-level-settings}
@@ -127,7 +127,7 @@ Float64 ColumnStatistics::estimateEqual(const Field & val) const
     if (stats.contains(StatisticsType::Uniq))
     {
         UInt64 cardinality = stats.at(StatisticsType::Uniq)->estimateCardinality();
-        if (cardinality == 0)
+        if (cardinality == 0 || rows == 0)
             return 0;
         return 1.0 / cardinality * rows; /// assume uniform distribution
     }
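Note on the `estimateEqual` hunk: the widened condition makes the `Uniq`-based estimate return 0 when `rows` is 0 as well as when the cardinality is 0. The following is a minimal standalone sketch of that branch, not the ClickHouse code itself. The function name `estimateEqualUniform` and the plain integer parameters are assumptions made for illustration; the real method reads both values from `ColumnStatistics`.

```cpp
#include <cstdint>

/// Hypothetical standalone version of the Uniq-based branch in the hunk above.
/// `cardinality` is the estimated number of distinct values, `rows` the row count.
constexpr double estimateEqualUniform(uint64_t cardinality, uint64_t rows)
{
    /// An empty column or a column with no distinct values matches nothing.
    /// Checking `cardinality` first also keeps the division below well defined.
    if (cardinality == 0 || rows == 0)
        return 0;

    /// Assume a uniform distribution: each distinct value covers rows / cardinality rows.
    return 1.0 / cardinality * rows;
}

/// Compile-time checks of the three cases.
static_assert(estimateEqualUniform(4, 1000) == 250.0, "uniform share of 1000 rows over 4 values");
static_assert(estimateEqualUniform(0, 1000) == 0.0, "zero cardinality yields zero");
static_assert(estimateEqualUniform(4, 0) == 0.0, "zero rows yields zero");
```

With 4 distinct values over 1000 rows, the sketch estimates 250 matching rows for an equality filter; either zero input short-circuits to 0.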
@@ -51,6 +51,9 @@ void StatisticsMinMax::deserialize(ReadBuffer & buf)
 
 Float64 StatisticsMinMax::estimateLess(const Field & val) const
 {
+    if (row_count == 0)
+        return 0;
+
     auto val_as_float = StatisticsUtils::tryConvertToFloat64(val, data_type);
     if (!val_as_float.has_value())
         return 0;
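Note on the `estimateLess` hunk: the visible lines only show the early return for `row_count == 0` and the failed-conversion case; the rest of the function is cut off here. As a rough illustration of how a min/max statistic can answer a `<` filter, the sketch below interpolates linearly between the stored minimum and maximum. The interpolation, the function name `estimateLessMinMax`, and its parameters are assumptions and are not taken from this diff.

```cpp
#include <cstdint>

/// Hypothetical sketch; the actual body of StatisticsMinMax::estimateLess is not shown in the hunk.
/// Estimates how many of `row_count` rows hold a value < `val`,
/// assuming values are spread evenly over [min_value, max_value].
constexpr double estimateLessMinMax(double val, double min_value, double max_value, uint64_t row_count)
{
    /// Same guard as the one added in the hunk: an empty column matches nothing.
    if (row_count == 0)
        return 0;

    /// Nothing is smaller than the minimum.
    if (val <= min_value)
        return 0;

    /// Everything is smaller than a value above the maximum.
    if (val > max_value)
        return static_cast<double>(row_count);

    /// Here min_value < val <= max_value, so max_value - min_value > 0.
    return (val - min_value) / (max_value - min_value) * static_cast<double>(row_count);
}

/// Compile-time checks of the branches.
static_assert(estimateLessMinMax(25.0, 0.0, 100.0, 1000) == 250.0, "a quarter of the range");
static_assert(estimateLessMinMax(25.0, 0.0, 100.0, 0) == 0.0, "empty column");
static_assert(estimateLessMinMax(-5.0, 0.0, 100.0, 1000) == 0.0, "below the minimum");
static_assert(estimateLessMinMax(200.0, 0.0, 100.0, 1000) == 1000.0, "above the maximum");
```

Placing the `row_count` check first means the empty-column case never reaches the conversion or the range arithmetic.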