* Vectorize AssociativeGenericApplierImpl::apply
This commit enables auto-vectorization by redefining the numerical
values of the ternary representation and the corresponding
implementations of the And/Or operators, caching the intermediate
ternary values in a contiguous range of memory for the SIMD
instructions to consume, and removing the short-circuit from the
ternary logic evaluation.
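
As an illustration only, here is a minimal sketch of why such an encoding
helps. It assumes a hypothetical encoding with False < Null < True, under
which ternary And/Or reduce to element-wise minimum/maximum. The namespace
`ternary` and the functions `ternaryAnd`/`ternaryOr` are made-up names for
this sketch, not the code touched by this commit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical encoding (an assumption of this sketch): False < Null < True.
namespace ternary
{
constexpr std::uint8_t False = 0;
constexpr std::uint8_t Null = 1;
constexpr std::uint8_t True = 2;
}

// Ternary AND over two buffers of already-materialized ternary values.
// There is no early exit: every row does the same branch-free work on
// contiguous UInt8 data, which is the shape compilers can vectorize.
inline void ternaryAnd(
    const std::vector<std::uint8_t> & a,
    const std::vector<std::uint8_t> & b,
    std::vector<std::uint8_t> & out)
{
    assert(a.size() == b.size());
    out.resize(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = std::min(a[i], b[i]);
}

// Ternary OR is the element-wise maximum under the same encoding.
inline void ternaryOr(
    const std::vector<std::uint8_t> & a,
    const std::vector<std::uint8_t> & b,
    std::vector<std::uint8_t> & out)
{
    assert(a.size() == b.size());
    out.resize(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = std::max(a[i], b[i]);
}
```

Whether a given compiler actually vectorizes these loops depends on the
optimization flags and target; the sketch only shows the loop shape.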
* Optimize TernaryValueBuilder for ColumnNullable
The numerical representation of a ColumnNullable is calculated from
the data column, which may be of any data type, and the UInt8 null map
column with a bitwise expression that is well suited to
auto-vectorization. However, when this expression is applied to a data
column of a type other than UInt8, the SIMD registers are not fully
utilized because the operand types differ in width, and data
throughput regresses.
To optimize SIMD register usage, the has_value flag is first
evaluated from the data column and stored in a UInt8 array. It is then
loaded from memory before the bitwise expression is evaluated, so that
both operands are of type UInt8.
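
The two-pass approach can be sketched roughly as follows. This is an
illustration under assumptions, not the actual TernaryValueBuilder code:
`buildTernary` is a hypothetical helper, the encoding False=0, Null=1,
True=2 is assumed, and the bitwise expression shown is one formula that
produces that encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: builds ternary values (assumed encoding:
// False = 0, Null = 1, True = 2) from a data column and a null map.
template <typename T>
std::vector<std::uint8_t> buildTernary(
    const std::vector<T> & data,
    const std::vector<std::uint8_t> & null_map)
{
    assert(data.size() == null_map.size());
    const std::size_t n = data.size();

    // Pass 1: narrow the (possibly wide) data type to a UInt8 flag.
    std::vector<std::uint8_t> has_value(n);
    for (std::size_t i = 0; i < n; ++i)
        has_value[i] = static_cast<std::uint8_t>(data[i] != 0);

    // Pass 2: both inputs are now UInt8, so each SIMD lane is one byte.
    // Assumed expression: ((has_value << 1) | is_null) & (1 << !is_null)
    std::vector<std::uint8_t> result(n);
    for (std::size_t i = 0; i < n; ++i)
    {
        const std::uint8_t is_null = static_cast<std::uint8_t>(null_map[i] != 0);
        result[i] = static_cast<std::uint8_t>(
            ((has_value[i] << 1) | is_null) & (1 << !is_null));
    }
    return result;
}
```

The first pass costs an extra store and load per row, but it keeps the
second loop operating on operands of a single one-byte width.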
- lots of static_cast
- add safe_cast
- type adjustments
  - config
  - IStorage::read/watch
  - ...
- some TODOs (to convert types in the future)
P.S. That was quite a journey...
v2: fixes after rebase
v3: fix conflicts after #42308 merged
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>