Reasons:
1. The original Gorilla paper proposed a compression schema for pairs of
time stamps and double-precision FP values. ClickHouse's Gorilla
codec only implements compression of the latter and it does not
impose any data type restrictions.
- Data types != Float* or (U)Int* (e.g. Decimal, Point etc.) are
definitely not supposed to be used with Gorilla.
- (U)Int* types are debatable. The paper only considers
integers-stored-as-FP-values, a practical use case for which
Gorilla works well. Standalone integers are not considered which
makes them at least suspicious.
2. Achieve consistency with FPC, another specialized floating-point
timeseries codec, which rejects non-float data.
3. On practical datasets, ZSTD is often "good enough" (**) so it should
be okay to disincentive non-ZSTD codecs a little bit. If needed,
Delta and DoubleDelta codecs are viable alternative for slowly
changing (time-series-like) integer sequences.
Since on-prem and hosted users may still have Gorilla-compressed
non-float data, this combination is only deprecated for now. No warning
or error will be emitted. Users are encouraged to migrate
Gorilla-compressed non-float data to an alternative codec. It is planned
to treat Gorilla-compressed non-float columns as "suspicious" six months
after this commit (i.e. in v23.6). Even then, it will still be possible
to set "allow_suspicious_codecs = true" and read and write
Gorilla-compressed non-float data.
(*) Sec. 4.1.2, "Gorilla restricts the value element in its tuple to a
double floating point type.", https://doi.org/10.14778/2824032.2824078
(**) https://clickhouse.com/blog/optimize-clickhouse-codecs-compression-schema
ADD and DROP are not such lightweight command they generate mutations to deal with the changes and they will take time to complete depending on how much data the table has.
The example statement for `REPLACE TABLE` using a `SELECT` query does not run as written. The example was missing a query engine, engine options, and the `AS` keyword, all of which seem required to replace a table with the output of some query.
An issue was opened as the select query was not updated after the alert of the view was executed, this is because the select gets ignored, the only way to change it is DROP/CREATE the view again.
So I want to add this information into our doc.
I have found a great page that explain more in deep the Projections, and with images explain how they work behind the scenes.
I have also added a note as it's true we do not mention that creating projections will increase the disk usage.
Implementation
* Fix for clang-today build fails - updated to use const in Context.cpp & const function in ActionsVisitior.cpp
* Updated to use QueryParameterVisitor to check if query has query parameters
* Updated executeTableFunction to check if table/table exists instead of try-catch approach
* Fixed small review comments and style comments.
Documentation:
* Addressed review comments and added the LIVE view part which was removed by mistake in the previous commits.
Implementation:
* Updated parsers by adding a bool allow_query_parameters while creating ordinary view, which is used in interpreters to allow query parameters in SELECT.
* Added a check in ActionsVisitor if multiple parameters have same names while creating parameterised view.
* Added bool in StorageView to represent parameterized view.
* Updated processing of SELECT with parameter values to check for views and added substitution of values in the query parameters.
Testing:
* Added a test tests/queries/0_stateless/02428_parameterized_view.sql
Documentation:
* Updated the english documentation for VIEW.
The links at the very beginning of
https://clickhouse.com/docs/en/sql-reference/statements/system
don't work. They are reference other sections of the same document. This
is weird because there is a small index already on the right side. I
searched our documentation and this seems to be the only pages which do
so. Therefore removing the links altogether instead of fixing them.
- This commit restores statements "SYSTEM RELOAD MODEL(S)" which provide
a mechanism to update a model explicitly. It also saves potentially
unnecessary reloads of a model from disk after it's initial load.
To keep the complexity low, the semantics of "SYSTEM RELOAD MODEL(S)
was changed from eager to lazy. This means that both statements
previously immedately reloaded the specified/all models, whereas now
the statements only trigger an unload and the first call to
catboostEvaluate() does the actual load.
- Monitoring view SYSTEM.MODELS is also restored but with some obsolete
fields removed. The view was not documented in the past and for now it
remains undocumented. The commit is thus not considered a breach of
ClickHouse's public interface.
- The deleted function modelEvaluate() was superseded by
catboostEvaluate().
- Also delete the external model repository, as modelEvaluate() was it's
last user. Additionally remove the system view SYSTEM.MODELS for
inspecting the repository.
- SYSTEM RELOAD MODELS is also obsolete. HOWEVER, it was retained and
made a no-op instead of deleted.
Why?
The reason is that RBAC in distributed setups works by storing
privileges (granted and revoked) as plain SQL statements in Keeper.
Nodes read these statements at startup and parse them. If a privilege
for SYSTEM RELOAD MODELS exists but parser doesn't recognize it
nodes would fail to come up.
Considered but rejected alternatives:
- Ignore SYSTEM RELOAD MODELS during parsing RBAC privileges and
return an error for regular SYSTEM RELOAD MODELS SQL. Special-case
of no-op behavior, too brittle.
- Remove SYSTEM RELOAD MODELS manually from Keeper via command-line
manipulation of Keeper nodes or via SQL by dropping the privileges.
Needs user intervention during upgrade.