diff --git a/README.md b/README.md index 35580369fd0..951dbf67160 100644 --- a/README.md +++ b/README.md @@ -16,6 +16,6 @@ ClickHouse® is an open-source column-oriented database management system that a * [Contacts](https://clickhouse.com/company/contact) can help to get your questions answered if there are any. ## Upcoming events -* [**v22.12 Release Webinar**](https://clickhouse.com/company/events/v22-12-release-webinar) 22.12 is the ClickHouse Christmas release. There are plenty of gifts (a new JOIN algorithm among them) and we adopted something from MongoDB. Original creator, co-founder, and CTO of ClickHouse Alexey Milovidov will walk us through the highlights of the release. +* **Recording available**: [**v22.12 Release Webinar**](https://www.youtube.com/watch?v=sREupr6uc2k) 22.12 is the ClickHouse Christmas release. There are plenty of gifts (a new JOIN algorithm among them) and we adopted something from MongoDB. Original creator, co-founder, and CTO of ClickHouse Alexey Milovidov will walk us through the highlights of the release. * [**ClickHouse Meetup at the CHEQ office in Tel Aviv**](https://www.meetup.com/clickhouse-tel-aviv-user-group/events/289599423/) - Jan 16 - We are very excited to be holding our next in-person ClickHouse meetup at the CHEQ office in Tel Aviv! Hear from CHEQ, ServiceNow and Contentsquare, as well as a deep dive presentation from ClickHouse CTO Alexey Milovidov. Join us for a fun evening of talks, food and discussion! * [**ClickHouse Meetup at Microsoft Office in Seattle**](https://www.meetup.com/clickhouse-seattle-user-group/events/290310025/) - Jan 18 - Keep an eye on this space as we will be announcing speakers soon! diff --git a/docs/en/development/build-cross-osx.md b/docs/en/development/build-cross-osx.md index 7b151d087df..1df88dbb235 100644 --- a/docs/en/development/build-cross-osx.md +++ b/docs/en/development/build-cross-osx.md @@ -1,15 +1,15 @@ --- slug: /en/development/build-cross-osx sidebar_position: 66 -title: How to Build ClickHouse on Linux for Mac OS X -sidebar_label: Build on Linux for Mac OS X +title: How to Build ClickHouse on Linux for macOS +sidebar_label: Build on Linux for macOS --- This is for the case when you have a Linux machine and want to use it to build `clickhouse` binary that will run on OS X. -This is intended for continuous integration checks that run on Linux servers. If you want to build ClickHouse directly on Mac OS X, then proceed with [another instruction](../development/build-osx.md). +This is intended for continuous integration checks that run on Linux servers. If you want to build ClickHouse directly on macOS, then proceed with [another instruction](../development/build-osx.md). -The cross-build for Mac OS X is based on the [Build instructions](../development/build.md), follow them first. +The cross-build for macOS is based on the [Build instructions](../development/build.md), follow them first. ## Install Clang-14 diff --git a/docs/en/development/build-osx.md b/docs/en/development/build-osx.md index 12f74feb272..656462eeb16 100644 --- a/docs/en/development/build-osx.md +++ b/docs/en/development/build-osx.md @@ -1,9 +1,9 @@ --- slug: /en/development/build-osx sidebar_position: 65 -sidebar_label: Build on Mac OS X -title: How to Build ClickHouse on Mac OS X -description: How to build ClickHouse on Mac OS X +sidebar_label: Build on macOS +title: How to Build ClickHouse on macOS +description: How to build ClickHouse on macOS --- :::info You don't have to build ClickHouse yourself! 
diff --git a/docs/en/development/developer-instruction.md b/docs/en/development/developer-instruction.md
index 69afb31e214..526400e9cce 100644
--- a/docs/en/development/developer-instruction.md
+++ b/docs/en/development/developer-instruction.md
@@ -7,7 +7,7 @@ description: Prerequisites and an overview of how to build ClickHouse
# Getting Started Guide for Building ClickHouse
-The building of ClickHouse is supported on Linux, FreeBSD and Mac OS X.
+Building ClickHouse is supported on Linux, FreeBSD and macOS.
If you use Windows, you need to create a virtual machine with Ubuntu. To start working with a virtual machine please install VirtualBox. You can download Ubuntu from the website: https://www.ubuntu.com/#download. Please create a virtual machine from the downloaded image (you should reserve at least 4GB of RAM for it). To run a command-line terminal in Ubuntu, please locate a program containing the word “terminal” in its name (gnome-terminal, konsole etc.) or just press Ctrl+Alt+T.
@@ -194,7 +194,7 @@ In this case, ClickHouse will use config files located in the current directory.
To connect to ClickHouse with clickhouse-client in another terminal navigate to `ClickHouse/build/programs/` and run `./clickhouse client`.
-If you get `Connection refused` message on Mac OS X or FreeBSD, try specifying host address 127.0.0.1:
+If you get a `Connection refused` message on macOS or FreeBSD, try specifying the host address 127.0.0.1:
clickhouse client --host 127.0.0.1
@@ -213,7 +213,7 @@ You can also run your custom-built ClickHouse binary with the config file from t
## IDE (Integrated Development Environment) {#ide-integrated-development-environment}
-If you do not know which IDE to use, we recommend that you use CLion. CLion is commercial software, but it offers 30 days free trial period. It is also free of charge for students. CLion can be used both on Linux and on Mac OS X.
+If you do not know which IDE to use, we recommend that you use CLion. CLion is commercial software, but it offers a 30-day free trial period. It is also free of charge for students. CLion can be used both on Linux and on macOS.
KDevelop and QTCreator are other great alternatives of an IDE for developing ClickHouse. KDevelop comes in as a very handy IDE although unstable. If KDevelop crashes after a while upon opening project, you should click “Stop All” button as soon as it has opened the list of project’s files. After doing so KDevelop should be fine to work with.
diff --git a/docs/en/development/tests.md b/docs/en/development/tests.md
index e6d5cf66de9..729c3c9fb58 100644
--- a/docs/en/development/tests.md
+++ b/docs/en/development/tests.md
@@ -139,7 +139,7 @@ If the system clickhouse-server is already running and you do not want to stop i
Build tests allow to check that build is not broken on various alternative configurations and on some foreign systems. These tests are automated as well.
Examples: -- cross-compile for Darwin x86_64 (Mac OS X) +- cross-compile for Darwin x86_64 (macOS) - cross-compile for FreeBSD x86_64 - cross-compile for Linux AArch64 - build on Ubuntu with libraries from system packages (discouraged) diff --git a/docs/en/getting-started/install.md b/docs/en/getting-started/install.md index 53f885e3963..e7dada5cb9a 100644 --- a/docs/en/getting-started/install.md +++ b/docs/en/getting-started/install.md @@ -9,7 +9,7 @@ slug: /en/install You have three options for getting up and running with ClickHouse: - **[ClickHouse Cloud](https://clickhouse.com/cloud/):** The official ClickHouse as a service, - built by, maintained and supported by the creators of ClickHouse -- **[Self-managed ClickHouse](#self-managed-install):** ClickHouse can run on any Linux, FreeBSD, or Mac OS X with x86-64, ARM, or PowerPC64LE CPU architecture +- **[Self-managed ClickHouse](#self-managed-install):** ClickHouse can run on any Linux, FreeBSD, or macOS with x86-64, ARM, or PowerPC64LE CPU architecture - **[Docker Image](https://hub.docker.com/r/clickhouse/clickhouse-server/):** Read the guide with the official image in Docker Hub ## ClickHouse Cloud @@ -257,7 +257,7 @@ To run ClickHouse inside Docker follow the guide on [Docker Hub](https://hub.doc ### From Sources {#from-sources} -To manually compile ClickHouse, follow the instructions for [Linux](/docs/en/development/build.md) or [Mac OS X](/docs/en/development/build-osx.md). +To manually compile ClickHouse, follow the instructions for [Linux](/docs/en/development/build.md) or [macOS](/docs/en/development/build-osx.md). You can compile packages and install them or use programs without installing packages. @@ -352,7 +352,7 @@ To continue experimenting, you can download one of the test data sets or go thro ## Recommendations for Self-Managed ClickHouse -ClickHouse can run on any Linux, FreeBSD, or Mac OS X with x86-64, ARM, or PowerPC64LE CPU architecture. +ClickHouse can run on any Linux, FreeBSD, or macOS with x86-64, ARM, or PowerPC64LE CPU architecture. ClickHouse uses all hardware resources available to process data. diff --git a/docs/en/operations/server-configuration-parameters/settings.md b/docs/en/operations/server-configuration-parameters/settings.md index 5faf3819d7e..02f7b5008d5 100644 --- a/docs/en/operations/server-configuration-parameters/settings.md +++ b/docs/en/operations/server-configuration-parameters/settings.md @@ -890,7 +890,7 @@ The maximum number of open files. By default: `maximum`. -We recommend using this option in Mac OS X since the `getrlimit()` function returns an incorrect value. +We recommend using this option in macOS since the `getrlimit()` function returns an incorrect value. **Example** diff --git a/docs/en/operations/settings/settings.md b/docs/en/operations/settings/settings.md index 645a38a7f04..35d6f47852a 100644 --- a/docs/en/operations/settings/settings.md +++ b/docs/en/operations/settings/settings.md @@ -3447,13 +3447,45 @@ Default value: 2. ## compatibility {#compatibility} -This setting changes other settings according to provided ClickHouse version. -If a behaviour in ClickHouse was changed by using a different default value for some setting, this compatibility setting allows you to use default values from previous versions for all the settings that were not set by the user. +The `compatibility` setting causes ClickHouse to use the default settings of a previous version of ClickHouse, where the previous version is provided as the setting. 
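+
+For example, a session can opt into the defaults of an older release (a minimal sketch — the version value `22.3` below is only an illustration, pick the release whose defaults you need):
+
+```sql
+-- Settings you have changed explicitly keep their values;
+-- everything else falls back to the 22.3 defaults.
+SET compatibility = '22.3';
+```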
-This setting takes ClickHouse version number as a string, like `21.3`, `21.8`. Empty value means that this setting is disabled. +If settings are set to non-default values, then those settings are honored (only settings that have not been modified are affected by the `compatibility` setting). + +This setting takes a ClickHouse version number as a string, like `22.3`, `22.8`. An empty value means that this setting is disabled. Disabled by default. +:::note +In ClickHouse Cloud the compatibility setting must be set by ClickHouse Cloud support. Please [open a case](https://clickhouse.cloud/support) to have it set. +::: + +## allow_settings_after_format_in_insert {#allow_settings_after_format_in_insert} + +Control whether `SETTINGS` after `FORMAT` in `INSERT` queries is allowed or not. It is not recommended to use this, since this may interpret part of `SETTINGS` as values. + +Example: + +```sql +INSERT INTO FUNCTION null('foo String') SETTINGS max_threads=1 VALUES ('bar'); +``` + +But the following query will work only with `allow_settings_after_format_in_insert`: + +```sql +SET allow_settings_after_format_in_insert=1; +INSERT INTO FUNCTION null('foo String') VALUES ('bar') SETTINGS max_threads=1; +``` + +Possible values: + +- 0 — Disallow. +- 1 — Allow. + +Default value: `0`. + +!!! note "Warning" + Use this setting only for backward compatibility if your use cases depend on old syntax. + # Format settings {#format-settings} ## input_format_skip_unknown_fields {#input_format_skip_unknown_fields} diff --git a/docs/en/sql-reference/functions/date-time-functions.md b/docs/en/sql-reference/functions/date-time-functions.md index 6156a823d58..be8e26daa87 100644 --- a/docs/en/sql-reference/functions/date-time-functions.md +++ b/docs/en/sql-reference/functions/date-time-functions.md @@ -1104,6 +1104,7 @@ Using replacement fields, you can define a pattern for the resulting string. “ | %d | day of the month, zero-padded (01-31) | 02 | | %D | Short MM/DD/YY date, equivalent to %m/%d/%y | 01/02/18 | | %e | day of the month, space-padded ( 1-31) |   2 | +| %f | fractional second from the fractional part of DateTime64 | 1234560 | | %F | short YYYY-MM-DD date, equivalent to %Y-%m-%d | 2018-01-02 | | %G | four-digit year format for ISO week number, calculated from the week-based year [defined by the ISO 8601](https://en.wikipedia.org/wiki/ISO_8601#Week_dates) standard, normally useful only with %V | 2018 | | %g | two-digit year format, aligned to ISO 8601, abbreviated from four-digit notation | 18 | @@ -1143,6 +1144,20 @@ Result: └────────────────────────────────────────────┘ ``` +Query: + +``` sql +SELECT formatDateTime(toDateTime64('2010-01-04 12:34:56.123456', 7), '%f') +``` + +Result: + +``` +┌─formatDateTime(toDateTime64('2010-01-04 12:34:56.123456', 7), '%f')─┐ +│ 1234560 │ +└─────────────────────────────────────────────────────────────────────┘ +``` + ## dateName Returns specified part of date. diff --git a/docs/en/sql-reference/functions/hash-functions.md b/docs/en/sql-reference/functions/hash-functions.md index cc66f62f714..936c20c6a77 100644 --- a/docs/en/sql-reference/functions/hash-functions.md +++ b/docs/en/sql-reference/functions/hash-functions.md @@ -595,9 +595,9 @@ SELECT xxHash64('') **Returned value** -A `Uint32` or `Uint64` data type hash value. +A `UInt32` or `UInt64` data type hash value. -Type: `xxHash`. +Type: `UInt32` for `xxHash32` and `UInt64` for `xxHash64`. 
**Example**
diff --git a/docs/en/sql-reference/functions/random-functions.md b/docs/en/sql-reference/functions/random-functions.md
index 4efa2131eb6..2c8166116e2 100644
--- a/docs/en/sql-reference/functions/random-functions.md
+++ b/docs/en/sql-reference/functions/random-functions.md
@@ -68,6 +68,440 @@ Result:
└────────────┴────────────┴──────────────┴────────────────┴─────────────────┴──────────────────────┘
```
+# Functions for Generating Random Numbers based on Distributions
+
+:::note
+These functions are available starting from 22.10.
+:::
+
+
+
+## randNormal
+
+Return random number based on [normal distribution](https://en.wikipedia.org/wiki/Normal_distribution).
+
+**Syntax**
+
+``` sql
+randNormal(mean, variance)
+```
+
+**Arguments**
+
+- `mean` - `Float64` mean value of distribution,
+- `variance` - `Float64` - [variance](https://en.wikipedia.org/wiki/Variance).
+
+**Returned value**
+
+- Pseudo-random number.
+
+Type: [Float64](/docs/en/sql-reference/data-types/float.md).
+
+**Example**
+
+Query:
+
+``` sql
+SELECT randNormal(10, 2) FROM numbers(5)
+```
+
+Result:
+
+``` text
+┌──randNormal(10, 2)─┐
+│ 13.389228911709653 │
+│  8.622949707401295 │
+│ 10.801887062682981 │
+│ 4.5220192605895315 │
+│ 10.901239123982567 │
+└────────────────────┘
+```
+
+
+
+## randLogNormal
+
+Return random number based on [log-normal distribution](https://en.wikipedia.org/wiki/Log-normal_distribution).
+
+**Syntax**
+
+``` sql
+randLogNormal(mean, variance)
+```
+
+**Arguments**
+
+- `mean` - `Float64` mean value of distribution,
+- `variance` - `Float64` - [variance](https://en.wikipedia.org/wiki/Variance).
+
+**Returned value**
+
+- Pseudo-random number.
+
+Type: [Float64](/docs/en/sql-reference/data-types/float.md).
+
+**Example**
+
+Query:
+
+``` sql
+SELECT randLogNormal(100, 5) FROM numbers(5)
+```
+
+Result:
+
+``` text
+┌─randLogNormal(100, 5)─┐
+│  1.295699673937363e48 │
+│  9.719869109186684e39 │
+│  6.110868203189557e42 │
+│  9.912675872925529e39 │
+│ 2.3564708490552458e42 │
+└───────────────────────┘
+```
+
+
+
+## randBinomial
+
+Return random number based on [binomial distribution](https://en.wikipedia.org/wiki/Binomial_distribution).
+
+**Syntax**
+
+``` sql
+randBinomial(experiments, probability)
+```
+
+**Arguments**
+
+- `experiments` - `UInt64` number of experiments,
+- `probability` - `Float64` - probability of success in each experiment (values in `0...1` range only).
+
+**Returned value**
+
+- Pseudo-random number.
+
+Type: [UInt64](/docs/en/sql-reference/data-types/int-uint.md).
+
+**Example**
+
+Query:
+
+``` sql
+SELECT randBinomial(100, .75) FROM numbers(5)
+```
+
+Result:
+
+``` text
+┌─randBinomial(100, 0.75)─┐
+│                      74 │
+│                      78 │
+│                      76 │
+│                      77 │
+│                      80 │
+└─────────────────────────┘
+```
+
+
+
+## randNegativeBinomial
+
+Return random number based on [negative binomial distribution](https://en.wikipedia.org/wiki/Negative_binomial_distribution).
+
+**Syntax**
+
+``` sql
+randNegativeBinomial(experiments, probability)
+```
+
+**Arguments**
+
+- `experiments` - `UInt64` number of experiments,
+- `probability` - `Float64` - probability of failure in each experiment (values in `0...1` range only).
+
+**Returned value**
+
+- Pseudo-random number.
+
+Type: [UInt64](/docs/en/sql-reference/data-types/int-uint.md).
+ +**Example** + +Query: + +``` sql +SELECT randNegativeBinomial(100, .75) FROM numbers(5) +``` + +Result: + +``` text +┌─randNegativeBinomial(100, 0.75)─┐ +│ 33 │ +│ 32 │ +│ 39 │ +│ 40 │ +│ 50 │ +└─────────────────────────────────┘ +``` + + + +## randPoisson + +Return random number based on [Poisson distribution](https://en.wikipedia.org/wiki/Poisson_distribution). + +**Syntax** + +``` sql +randPoisson(n) +``` + +**Arguments** + +- `n` - `UInt64` mean number of occurrences. + +**Returned value** + +- Pseudo-random number. + +Type: [UInt64](/docs/en/sql-reference/data-types/int-uint.md). + +**Example** + +Query: + +``` sql +SELECT randPoisson(10) FROM numbers(5) +``` + +Result: + +``` text +┌─randPoisson(10)─┐ +│ 8 │ +│ 8 │ +│ 7 │ +│ 10 │ +│ 6 │ +└─────────────────┘ +``` + + + +## randBernoulli + +Return random number based on [Bernoulli distribution](https://en.wikipedia.org/wiki/Bernoulli_distribution). + +**Syntax** + +``` sql +randBernoulli(probability) +``` + +**Arguments** + +- `probability` - `Float64` - probability of success (values in `0...1` range only). + +**Returned value** + +- Pseudo-random number. + +Type: [UInt64](/docs/en/sql-reference/data-types/int-uint.md). + +**Example** + +Query: + +``` sql +SELECT randBernoulli(.75) FROM numbers(5) +``` + +Result: + +``` text +┌─randBernoulli(0.75)─┐ +│ 1 │ +│ 1 │ +│ 0 │ +│ 1 │ +│ 1 │ +└─────────────────────┘ +``` + + + +## randExponential + +Return random number based on [exponential distribution](https://en.wikipedia.org/wiki/Exponential_distribution). + +**Syntax** + +``` sql +randExponential(lambda) +``` + +**Arguments** + +- `lambda` - `Float64` lambda value. + +**Returned value** + +- Pseudo-random number. + +Type: [Float64](/docs/en/sql-reference/data-types/float.md). + +**Example** + +Query: + +``` sql +SELECT randExponential(1/10) FROM numbers(5) +``` + +Result: + +``` text +┌─randExponential(divide(1, 10))─┐ +│ 44.71628934340778 │ +│ 4.211013337903262 │ +│ 10.809402553207766 │ +│ 15.63959406553284 │ +│ 1.8148392319860158 │ +└────────────────────────────────┘ +``` + + + +## randChiSquared + +Return random number based on [Chi-square distribution](https://en.wikipedia.org/wiki/Chi-squared_distribution) - a distribution of a sum of the squares of k independent standard normal random variables. + +**Syntax** + +``` sql +randChiSquared(degree_of_freedom) +``` + +**Arguments** + +- `degree_of_freedom` - `Float64` degree of freedom. + +**Returned value** + +- Pseudo-random number. + +Type: [Float64](/docs/en/sql-reference/data-types/float.md). + +**Example** + +Query: + +``` sql +SELECT randChiSquared(10) FROM numbers(5) +``` + +Result: + +``` text +┌─randChiSquared(10)─┐ +│ 10.015463656521543 │ +│ 9.621799919882768 │ +│ 2.71785015634699 │ +│ 11.128188665931908 │ +│ 4.902063104425469 │ +└────────────────────┘ +``` + + + +## randStudentT + +Return random number based on [Student's t-distribution](https://en.wikipedia.org/wiki/Student%27s_t-distribution). + +**Syntax** + +``` sql +randStudentT(degree_of_freedom) +``` + +**Arguments** + +- `degree_of_freedom` - `Float64` degree of freedom. + +**Returned value** + +- Pseudo-random number. + +Type: [Float64](/docs/en/sql-reference/data-types/float.md). 
+ +**Example** + +Query: + +``` sql +SELECT randStudentT(10) FROM numbers(5) +``` + +Result: + +``` text +┌─────randStudentT(10)─┐ +│ 1.2217309938538725 │ +│ 1.7941971681200541 │ +│ -0.28192176076784664 │ +│ 0.2508897721303792 │ +│ -2.7858432909761186 │ +└──────────────────────┘ +``` + + + +## randFisherF + +Return random number based on [F-distribution](https://en.wikipedia.org/wiki/F-distribution). + +**Syntax** + +``` sql +randFisherF(d1, d2) +``` + +**Arguments** + +- `d1` - `Float64` d1 degree of freedom in `X = (S1 / d1) / (S2 / d2)`, +- `d2` - `Float64` d2 degree of freedom in `X = (S1 / d1) / (S2 / d2)`, + +**Returned value** + +- Pseudo-random number. + +Type: [Float64](/docs/en/sql-reference/data-types/float.md). + +**Example** + +Query: + +``` sql +SELECT randFisherF(10, 3) FROM numbers(5) +``` + +Result: + +``` text +┌──randFisherF(10, 3)─┐ +│ 7.286287504216609 │ +│ 0.26590779413050386 │ +│ 0.22207610901168987 │ +│ 0.7953362728449572 │ +│ 0.19278885985221572 │ +└─────────────────────┘ +``` + + + + # Random Functions for Working with Strings ## randomString diff --git a/docs/en/sql-reference/statements/select/from.md b/docs/en/sql-reference/statements/select/from.md index 3013a173c16..b751384cb72 100644 --- a/docs/en/sql-reference/statements/select/from.md +++ b/docs/en/sql-reference/statements/select/from.md @@ -21,12 +21,11 @@ Subquery is another `SELECT` query that may be specified in parenthesis inside ` When `FINAL` is specified, ClickHouse fully merges the data before returning the result and thus performs all data transformations that happen during merges for the given table engine. -It is applicable when selecting data from tables that use the [MergeTree](../../../engines/table-engines/mergetree-family/mergetree.md)-engine family. Also supported for: +It is applicable when selecting data from ReplacingMergeTree, SummingMergeTree, AggregatingMergeTree, CollapsingMergeTree and VersionedCollapsingMergeTree tables. -- [Replicated](../../../engines/table-engines/mergetree-family/replication.md) versions of `MergeTree` engines. -- [View](../../../engines/table-engines/special/view.md), [Buffer](../../../engines/table-engines/special/buffer.md), [Distributed](../../../engines/table-engines/special/distributed.md), and [MaterializedView](../../../engines/table-engines/special/materializedview.md) engines that operate over other engines, provided they were created over `MergeTree`-engine tables. +`SELECT` queries with `FINAL` are executed in parallel. The [max_final_threads](../../../operations/settings/settings.md#max-final-threads) setting limits the number of threads used. -Now `SELECT` queries with `FINAL` are executed in parallel and slightly faster. But there are drawbacks (see below). The [max_final_threads](../../../operations/settings/settings.md#max-final-threads) setting limits the number of threads used. +There are drawbacks to using `FINAL` (see below). 
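+
+For example (a sketch — `tbl` here stands for any table of one of the engines listed above):
+
+``` sql
+-- Merge the data parts at query time so the result reflects the fully
+-- collapsed/replaced rows; max_final_threads caps the threads used.
+SELECT * FROM tbl FINAL
+SETTINGS max_final_threads = 4;
+```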
### Drawbacks diff --git a/src/AggregateFunctions/AggregateFunctionAggThrow.cpp b/src/AggregateFunctions/AggregateFunctionAggThrow.cpp index 432b1f39f84..359c6051abb 100644 --- a/src/AggregateFunctions/AggregateFunctionAggThrow.cpp +++ b/src/AggregateFunctions/AggregateFunctionAggThrow.cpp @@ -49,14 +49,16 @@ private: public: AggregateFunctionThrow(const DataTypes & argument_types_, const Array & parameters_, Float64 throw_probability_) - : IAggregateFunctionDataHelper(argument_types_, parameters_), throw_probability(throw_probability_) {} + : IAggregateFunctionDataHelper(argument_types_, parameters_, createResultType()) + , throw_probability(throw_probability_) + {} String getName() const override { return "aggThrow"; } - DataTypePtr getReturnType() const override + static DataTypePtr createResultType() { return std::make_shared(); } diff --git a/src/AggregateFunctions/AggregateFunctionAnalysisOfVariance.h b/src/AggregateFunctions/AggregateFunctionAnalysisOfVariance.h index e891fb191f6..da060ceb18e 100644 --- a/src/AggregateFunctions/AggregateFunctionAnalysisOfVariance.h +++ b/src/AggregateFunctions/AggregateFunctionAnalysisOfVariance.h @@ -37,10 +37,10 @@ class AggregateFunctionAnalysisOfVariance final : public IAggregateFunctionDataH { public: explicit AggregateFunctionAnalysisOfVariance(const DataTypes & arguments, const Array & params) - : IAggregateFunctionDataHelper(arguments, params) + : IAggregateFunctionDataHelper(arguments, params, createResultType()) {} - DataTypePtr getReturnType() const override + DataTypePtr createResultType() const { DataTypes types {std::make_shared>(), std::make_shared>() }; Strings names {"f_statistic", "p_value"}; diff --git a/src/AggregateFunctions/AggregateFunctionArgMinMax.h b/src/AggregateFunctions/AggregateFunctionArgMinMax.h index decb572b019..568b70fe77e 100644 --- a/src/AggregateFunctions/AggregateFunctionArgMinMax.h +++ b/src/AggregateFunctions/AggregateFunctionArgMinMax.h @@ -38,7 +38,6 @@ template class AggregateFunctionArgMinMax final : public IAggregateFunctionDataHelper> { private: - const DataTypePtr & type_res; const DataTypePtr & type_val; const SerializationPtr serialization_res; const SerializationPtr serialization_val; @@ -47,10 +46,9 @@ private: public: AggregateFunctionArgMinMax(const DataTypePtr & type_res_, const DataTypePtr & type_val_) - : Base({type_res_, type_val_}, {}) - , type_res(this->argument_types[0]) + : Base({type_res_, type_val_}, {}, type_res_) , type_val(this->argument_types[1]) - , serialization_res(type_res->getDefaultSerialization()) + , serialization_res(type_res_->getDefaultSerialization()) , serialization_val(type_val->getDefaultSerialization()) { if (!type_val->isComparable()) @@ -63,11 +61,6 @@ public: return StringRef(Data::ValueData_t::name()) == StringRef("min") ? 
"argMin" : "argMax"; } - DataTypePtr getReturnType() const override - { - return type_res; - } - void add(AggregateDataPtr __restrict place, const IColumn ** columns, size_t row_num, Arena * arena) const override { if (this->data(place).value.changeIfBetter(*columns[1], row_num, arena)) diff --git a/src/AggregateFunctions/AggregateFunctionArray.h b/src/AggregateFunctions/AggregateFunctionArray.h index c6e29e77318..c0e676c33e7 100644 --- a/src/AggregateFunctions/AggregateFunctionArray.h +++ b/src/AggregateFunctions/AggregateFunctionArray.h @@ -30,7 +30,7 @@ private: public: AggregateFunctionArray(AggregateFunctionPtr nested_, const DataTypes & arguments, const Array & params_) - : IAggregateFunctionHelper(arguments, params_) + : IAggregateFunctionHelper(arguments, params_, createResultType(nested_)) , nested_func(nested_), num_arguments(arguments.size()) { assert(parameters == nested_func->getParameters()); @@ -44,9 +44,9 @@ public: return nested_func->getName() + "Array"; } - DataTypePtr getReturnType() const override + static DataTypePtr createResultType(const AggregateFunctionPtr & nested_) { - return nested_func->getReturnType(); + return nested_->getResultType(); } const IAggregateFunction & getBaseAggregateFunctionWithSameStateRepresentation() const override diff --git a/src/AggregateFunctions/AggregateFunctionAvg.h b/src/AggregateFunctions/AggregateFunctionAvg.h index ee46a40023d..a86c7d042fc 100644 --- a/src/AggregateFunctions/AggregateFunctionAvg.h +++ b/src/AggregateFunctions/AggregateFunctionAvg.h @@ -10,6 +10,7 @@ #include #include #include +#include #include "config.h" @@ -83,10 +84,20 @@ public: using Fraction = AvgFraction; explicit AggregateFunctionAvgBase(const DataTypes & argument_types_, - UInt32 num_scale_ = 0, UInt32 denom_scale_ = 0) - : Base(argument_types_, {}), num_scale(num_scale_), denom_scale(denom_scale_) {} + UInt32 num_scale_ = 0, UInt32 denom_scale_ = 0) + : Base(argument_types_, {}, createResultType()) + , num_scale(num_scale_) + , denom_scale(denom_scale_) + {} - DataTypePtr getReturnType() const override { return std::make_shared>(); } + AggregateFunctionAvgBase(const DataTypes & argument_types_, const DataTypePtr & result_type_, + UInt32 num_scale_ = 0, UInt32 denom_scale_ = 0) + : Base(argument_types_, {}, result_type_) + , num_scale(num_scale_) + , denom_scale(denom_scale_) + {} + + DataTypePtr createResultType() const { return std::make_shared>(); } bool allocatesMemoryInArena() const override { return false; } @@ -135,7 +146,7 @@ public: for (const auto & argument : this->argument_types) can_be_compiled &= canBeNativeType(*argument); - auto return_type = getReturnType(); + auto return_type = this->getResultType(); can_be_compiled &= canBeNativeType(*return_type); return can_be_compiled; diff --git a/src/AggregateFunctions/AggregateFunctionBitwise.h b/src/AggregateFunctions/AggregateFunctionBitwise.h index b8d3bc79007..6c94a72bf32 100644 --- a/src/AggregateFunctions/AggregateFunctionBitwise.h +++ b/src/AggregateFunctions/AggregateFunctionBitwise.h @@ -97,11 +97,12 @@ class AggregateFunctionBitwise final : public IAggregateFunctionDataHelper>({type}, {}) {} + : IAggregateFunctionDataHelper>({type}, {}, createResultType()) + {} String getName() const override { return Data::name(); } - DataTypePtr getReturnType() const override + static DataTypePtr createResultType() { return std::make_shared>(); } @@ -137,7 +138,7 @@ public: bool isCompilable() const override { - auto return_type = getReturnType(); + auto return_type = this->getResultType(); return 
canBeNativeType(*return_type); } @@ -151,7 +152,7 @@ public: { llvm::IRBuilder<> & b = static_cast &>(builder); - auto * return_type = toNativeType(b, getReturnType()); + auto * return_type = toNativeType(b, this->getResultType()); auto * value_ptr = aggregate_data_ptr; auto * value = b.CreateLoad(return_type, value_ptr); @@ -166,7 +167,7 @@ public: { llvm::IRBuilder<> & b = static_cast &>(builder); - auto * return_type = toNativeType(b, getReturnType()); + auto * return_type = toNativeType(b, this->getResultType()); auto * value_dst_ptr = aggregate_data_dst_ptr; auto * value_dst = b.CreateLoad(return_type, value_dst_ptr); @@ -183,7 +184,7 @@ public: { llvm::IRBuilder<> & b = static_cast &>(builder); - auto * return_type = toNativeType(b, getReturnType()); + auto * return_type = toNativeType(b, this->getResultType()); auto * value_ptr = aggregate_data_ptr; return b.CreateLoad(return_type, value_ptr); diff --git a/src/AggregateFunctions/AggregateFunctionBoundingRatio.h b/src/AggregateFunctions/AggregateFunctionBoundingRatio.h index 34e3fa2f747..8fca88889b8 100644 --- a/src/AggregateFunctions/AggregateFunctionBoundingRatio.h +++ b/src/AggregateFunctions/AggregateFunctionBoundingRatio.h @@ -112,7 +112,7 @@ public: } explicit AggregateFunctionBoundingRatio(const DataTypes & arguments) - : IAggregateFunctionDataHelper(arguments, {}) + : IAggregateFunctionDataHelper(arguments, {}, std::make_shared()) { const auto * x_arg = arguments.at(0).get(); const auto * y_arg = arguments.at(1).get(); @@ -122,11 +122,6 @@ public: ErrorCodes::BAD_ARGUMENTS); } - DataTypePtr getReturnType() const override - { - return std::make_shared(); - } - bool allocatesMemoryInArena() const override { return false; } void add(AggregateDataPtr __restrict place, const IColumn ** columns, const size_t row_num, Arena *) const override diff --git a/src/AggregateFunctions/AggregateFunctionCategoricalInformationValue.cpp b/src/AggregateFunctions/AggregateFunctionCategoricalInformationValue.cpp index 93b5de0c5ab..65dce832789 100644 --- a/src/AggregateFunctions/AggregateFunctionCategoricalInformationValue.cpp +++ b/src/AggregateFunctions/AggregateFunctionCategoricalInformationValue.cpp @@ -46,9 +46,9 @@ private: } public: - AggregateFunctionCategoricalIV(const DataTypes & arguments_, const Array & params_) : - IAggregateFunctionHelper{arguments_, params_}, - category_count{arguments_.size() - 1} + AggregateFunctionCategoricalIV(const DataTypes & arguments_, const Array & params_) + : IAggregateFunctionHelper{arguments_, params_, createResultType()} + , category_count{arguments_.size() - 1} { // notice: argument types has been checked before } @@ -121,7 +121,7 @@ public: buf.readStrict(place, sizeOfData()); } - DataTypePtr getReturnType() const override + static DataTypePtr createResultType() { return std::make_shared( std::make_shared>()); diff --git a/src/AggregateFunctions/AggregateFunctionCount.h b/src/AggregateFunctions/AggregateFunctionCount.h index 6e2c86f065b..91409463409 100644 --- a/src/AggregateFunctions/AggregateFunctionCount.h +++ b/src/AggregateFunctions/AggregateFunctionCount.h @@ -39,11 +39,13 @@ namespace ErrorCodes class AggregateFunctionCount final : public IAggregateFunctionDataHelper { public: - explicit AggregateFunctionCount(const DataTypes & argument_types_) : IAggregateFunctionDataHelper(argument_types_, {}) {} + explicit AggregateFunctionCount(const DataTypes & argument_types_) + : IAggregateFunctionDataHelper(argument_types_, {}, createResultType()) + {} String getName() const override { return "count"; } 
- DataTypePtr getReturnType() const override + static DataTypePtr createResultType() { return std::make_shared(); } @@ -167,7 +169,7 @@ public: { llvm::IRBuilder<> & b = static_cast &>(builder); - auto * return_type = toNativeType(b, getReturnType()); + auto * return_type = toNativeType(b, this->getResultType()); auto * count_value_ptr = aggregate_data_ptr; auto * count_value = b.CreateLoad(return_type, count_value_ptr); @@ -180,7 +182,7 @@ public: { llvm::IRBuilder<> & b = static_cast &>(builder); - auto * return_type = toNativeType(b, getReturnType()); + auto * return_type = toNativeType(b, this->getResultType()); auto * count_value_dst_ptr = aggregate_data_dst_ptr; auto * count_value_dst = b.CreateLoad(return_type, count_value_dst_ptr); @@ -197,7 +199,7 @@ public: { llvm::IRBuilder<> & b = static_cast &>(builder); - auto * return_type = toNativeType(b, getReturnType()); + auto * return_type = toNativeType(b, this->getResultType()); auto * count_value_ptr = aggregate_data_ptr; return b.CreateLoad(return_type, count_value_ptr); @@ -214,7 +216,7 @@ class AggregateFunctionCountNotNullUnary final { public: AggregateFunctionCountNotNullUnary(const DataTypePtr & argument, const Array & params) - : IAggregateFunctionDataHelper({argument}, params) + : IAggregateFunctionDataHelper({argument}, params, createResultType()) { if (!argument->isNullable()) throw Exception("Logical error: not Nullable data type passed to AggregateFunctionCountNotNullUnary", ErrorCodes::LOGICAL_ERROR); @@ -222,7 +224,7 @@ public: String getName() const override { return "count"; } - DataTypePtr getReturnType() const override + static DataTypePtr createResultType() { return std::make_shared(); } @@ -311,7 +313,7 @@ public: { llvm::IRBuilder<> & b = static_cast &>(builder); - auto * return_type = toNativeType(b, getReturnType()); + auto * return_type = toNativeType(b, this->getResultType()); auto * is_null_value = b.CreateExtractValue(values[0], {1}); auto * increment_value = b.CreateSelect(is_null_value, llvm::ConstantInt::get(return_type, 0), llvm::ConstantInt::get(return_type, 1)); @@ -327,7 +329,7 @@ public: { llvm::IRBuilder<> & b = static_cast &>(builder); - auto * return_type = toNativeType(b, getReturnType()); + auto * return_type = toNativeType(b, this->getResultType()); auto * count_value_dst_ptr = aggregate_data_dst_ptr; auto * count_value_dst = b.CreateLoad(return_type, count_value_dst_ptr); @@ -344,7 +346,7 @@ public: { llvm::IRBuilder<> & b = static_cast &>(builder); - auto * return_type = toNativeType(b, getReturnType()); + auto * return_type = toNativeType(b, this->getResultType()); auto * count_value_ptr = aggregate_data_ptr; return b.CreateLoad(return_type, count_value_ptr); diff --git a/src/AggregateFunctions/AggregateFunctionDeltaSum.h b/src/AggregateFunctions/AggregateFunctionDeltaSum.h index 36d0ef55346..199d2706d3a 100644 --- a/src/AggregateFunctions/AggregateFunctionDeltaSum.h +++ b/src/AggregateFunctions/AggregateFunctionDeltaSum.h @@ -31,7 +31,7 @@ class AggregationFunctionDeltaSum final { public: AggregationFunctionDeltaSum(const DataTypes & arguments, const Array & params) - : IAggregateFunctionDataHelper, AggregationFunctionDeltaSum>{arguments, params} + : IAggregateFunctionDataHelper, AggregationFunctionDeltaSum>{arguments, params, createResultType()} {} AggregationFunctionDeltaSum() @@ -40,7 +40,7 @@ public: String getName() const override { return "deltaSum"; } - DataTypePtr getReturnType() const override { return std::make_shared>(); } + static DataTypePtr createResultType() { return 
std::make_shared>(); } bool allocatesMemoryInArena() const override { return false; } diff --git a/src/AggregateFunctions/AggregateFunctionDeltaSumTimestamp.h b/src/AggregateFunctions/AggregateFunctionDeltaSumTimestamp.h index a311910de7f..5ca07bb0bdf 100644 --- a/src/AggregateFunctions/AggregateFunctionDeltaSumTimestamp.h +++ b/src/AggregateFunctions/AggregateFunctionDeltaSumTimestamp.h @@ -38,7 +38,7 @@ public: : IAggregateFunctionDataHelper< AggregationFunctionDeltaSumTimestampData, AggregationFunctionDeltaSumTimestamp - >{arguments, params} + >{arguments, params, createResultType()} {} AggregationFunctionDeltaSumTimestamp() @@ -52,7 +52,7 @@ public: String getName() const override { return "deltaSumTimestamp"; } - DataTypePtr getReturnType() const override { return std::make_shared>(); } + static DataTypePtr createResultType() { return std::make_shared>(); } void NO_SANITIZE_UNDEFINED ALWAYS_INLINE add(AggregateDataPtr __restrict place, const IColumn ** columns, size_t row_num, Arena *) const override { diff --git a/src/AggregateFunctions/AggregateFunctionDistinct.h b/src/AggregateFunctions/AggregateFunctionDistinct.h index 2d7362ba4cc..e09e0ef621d 100644 --- a/src/AggregateFunctions/AggregateFunctionDistinct.h +++ b/src/AggregateFunctions/AggregateFunctionDistinct.h @@ -168,7 +168,7 @@ private: public: AggregateFunctionDistinct(AggregateFunctionPtr nested_func_, const DataTypes & arguments, const Array & params_) - : IAggregateFunctionDataHelper(arguments, params_) + : IAggregateFunctionDataHelper(arguments, params_, nested_func_->getResultType()) , nested_func(nested_func_) , arguments_num(arguments.size()) { @@ -255,11 +255,6 @@ public: return nested_func->getName() + "Distinct"; } - DataTypePtr getReturnType() const override - { - return nested_func->getReturnType(); - } - bool allocatesMemoryInArena() const override { return true; diff --git a/src/AggregateFunctions/AggregateFunctionEntropy.h b/src/AggregateFunctions/AggregateFunctionEntropy.h index a51dd0537bf..9321b5c5825 100644 --- a/src/AggregateFunctions/AggregateFunctionEntropy.h +++ b/src/AggregateFunctions/AggregateFunctionEntropy.h @@ -92,14 +92,14 @@ private: public: explicit AggregateFunctionEntropy(const DataTypes & argument_types_) - : IAggregateFunctionDataHelper, AggregateFunctionEntropy>(argument_types_, {}) + : IAggregateFunctionDataHelper, AggregateFunctionEntropy>(argument_types_, {}, createResultType()) , num_args(argument_types_.size()) { } String getName() const override { return "entropy"; } - DataTypePtr getReturnType() const override + static DataTypePtr createResultType() { return std::make_shared>(); } diff --git a/src/AggregateFunctions/AggregateFunctionExponentialMovingAverage.cpp b/src/AggregateFunctions/AggregateFunctionExponentialMovingAverage.cpp index 2c055c37cca..bb48b3416be 100644 --- a/src/AggregateFunctions/AggregateFunctionExponentialMovingAverage.cpp +++ b/src/AggregateFunctions/AggregateFunctionExponentialMovingAverage.cpp @@ -29,7 +29,7 @@ private: public: AggregateFunctionExponentialMovingAverage(const DataTypes & argument_types_, const Array & params) - : IAggregateFunctionDataHelper(argument_types_, params) + : IAggregateFunctionDataHelper(argument_types_, params, createResultType()) { if (params.size() != 1) throw Exception{"Aggregate function " + getName() + " requires exactly one parameter: half decay time.", @@ -43,7 +43,7 @@ public: return "exponentialMovingAverage"; } - DataTypePtr getReturnType() const override + static DataTypePtr createResultType() { return std::make_shared>(); 
} diff --git a/src/AggregateFunctions/AggregateFunctionFlameGraph.cpp b/src/AggregateFunctions/AggregateFunctionFlameGraph.cpp index 5fc6b21926e..e25dfead466 100644 --- a/src/AggregateFunctions/AggregateFunctionFlameGraph.cpp +++ b/src/AggregateFunctions/AggregateFunctionFlameGraph.cpp @@ -523,12 +523,12 @@ class AggregateFunctionFlameGraph final : public IAggregateFunctionDataHelper(argument_types_, {}) + : IAggregateFunctionDataHelper(argument_types_, {}, createResultType()) {} String getName() const override { return "flameGraph"; } - DataTypePtr getReturnType() const override + static DataTypePtr createResultType() { return std::make_shared(std::make_shared()); } diff --git a/src/AggregateFunctions/AggregateFunctionForEach.h b/src/AggregateFunctions/AggregateFunctionForEach.h index c91c4dd7c86..69102424bf7 100644 --- a/src/AggregateFunctions/AggregateFunctionForEach.h +++ b/src/AggregateFunctions/AggregateFunctionForEach.h @@ -107,7 +107,7 @@ private: public: AggregateFunctionForEach(AggregateFunctionPtr nested_, const DataTypes & arguments, const Array & params_) - : IAggregateFunctionDataHelper(arguments, params_) + : IAggregateFunctionDataHelper(arguments, params_, createResultType(nested_)) , nested_func(nested_), num_arguments(arguments.size()) { nested_size_of_data = nested_func->sizeOfData(); @@ -125,9 +125,9 @@ public: return nested_func->getName() + "ForEach"; } - DataTypePtr getReturnType() const override + static DataTypePtr createResultType(AggregateFunctionPtr nested_) { - return std::make_shared(nested_func->getReturnType()); + return std::make_shared(nested_->getResultType()); } bool isVersioned() const override diff --git a/src/AggregateFunctions/AggregateFunctionGroupArray.h b/src/AggregateFunctions/AggregateFunctionGroupArray.h index 89b382de819..f902cabb99a 100644 --- a/src/AggregateFunctions/AggregateFunctionGroupArray.h +++ b/src/AggregateFunctions/AggregateFunctionGroupArray.h @@ -121,7 +121,7 @@ public: explicit GroupArrayNumericImpl( const DataTypePtr & data_type_, const Array & parameters_, UInt64 max_elems_ = std::numeric_limits::max(), UInt64 seed_ = 123456) : IAggregateFunctionDataHelper, GroupArrayNumericImpl>( - {data_type_}, parameters_) + {data_type_}, parameters_, std::make_shared(data_type_)) , max_elems(max_elems_) , seed(seed_) { @@ -129,8 +129,6 @@ public: String getName() const override { return getNameByTrait(); } - DataTypePtr getReturnType() const override { return std::make_shared(this->argument_types[0]); } - void insert(Data & a, const T & v, Arena * arena) const { ++a.total_values; @@ -423,7 +421,7 @@ class GroupArrayGeneralImpl final public: GroupArrayGeneralImpl(const DataTypePtr & data_type_, const Array & parameters_, UInt64 max_elems_ = std::numeric_limits::max(), UInt64 seed_ = 123456) : IAggregateFunctionDataHelper, GroupArrayGeneralImpl>( - {data_type_}, parameters_) + {data_type_}, parameters_, std::make_shared(data_type_)) , data_type(this->argument_types[0]) , max_elems(max_elems_) , seed(seed_) @@ -432,8 +430,6 @@ public: String getName() const override { return getNameByTrait(); } - DataTypePtr getReturnType() const override { return std::make_shared(data_type); } - void insert(Data & a, const Node * v, Arena * arena) const { ++a.total_values; @@ -697,7 +693,7 @@ class GroupArrayGeneralListImpl final public: GroupArrayGeneralListImpl(const DataTypePtr & data_type_, const Array & parameters_, UInt64 max_elems_ = std::numeric_limits::max()) - : IAggregateFunctionDataHelper, GroupArrayGeneralListImpl>({data_type_}, parameters_) + : 
IAggregateFunctionDataHelper, GroupArrayGeneralListImpl>({data_type_}, parameters_, std::make_shared(data_type_)) , data_type(this->argument_types[0]) , max_elems(max_elems_) { @@ -705,8 +701,6 @@ public: String getName() const override { return getNameByTrait(); } - DataTypePtr getReturnType() const override { return std::make_shared(data_type); } - void add(AggregateDataPtr __restrict place, const IColumn ** columns, size_t row_num, Arena * arena) const override { if (limit_num_elems && data(place).elems >= max_elems) diff --git a/src/AggregateFunctions/AggregateFunctionGroupArrayInsertAt.h b/src/AggregateFunctions/AggregateFunctionGroupArrayInsertAt.h index a1a2ce2669b..42fe4083de1 100644 --- a/src/AggregateFunctions/AggregateFunctionGroupArrayInsertAt.h +++ b/src/AggregateFunctions/AggregateFunctionGroupArrayInsertAt.h @@ -64,7 +64,7 @@ private: public: AggregateFunctionGroupArrayInsertAtGeneric(const DataTypes & arguments, const Array & params) - : IAggregateFunctionDataHelper(arguments, params) + : IAggregateFunctionDataHelper(arguments, params, std::make_shared(arguments[0])) , type(argument_types[0]) , serialization(type->getDefaultSerialization()) { @@ -101,11 +101,6 @@ public: String getName() const override { return "groupArrayInsertAt"; } - DataTypePtr getReturnType() const override - { - return std::make_shared(type); - } - bool allocatesMemoryInArena() const override { return false; } void add(AggregateDataPtr __restrict place, const IColumn ** columns, size_t row_num, Arena *) const override diff --git a/src/AggregateFunctions/AggregateFunctionGroupArrayMoving.h b/src/AggregateFunctions/AggregateFunctionGroupArrayMoving.h index 40867b1949a..4444de793b4 100644 --- a/src/AggregateFunctions/AggregateFunctionGroupArrayMoving.h +++ b/src/AggregateFunctions/AggregateFunctionGroupArrayMoving.h @@ -93,12 +93,15 @@ public: using ColumnResult = ColumnVectorOrDecimal; explicit MovingImpl(const DataTypePtr & data_type_, UInt64 window_size_ = std::numeric_limits::max()) - : IAggregateFunctionDataHelper>({data_type_}, {}) + : IAggregateFunctionDataHelper>({data_type_}, {}, createResultType(data_type_)) , window_size(window_size_) {} String getName() const override { return Data::name; } - DataTypePtr getReturnType() const override { return std::make_shared(getReturnTypeElement()); } + static DataTypePtr createResultType(const DataTypePtr & argument) + { + return std::make_shared(getReturnTypeElement(argument)); + } void NO_SANITIZE_UNDEFINED add(AggregateDataPtr __restrict place, const IColumn ** columns, size_t row_num, Arena * arena) const override { @@ -183,14 +186,14 @@ public: } private: - auto getReturnTypeElement() const + static auto getReturnTypeElement(const DataTypePtr & argument) { if constexpr (!is_decimal) return std::make_shared>(); else { using Res = DataTypeDecimal; - return std::make_shared(Res::maxPrecision(), getDecimalScale(*this->argument_types.at(0))); + return std::make_shared(Res::maxPrecision(), getDecimalScale(*argument)); } } }; diff --git a/src/AggregateFunctions/AggregateFunctionGroupBitmap.h b/src/AggregateFunctions/AggregateFunctionGroupBitmap.h index dacde67f3ca..5fe3128fa20 100644 --- a/src/AggregateFunctions/AggregateFunctionGroupBitmap.h +++ b/src/AggregateFunctions/AggregateFunctionGroupBitmap.h @@ -19,13 +19,13 @@ class AggregateFunctionBitmap final : public IAggregateFunctionDataHelper>({type}, {}) + : IAggregateFunctionDataHelper>({type}, {}, createResultType()) { } String getName() const override { return Data::name(); } - DataTypePtr 
getReturnType() const override { return std::make_shared>(); } + static DataTypePtr createResultType() { return std::make_shared>(); } bool allocatesMemoryInArena() const override { return false; } @@ -59,13 +59,13 @@ private: static constexpr size_t STATE_VERSION_1_MIN_REVISION = 54455; public: explicit AggregateFunctionBitmapL2(const DataTypePtr & type) - : IAggregateFunctionDataHelper>({type}, {}) + : IAggregateFunctionDataHelper>({type}, {}, createResultType()) { } String getName() const override { return Policy::name; } - DataTypePtr getReturnType() const override { return std::make_shared>(); } + static DataTypePtr createResultType() { return std::make_shared>(); } bool allocatesMemoryInArena() const override { return false; } diff --git a/src/AggregateFunctions/AggregateFunctionGroupUniqArray.cpp b/src/AggregateFunctions/AggregateFunctionGroupUniqArray.cpp index da934531f96..4589f68280f 100644 --- a/src/AggregateFunctions/AggregateFunctionGroupUniqArray.cpp +++ b/src/AggregateFunctions/AggregateFunctionGroupUniqArray.cpp @@ -26,8 +26,8 @@ class AggregateFunctionGroupUniqArrayDate : public AggregateFunctionGroupUniqArr { public: explicit AggregateFunctionGroupUniqArrayDate(const DataTypePtr & argument_type, const Array & parameters_, UInt64 max_elems_ = std::numeric_limits::max()) - : AggregateFunctionGroupUniqArray(argument_type, parameters_, max_elems_) {} - DataTypePtr getReturnType() const override { return std::make_shared(std::make_shared()); } + : AggregateFunctionGroupUniqArray(argument_type, parameters_, createResultType(), max_elems_) {} + static DataTypePtr createResultType() { return std::make_shared(std::make_shared()); } }; template @@ -35,8 +35,8 @@ class AggregateFunctionGroupUniqArrayDateTime : public AggregateFunctionGroupUni { public: explicit AggregateFunctionGroupUniqArrayDateTime(const DataTypePtr & argument_type, const Array & parameters_, UInt64 max_elems_ = std::numeric_limits::max()) - : AggregateFunctionGroupUniqArray(argument_type, parameters_, max_elems_) {} - DataTypePtr getReturnType() const override { return std::make_shared(std::make_shared()); } + : AggregateFunctionGroupUniqArray(argument_type, parameters_, createResultType(), max_elems_) {} + static DataTypePtr createResultType() { return std::make_shared(std::make_shared()); } }; template diff --git a/src/AggregateFunctions/AggregateFunctionGroupUniqArray.h b/src/AggregateFunctions/AggregateFunctionGroupUniqArray.h index 93db1644bd4..f8e426363d8 100644 --- a/src/AggregateFunctions/AggregateFunctionGroupUniqArray.h +++ b/src/AggregateFunctions/AggregateFunctionGroupUniqArray.h @@ -50,15 +50,16 @@ private: public: AggregateFunctionGroupUniqArray(const DataTypePtr & argument_type, const Array & parameters_, UInt64 max_elems_ = std::numeric_limits::max()) : IAggregateFunctionDataHelper, - AggregateFunctionGroupUniqArray>({argument_type}, parameters_), + AggregateFunctionGroupUniqArray>({argument_type}, parameters_, std::make_shared(argument_type)), max_elems(max_elems_) {} - String getName() const override { return "groupUniqArray"; } + AggregateFunctionGroupUniqArray(const DataTypePtr & argument_type, const Array & parameters_, const DataTypePtr & result_type_, UInt64 max_elems_ = std::numeric_limits::max()) + : IAggregateFunctionDataHelper, + AggregateFunctionGroupUniqArray>({argument_type}, parameters_, result_type_), + max_elems(max_elems_) {} - DataTypePtr getReturnType() const override - { - return std::make_shared(this->argument_types[0]); - } + + String getName() const override { return 
"groupUniqArray"; } bool allocatesMemoryInArena() const override { return false; } @@ -153,17 +154,12 @@ class AggregateFunctionGroupUniqArrayGeneric public: AggregateFunctionGroupUniqArrayGeneric(const DataTypePtr & input_data_type_, const Array & parameters_, UInt64 max_elems_ = std::numeric_limits::max()) - : IAggregateFunctionDataHelper>({input_data_type_}, parameters_) + : IAggregateFunctionDataHelper>({input_data_type_}, parameters_, std::make_shared(input_data_type_)) , input_data_type(this->argument_types[0]) , max_elems(max_elems_) {} String getName() const override { return "groupUniqArray"; } - DataTypePtr getReturnType() const override - { - return std::make_shared(input_data_type); - } - bool allocatesMemoryInArena() const override { return true; diff --git a/src/AggregateFunctions/AggregateFunctionHistogram.h b/src/AggregateFunctions/AggregateFunctionHistogram.h index fbd92aa8220..c559b3f115f 100644 --- a/src/AggregateFunctions/AggregateFunctionHistogram.h +++ b/src/AggregateFunctions/AggregateFunctionHistogram.h @@ -307,7 +307,7 @@ private: public: AggregateFunctionHistogram(UInt32 max_bins_, const DataTypes & arguments, const Array & params) - : IAggregateFunctionDataHelper>(arguments, params) + : IAggregateFunctionDataHelper>(arguments, params, createResultType()) , max_bins(max_bins_) { } @@ -316,7 +316,7 @@ public: { return Data::structSize(max_bins); } - DataTypePtr getReturnType() const override + static DataTypePtr createResultType() { DataTypes types; auto mean = std::make_shared>(); diff --git a/src/AggregateFunctions/AggregateFunctionIf.cpp b/src/AggregateFunctions/AggregateFunctionIf.cpp index c32454b10e4..06a07e23b93 100644 --- a/src/AggregateFunctions/AggregateFunctionIf.cpp +++ b/src/AggregateFunctions/AggregateFunctionIf.cpp @@ -448,7 +448,7 @@ AggregateFunctionPtr AggregateFunctionIf::getOwnNullAdapter( /// Nullability of the last argument (condition) does not affect the nullability of the result (NULL is processed as false). /// For other arguments it is as usual (at least one is NULL then the result is NULL if possible). 
- bool return_type_is_nullable = !properties.returns_default_when_only_null && getReturnType()->canBeInsideNullable() + bool return_type_is_nullable = !properties.returns_default_when_only_null && getResultType()->canBeInsideNullable() && std::any_of(arguments.begin(), arguments.end() - 1, [](const auto & element) { return element->isNullable(); }); bool need_to_serialize_flag = return_type_is_nullable || properties.returns_default_when_only_null; diff --git a/src/AggregateFunctions/AggregateFunctionIf.h b/src/AggregateFunctions/AggregateFunctionIf.h index ccc4809dd06..585dcf038dc 100644 --- a/src/AggregateFunctions/AggregateFunctionIf.h +++ b/src/AggregateFunctions/AggregateFunctionIf.h @@ -36,7 +36,7 @@ private: public: AggregateFunctionIf(AggregateFunctionPtr nested, const DataTypes & types, const Array & params_) - : IAggregateFunctionHelper(types, params_) + : IAggregateFunctionHelper(types, params_, nested->getResultType()) , nested_func(nested), num_arguments(types.size()) { if (num_arguments == 0) @@ -51,11 +51,6 @@ public: return nested_func->getName() + "If"; } - DataTypePtr getReturnType() const override - { - return nested_func->getReturnType(); - } - const IAggregateFunction & getBaseAggregateFunctionWithSameStateRepresentation() const override { return nested_func->getBaseAggregateFunctionWithSameStateRepresentation(); diff --git a/src/AggregateFunctions/AggregateFunctionIntervalLengthSum.h b/src/AggregateFunctions/AggregateFunctionIntervalLengthSum.h index fdde50074aa..5b01da66364 100644 --- a/src/AggregateFunctions/AggregateFunctionIntervalLengthSum.h +++ b/src/AggregateFunctions/AggregateFunctionIntervalLengthSum.h @@ -177,11 +177,11 @@ public: String getName() const override { return "intervalLengthSum"; } explicit AggregateFunctionIntervalLengthSum(const DataTypes & arguments) - : IAggregateFunctionDataHelper>(arguments, {}) + : IAggregateFunctionDataHelper>(arguments, {}, createResultType()) { } - DataTypePtr getReturnType() const override + static DataTypePtr createResultType() { if constexpr (std::is_floating_point_v) return std::make_shared(); diff --git a/src/AggregateFunctions/AggregateFunctionMLMethod.h b/src/AggregateFunctions/AggregateFunctionMLMethod.h index b9d5d835f57..6545ee4fd53 100644 --- a/src/AggregateFunctions/AggregateFunctionMLMethod.h +++ b/src/AggregateFunctions/AggregateFunctionMLMethod.h @@ -309,7 +309,7 @@ public: UInt64 batch_size_, const DataTypes & arguments_types, const Array & params) - : IAggregateFunctionDataHelper>(arguments_types, params) + : IAggregateFunctionDataHelper>(arguments_types, params, createResultType()) , param_num(param_num_) , learning_rate(learning_rate_) , l2_reg_coef(l2_reg_coef_) @@ -319,8 +319,7 @@ public: { } - /// This function is called when SELECT linearRegression(...) 
is called - DataTypePtr getReturnType() const override + static DataTypePtr createResultType() { return std::make_shared(std::make_shared()); } diff --git a/src/AggregateFunctions/AggregateFunctionMannWhitney.h b/src/AggregateFunctions/AggregateFunctionMannWhitney.h index d861eef10ab..6176d6854fc 100644 --- a/src/AggregateFunctions/AggregateFunctionMannWhitney.h +++ b/src/AggregateFunctions/AggregateFunctionMannWhitney.h @@ -133,7 +133,7 @@ private: public: explicit AggregateFunctionMannWhitney(const DataTypes & arguments, const Array & params) - :IAggregateFunctionDataHelper ({arguments}, {}) + : IAggregateFunctionDataHelper ({arguments}, {}, createResultType()) { if (params.size() > 2) throw Exception("Aggregate function " + getName() + " require two parameter or less", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); @@ -174,7 +174,7 @@ public: bool allocatesMemoryInArena() const override { return true; } - DataTypePtr getReturnType() const override + static DataTypePtr createResultType() { DataTypes types { diff --git a/src/AggregateFunctions/AggregateFunctionMap.h b/src/AggregateFunctions/AggregateFunctionMap.h index f60cc71e78e..dc19bf3f71c 100644 --- a/src/AggregateFunctions/AggregateFunctionMap.h +++ b/src/AggregateFunctions/AggregateFunctionMap.h @@ -18,6 +18,7 @@ #include #include #include +#include "DataTypes/Serializations/ISerialization.h" #include "base/types.h" #include #include "AggregateFunctions/AggregateFunctionFactory.h" @@ -104,26 +105,32 @@ public: return nested_func->getDefaultVersion(); } - AggregateFunctionMap(AggregateFunctionPtr nested, const DataTypes & types) : Base(types, nested->getParameters()), nested_func(nested) + AggregateFunctionMap(AggregateFunctionPtr nested, const DataTypes & types) + : Base(types, nested->getParameters(), std::make_shared(DataTypes{getKeyType(types, nested), nested->getResultType()})) + , nested_func(nested) { - if (types.empty()) - throw Exception( - ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH, "Aggregate function " + getName() + " requires at least one argument"); - - if (types.size() > 1) - throw Exception( - ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH, "Aggregate function " + getName() + " requires only one map argument"); - - const auto * map_type = checkAndGetDataType(types[0].get()); - if (!map_type) - throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Aggregate function " + getName() + " requires map as argument"); - - key_type = map_type->getKeyType(); + key_type = getKeyType(types, nested_func); } String getName() const override { return nested_func->getName() + "Map"; } - DataTypePtr getReturnType() const override { return std::make_shared(DataTypes{key_type, nested_func->getReturnType()}); } + static DataTypePtr getKeyType(const DataTypes & types, const AggregateFunctionPtr & nested) + { + if (types.empty()) + throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH, + "Aggregate function {}Map requires at least one argument", nested->getName()); + + if (types.size() > 1) + throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH, + "Aggregate function {}Map requires only one map argument", nested->getName()); + + const auto * map_type = checkAndGetDataType(types[0].get()); + if (!map_type) + throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, + "Aggregate function {}Map requires map as argument", nested->getName()); + + return map_type->getKeyType(); + } void add(AggregateDataPtr __restrict place, const IColumn ** columns, size_t row_num, Arena * arena) const override { diff --git 
a/src/AggregateFunctions/AggregateFunctionMaxIntersections.h b/src/AggregateFunctions/AggregateFunctionMaxIntersections.h index d2f553172c9..e78684c9491 100644 --- a/src/AggregateFunctions/AggregateFunctionMaxIntersections.h +++ b/src/AggregateFunctions/AggregateFunctionMaxIntersections.h @@ -62,7 +62,8 @@ private: public: AggregateFunctionIntersectionsMax(AggregateFunctionIntersectionsKind kind_, const DataTypes & arguments) - : IAggregateFunctionDataHelper, AggregateFunctionIntersectionsMax>(arguments, {}), kind(kind_) + : IAggregateFunctionDataHelper, AggregateFunctionIntersectionsMax>(arguments, {}, createResultType(kind_)) + , kind(kind_) { if (!isNativeNumber(arguments[0])) throw Exception{getName() + ": first argument must be represented by integer", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT}; @@ -81,9 +82,9 @@ public: : "maxIntersectionsPosition"; } - DataTypePtr getReturnType() const override + static DataTypePtr createResultType(AggregateFunctionIntersectionsKind kind_) { - if (kind == AggregateFunctionIntersectionsKind::Count) + if (kind_ == AggregateFunctionIntersectionsKind::Count) return std::make_shared(); else return std::make_shared>(); diff --git a/src/AggregateFunctions/AggregateFunctionMeanZTest.h b/src/AggregateFunctions/AggregateFunctionMeanZTest.h index 7fecff591e6..97925d4e07c 100644 --- a/src/AggregateFunctions/AggregateFunctionMeanZTest.h +++ b/src/AggregateFunctions/AggregateFunctionMeanZTest.h @@ -36,7 +36,7 @@ private: public: AggregateFunctionMeanZTest(const DataTypes & arguments, const Array & params) - : IAggregateFunctionDataHelper>({arguments}, params) + : IAggregateFunctionDataHelper>({arguments}, params, createResultType()) { pop_var_x = params.at(0).safeGet(); pop_var_y = params.at(1).safeGet(); @@ -63,7 +63,7 @@ public: return Data::name; } - DataTypePtr getReturnType() const override + static DataTypePtr createResultType() { DataTypes types { diff --git a/src/AggregateFunctions/AggregateFunctionMerge.h b/src/AggregateFunctions/AggregateFunctionMerge.h index bb2d36eeed1..0cb44259816 100644 --- a/src/AggregateFunctions/AggregateFunctionMerge.h +++ b/src/AggregateFunctions/AggregateFunctionMerge.h @@ -30,7 +30,7 @@ private: public: AggregateFunctionMerge(const AggregateFunctionPtr & nested_, const DataTypePtr & argument, const Array & params_) - : IAggregateFunctionHelper({argument}, params_) + : IAggregateFunctionHelper({argument}, params_, createResultType(nested_)) , nested_func(nested_) { const DataTypeAggregateFunction * data_type = typeid_cast(argument.get()); @@ -45,9 +45,9 @@ public: return nested_func->getName() + "Merge"; } - DataTypePtr getReturnType() const override + static DataTypePtr createResultType(const AggregateFunctionPtr & nested_) { - return nested_func->getReturnType(); + return nested_->getResultType(); } const IAggregateFunction & getBaseAggregateFunctionWithSameStateRepresentation() const override diff --git a/src/AggregateFunctions/AggregateFunctionMinMaxAny.h b/src/AggregateFunctions/AggregateFunctionMinMaxAny.h index a6013f37b9d..314e68f83d9 100644 --- a/src/AggregateFunctions/AggregateFunctionMinMaxAny.h +++ b/src/AggregateFunctions/AggregateFunctionMinMaxAny.h @@ -1222,7 +1222,7 @@ private: public: explicit AggregateFunctionsSingleValue(const DataTypePtr & type) - : IAggregateFunctionDataHelper>({type}, {}) + : IAggregateFunctionDataHelper>({type}, {}, createResultType(type)) , serialization(type->getDefaultSerialization()) { if (StringRef(Data::name()) == StringRef("min") @@ -1236,12 +1236,11 @@ public: String getName() const 
override { return Data::name(); } - DataTypePtr getReturnType() const override + static DataTypePtr createResultType(const DataTypePtr & type_) { - auto result_type = this->argument_types.at(0); if constexpr (Data::is_nullable) - return makeNullable(result_type); - return result_type; + return makeNullable(type_); + return type_; } void add(AggregateDataPtr __restrict place, const IColumn ** columns, size_t row_num, Arena * arena) const override diff --git a/src/AggregateFunctions/AggregateFunctionNothing.h b/src/AggregateFunctions/AggregateFunctionNothing.h index 13ef407be8b..de8a5868e04 100644 --- a/src/AggregateFunctions/AggregateFunctionNothing.h +++ b/src/AggregateFunctions/AggregateFunctionNothing.h @@ -6,6 +6,7 @@ #include #include #include +#include "DataTypes/IDataType.h" namespace DB @@ -19,16 +20,16 @@ class AggregateFunctionNothing final : public IAggregateFunctionHelper(arguments, params) {} + : IAggregateFunctionHelper(arguments, params, createResultType(arguments)) {} String getName() const override { return "nothing"; } - DataTypePtr getReturnType() const override + static DataTypePtr createResultType(const DataTypes & arguments) { - return argument_types.empty() ? std::make_shared(std::make_shared()) : argument_types.front(); + return arguments.empty() ? std::make_shared(std::make_shared()) : arguments.front(); } bool allocatesMemoryInArena() const override { return false; } diff --git a/src/AggregateFunctions/AggregateFunctionNull.cpp b/src/AggregateFunctions/AggregateFunctionNull.cpp index 01558b56667..3f1912a4062 100644 --- a/src/AggregateFunctions/AggregateFunctionNull.cpp +++ b/src/AggregateFunctions/AggregateFunctionNull.cpp @@ -87,7 +87,7 @@ public: transformed_nested_function->getParameters()); } - bool return_type_is_nullable = !properties.returns_default_when_only_null && nested_function->getReturnType()->canBeInsideNullable(); + bool return_type_is_nullable = !properties.returns_default_when_only_null && nested_function->getResultType()->canBeInsideNullable(); bool serialize_flag = return_type_is_nullable || properties.returns_default_when_only_null; if (arguments.size() == 1) diff --git a/src/AggregateFunctions/AggregateFunctionNull.h b/src/AggregateFunctions/AggregateFunctionNull.h index 26d36b84860..ae5573a5351 100644 --- a/src/AggregateFunctions/AggregateFunctionNull.h +++ b/src/AggregateFunctions/AggregateFunctionNull.h @@ -85,7 +85,8 @@ protected: public: AggregateFunctionNullBase(AggregateFunctionPtr nested_function_, const DataTypes & arguments, const Array & params) - : IAggregateFunctionHelper(arguments, params), nested_function{nested_function_} + : IAggregateFunctionHelper(arguments, params, createResultType(nested_function_)) + , nested_function{nested_function_} { if constexpr (result_is_nullable) prefix_size = nested_function->alignOfData(); @@ -99,12 +100,12 @@ public: return nested_function->getName(); } - DataTypePtr getReturnType() const override + static DataTypePtr createResultType(const AggregateFunctionPtr & nested_function_) { if constexpr (result_is_nullable) - return makeNullable(nested_function->getReturnType()); + return makeNullable(nested_function_->getResultType()); else - return nested_function->getReturnType(); + return nested_function_->getResultType(); } void create(AggregateDataPtr __restrict place) const override @@ -275,7 +276,7 @@ public: { llvm::IRBuilder<> & b = static_cast &>(builder); - auto * return_type = toNativeType(b, this->getReturnType()); + auto * return_type = toNativeType(b, this->getResultType()); llvm::Value 
* result = nullptr; diff --git a/src/AggregateFunctions/AggregateFunctionOrFill.h b/src/AggregateFunctions/AggregateFunctionOrFill.h index eff4fb2bdc0..eeec630be9a 100644 --- a/src/AggregateFunctions/AggregateFunctionOrFill.h +++ b/src/AggregateFunctions/AggregateFunctionOrFill.h @@ -30,16 +30,14 @@ private: AggregateFunctionPtr nested_function; size_t size_of_data; - DataTypePtr inner_type; bool inner_nullable; public: AggregateFunctionOrFill(AggregateFunctionPtr nested_function_, const DataTypes & arguments, const Array & params) - : IAggregateFunctionHelper{arguments, params} + : IAggregateFunctionHelper{arguments, params, createResultType(nested_function_->getResultType())} , nested_function{nested_function_} , size_of_data {nested_function->sizeOfData()} - , inner_type {nested_function->getReturnType()} - , inner_nullable {inner_type->isNullable()} + , inner_nullable {nested_function->getResultType()->isNullable()} { // nothing } @@ -246,22 +244,22 @@ public: readChar(place[size_of_data], buf); } - DataTypePtr getReturnType() const override + static DataTypePtr createResultType(const DataTypePtr & inner_type_) { if constexpr (UseNull) { // -OrNull - if (inner_nullable) - return inner_type; + if (inner_type_->isNullable()) + return inner_type_; - return std::make_shared(inner_type); + return std::make_shared(inner_type_); } else { // -OrDefault - return inner_type; + return inner_type_; } } diff --git a/src/AggregateFunctions/AggregateFunctionQuantile.h b/src/AggregateFunctions/AggregateFunctionQuantile.h index 39a9e09dc64..6427d03f089 100644 --- a/src/AggregateFunctions/AggregateFunctionQuantile.h +++ b/src/AggregateFunctions/AggregateFunctionQuantile.h @@ -72,7 +72,7 @@ private: public: AggregateFunctionQuantile(const DataTypes & argument_types_, const Array & params) : IAggregateFunctionDataHelper>( - argument_types_, params) + argument_types_, params, createResultType(argument_types_)) , levels(params, returns_many) , level(levels.levels[0]) , argument_type(this->argument_types[0]) @@ -83,14 +83,14 @@ public: String getName() const override { return Name::name; } - DataTypePtr getReturnType() const override + static DataTypePtr createResultType(const DataTypes & argument_types_) { DataTypePtr res; if constexpr (returns_float) res = std::make_shared>(); else - res = argument_type; + res = argument_types_[0]; if constexpr (returns_many) return std::make_shared(res); diff --git a/src/AggregateFunctions/AggregateFunctionRankCorrelation.h b/src/AggregateFunctions/AggregateFunctionRankCorrelation.h index 4a81c6cda82..4f9ca55f9f5 100644 --- a/src/AggregateFunctions/AggregateFunctionRankCorrelation.h +++ b/src/AggregateFunctions/AggregateFunctionRankCorrelation.h @@ -51,7 +51,7 @@ class AggregateFunctionRankCorrelation : { public: explicit AggregateFunctionRankCorrelation(const DataTypes & arguments) - :IAggregateFunctionDataHelper ({arguments}, {}) + :IAggregateFunctionDataHelper ({arguments}, {}, std::make_shared>()) {} String getName() const override @@ -61,11 +61,6 @@ public: bool allocatesMemoryInArena() const override { return true; } - DataTypePtr getReturnType() const override - { - return std::make_shared>(); - } - void add(AggregateDataPtr __restrict place, const IColumn ** columns, size_t row_num, Arena * arena) const override { Float64 new_x = columns[0]->getFloat64(row_num); diff --git a/src/AggregateFunctions/AggregateFunctionResample.h b/src/AggregateFunctions/AggregateFunctionResample.h index fe04ada1a77..32458557ac5 100644 --- 
a/src/AggregateFunctions/AggregateFunctionResample.h +++ b/src/AggregateFunctions/AggregateFunctionResample.h @@ -43,7 +43,7 @@ public: size_t step_, const DataTypes & arguments, const Array & params) - : IAggregateFunctionHelper>{arguments, params} + : IAggregateFunctionHelper>{arguments, params, createResultType(nested_function_)} , nested_function{nested_function_} , last_col{arguments.size() - 1} , begin{begin_} @@ -190,9 +190,9 @@ public: nested_function->deserialize(place + i * size_of_data, buf, version, arena); } - DataTypePtr getReturnType() const override + static DataTypePtr createResultType(const AggregateFunctionPtr & nested_function_) { - return std::make_shared(nested_function->getReturnType()); + return std::make_shared(nested_function_->getResultType()); } template diff --git a/src/AggregateFunctions/AggregateFunctionRetention.h b/src/AggregateFunctions/AggregateFunctionRetention.h index 18d04fb1ea4..744b6d18f97 100644 --- a/src/AggregateFunctions/AggregateFunctionRetention.h +++ b/src/AggregateFunctions/AggregateFunctionRetention.h @@ -76,7 +76,7 @@ public: } explicit AggregateFunctionRetention(const DataTypes & arguments) - : IAggregateFunctionDataHelper(arguments, {}) + : IAggregateFunctionDataHelper(arguments, {}, std::make_shared(std::make_shared())) { for (const auto i : collections::range(0, arguments.size())) { @@ -90,12 +90,6 @@ public: events_size = static_cast(arguments.size()); } - - DataTypePtr getReturnType() const override - { - return std::make_shared(std::make_shared()); - } - bool allocatesMemoryInArena() const override { return false; } void add(AggregateDataPtr __restrict place, const IColumn ** columns, const size_t row_num, Arena *) const override diff --git a/src/AggregateFunctions/AggregateFunctionSequenceMatch.h b/src/AggregateFunctions/AggregateFunctionSequenceMatch.h index bcea408d26b..b4889a06e53 100644 --- a/src/AggregateFunctions/AggregateFunctionSequenceMatch.h +++ b/src/AggregateFunctions/AggregateFunctionSequenceMatch.h @@ -126,8 +126,8 @@ template class AggregateFunctionSequenceBase : public IAggregateFunctionDataHelper { public: - AggregateFunctionSequenceBase(const DataTypes & arguments, const Array & params, const String & pattern_) - : IAggregateFunctionDataHelper(arguments, params) + AggregateFunctionSequenceBase(const DataTypes & arguments, const Array & params, const String & pattern_, const DataTypePtr & result_type_) + : IAggregateFunctionDataHelper(arguments, params, result_type_) , pattern(pattern_) { arg_count = arguments.size(); @@ -617,14 +617,12 @@ class AggregateFunctionSequenceMatch final : public AggregateFunctionSequenceBas { public: AggregateFunctionSequenceMatch(const DataTypes & arguments, const Array & params, const String & pattern_) - : AggregateFunctionSequenceBase>(arguments, params, pattern_) {} + : AggregateFunctionSequenceBase>(arguments, params, pattern_, std::make_shared()) {} using AggregateFunctionSequenceBase>::AggregateFunctionSequenceBase; String getName() const override { return "sequenceMatch"; } - DataTypePtr getReturnType() const override { return std::make_shared(); } - bool allocatesMemoryInArena() const override { return false; } void insertResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena *) const override @@ -655,14 +653,12 @@ class AggregateFunctionSequenceCount final : public AggregateFunctionSequenceBas { public: AggregateFunctionSequenceCount(const DataTypes & arguments, const Array & params, const String & pattern_) - : AggregateFunctionSequenceBase>(arguments, params, 
pattern_) {} + : AggregateFunctionSequenceBase>(arguments, params, pattern_, std::make_shared()) {} using AggregateFunctionSequenceBase>::AggregateFunctionSequenceBase; String getName() const override { return "sequenceCount"; } - DataTypePtr getReturnType() const override { return std::make_shared(); } - bool allocatesMemoryInArena() const override { return false; } void insertResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena *) const override diff --git a/src/AggregateFunctions/AggregateFunctionSequenceNextNode.h b/src/AggregateFunctions/AggregateFunctionSequenceNextNode.h index 90caaee4d94..487889a0ca4 100644 --- a/src/AggregateFunctions/AggregateFunctionSequenceNextNode.h +++ b/src/AggregateFunctions/AggregateFunctionSequenceNextNode.h @@ -190,7 +190,7 @@ public: SequenceDirection seq_direction_, size_t min_required_args_, UInt64 max_elems_ = std::numeric_limits::max()) - : IAggregateFunctionDataHelper, Self>({data_type_}, parameters_) + : IAggregateFunctionDataHelper, Self>({data_type_}, parameters_, data_type_) , seq_base_kind(seq_base_kind_) , seq_direction(seq_direction_) , min_required_args(min_required_args_) @@ -202,8 +202,6 @@ public: String getName() const override { return "sequenceNextNode"; } - DataTypePtr getReturnType() const override { return data_type; } - bool haveSameStateRepresentationImpl(const IAggregateFunction & rhs) const override { return this->getName() == rhs.getName() && this->haveEqualArgumentTypes(rhs); diff --git a/src/AggregateFunctions/AggregateFunctionSimpleLinearRegression.h b/src/AggregateFunctions/AggregateFunctionSimpleLinearRegression.h index 06cdfc5e582..b0d448afb55 100644 --- a/src/AggregateFunctions/AggregateFunctionSimpleLinearRegression.h +++ b/src/AggregateFunctions/AggregateFunctionSimpleLinearRegression.h @@ -99,7 +99,7 @@ public: IAggregateFunctionDataHelper< AggregateFunctionSimpleLinearRegressionData, AggregateFunctionSimpleLinearRegression - > {arguments, params} + > {arguments, params, createResultType()} { // notice: arguments has been checked before } @@ -140,7 +140,7 @@ public: this->data(place).deserialize(buf); } - DataTypePtr getReturnType() const override + static DataTypePtr createResultType() { DataTypes types { diff --git a/src/AggregateFunctions/AggregateFunctionSimpleState.h b/src/AggregateFunctions/AggregateFunctionSimpleState.h index f50c86c684e..3af7d71395a 100644 --- a/src/AggregateFunctions/AggregateFunctionSimpleState.h +++ b/src/AggregateFunctions/AggregateFunctionSimpleState.h @@ -20,28 +20,28 @@ private: public: AggregateFunctionSimpleState(AggregateFunctionPtr nested_, const DataTypes & arguments_, const Array & params_) - : IAggregateFunctionHelper(arguments_, params_) + : IAggregateFunctionHelper(arguments_, params_, createResultType(nested_, params_)) , nested_func(nested_) { } String getName() const override { return nested_func->getName() + "SimpleState"; } - DataTypePtr getReturnType() const override + static DataTypePtr createResultType(const AggregateFunctionPtr & nested_, const Array & params_) { - DataTypeCustomSimpleAggregateFunction::checkSupportedFunctions(nested_func); + DataTypeCustomSimpleAggregateFunction::checkSupportedFunctions(nested_); // Need to make a clone to avoid recursive reference. 
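
Every hunk in this part of the patch applies the same mechanical change: the virtual `getReturnType()` override is removed, a static `createResultType(...)` computes the same `DataTypePtr` from constructor inputs, and that type is forwarded to the base-class constructor, which stores it. Below is a minimal standalone sketch of that pattern; the class and type names are simplified stand-ins for illustration, not ClickHouse's real declarations.

```cpp
#include <iostream>
#include <memory>
#include <string>

/// Stand-in for DataTypePtr; illustrative only.
using DataTypePtr = std::shared_ptr<const std::string>;

class IAggregateFunctionSketch
{
public:
    /// Before the refactoring, derived classes overrode a virtual getReturnType();
    /// now the result type is computed once and stored by the base.
    explicit IAggregateFunctionSketch(DataTypePtr result_type_)
        : result_type(std::move(result_type_)) {}
    virtual ~IAggregateFunctionSketch() = default;

    const DataTypePtr & getResultType() const { return result_type; }

private:
    DataTypePtr result_type;
};

class AggregateFunctionCountSketch final : public IAggregateFunctionSketch
{
public:
    AggregateFunctionCountSketch() : IAggregateFunctionSketch(createResultType()) {}

    /// Static: it may only depend on constructor inputs, never on `this`.
    static DataTypePtr createResultType()
    {
        return std::make_shared<const std::string>("UInt64");
    }
};

int main()
{
    AggregateFunctionCountSketch count;
    std::cout << *count.getResultType() << '\n'; /// prints UInt64
}
```

One apparent motivation, judging from the later hunks, is that the result type becomes a plain stored member that the common interface can expose directly instead of re-deriving it on every call.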
- auto storage_type_out = DataTypeFactory::instance().get(nested_func->getReturnType()->getName()); + auto storage_type_out = DataTypeFactory::instance().get(nested_->getResultType()->getName()); // Need to make a new function with promoted argument types because SimpleAggregates requires arg_type = return_type. AggregateFunctionProperties properties; auto function - = AggregateFunctionFactory::instance().get(nested_func->getName(), {storage_type_out}, nested_func->getParameters(), properties); + = AggregateFunctionFactory::instance().get(nested_->getName(), {storage_type_out}, nested_->getParameters(), properties); // Need to make a clone because it'll be customized. - auto storage_type_arg = DataTypeFactory::instance().get(nested_func->getReturnType()->getName()); + auto storage_type_arg = DataTypeFactory::instance().get(nested_->getResultType()->getName()); DataTypeCustomNamePtr custom_name - = std::make_unique(function, DataTypes{nested_func->getReturnType()}, parameters); + = std::make_unique(function, DataTypes{nested_->getResultType()}, params_); storage_type_arg->setCustomization(std::make_unique(std::move(custom_name), nullptr)); return storage_type_arg; } diff --git a/src/AggregateFunctions/AggregateFunctionSparkbar.h b/src/AggregateFunctions/AggregateFunctionSparkbar.h index f0fbdd2f2e4..882575e2005 100644 --- a/src/AggregateFunctions/AggregateFunctionSparkbar.h +++ b/src/AggregateFunctions/AggregateFunctionSparkbar.h @@ -261,7 +261,7 @@ private: public: AggregateFunctionSparkbar(const DataTypes & arguments, const Array & params) : IAggregateFunctionDataHelper, AggregateFunctionSparkbar>( - arguments, params) + arguments, params, std::make_shared()) { width = params.at(0).safeGet(); if (params.size() == 3) @@ -283,11 +283,6 @@ public: return "sparkbar"; } - DataTypePtr getReturnType() const override - { - return std::make_shared(); - } - void add(AggregateDataPtr __restrict place, const IColumn ** columns, size_t row_num, Arena * /*arena*/) const override { X x = assert_cast *>(columns[0])->getData()[row_num]; diff --git a/src/AggregateFunctions/AggregateFunctionState.h b/src/AggregateFunctions/AggregateFunctionState.h index 20ccb2e543c..625fe1f36bc 100644 --- a/src/AggregateFunctions/AggregateFunctionState.h +++ b/src/AggregateFunctions/AggregateFunctionState.h @@ -23,7 +23,7 @@ private: public: AggregateFunctionState(AggregateFunctionPtr nested_, const DataTypes & arguments_, const Array & params_) - : IAggregateFunctionHelper(arguments_, params_) + : IAggregateFunctionHelper(arguments_, params_, nested_->getStateType()) , nested_func(nested_) {} @@ -32,11 +32,6 @@ public: return nested_func->getName() + "State"; } - DataTypePtr getReturnType() const override - { - return getStateType(); - } - const IAggregateFunction & getBaseAggregateFunctionWithSameStateRepresentation() const override { return nested_func->getBaseAggregateFunctionWithSameStateRepresentation(); diff --git a/src/AggregateFunctions/AggregateFunctionStatistics.h b/src/AggregateFunctions/AggregateFunctionStatistics.h index ad7177a32fa..eb2d66b7e94 100644 --- a/src/AggregateFunctions/AggregateFunctionStatistics.h +++ b/src/AggregateFunctions/AggregateFunctionStatistics.h @@ -115,15 +115,11 @@ class AggregateFunctionVariance final { public: explicit AggregateFunctionVariance(const DataTypePtr & arg) - : IAggregateFunctionDataHelper, AggregateFunctionVariance>({arg}, {}) {} + : IAggregateFunctionDataHelper, AggregateFunctionVariance>({arg}, {}, std::make_shared()) + {} String getName() const override { return 
Op::name; } - DataTypePtr getReturnType() const override - { - return std::make_shared(); - } - bool allocatesMemoryInArena() const override { return false; } void add(AggregateDataPtr __restrict place, const IColumn ** columns, size_t row_num, Arena *) const override @@ -368,15 +364,11 @@ class AggregateFunctionCovariance final public: explicit AggregateFunctionCovariance(const DataTypes & args) : IAggregateFunctionDataHelper< CovarianceData, - AggregateFunctionCovariance>(args, {}) {} + AggregateFunctionCovariance>(args, {}, std::make_shared()) + {} String getName() const override { return Op::name; } - DataTypePtr getReturnType() const override - { - return std::make_shared(); - } - bool allocatesMemoryInArena() const override { return false; } void add(AggregateDataPtr __restrict place, const IColumn ** columns, size_t row_num, Arena *) const override diff --git a/src/AggregateFunctions/AggregateFunctionStatisticsSimple.h b/src/AggregateFunctions/AggregateFunctionStatisticsSimple.h index d57b043b491..9ef62363a75 100644 --- a/src/AggregateFunctions/AggregateFunctionStatisticsSimple.h +++ b/src/AggregateFunctions/AggregateFunctionStatisticsSimple.h @@ -81,12 +81,12 @@ public: using ColVecResult = ColumnVector; explicit AggregateFunctionVarianceSimple(const DataTypes & argument_types_) - : IAggregateFunctionDataHelper>(argument_types_, {}) + : IAggregateFunctionDataHelper>(argument_types_, {}, std::make_shared>()) , src_scale(0) {} AggregateFunctionVarianceSimple(const IDataType & data_type, const DataTypes & argument_types_) - : IAggregateFunctionDataHelper>(argument_types_, {}) + : IAggregateFunctionDataHelper>(argument_types_, {}, std::make_shared>()) , src_scale(getDecimalScale(data_type)) {} @@ -117,11 +117,6 @@ public: UNREACHABLE(); } - DataTypePtr getReturnType() const override - { - return std::make_shared>(); - } - bool allocatesMemoryInArena() const override { return false; } void add(AggregateDataPtr __restrict place, const IColumn ** columns, size_t row_num, Arena *) const override diff --git a/src/AggregateFunctions/AggregateFunctionSum.h b/src/AggregateFunctions/AggregateFunctionSum.h index 4cd0afc8760..14c2838c30d 100644 --- a/src/AggregateFunctions/AggregateFunctionSum.h +++ b/src/AggregateFunctions/AggregateFunctionSum.h @@ -411,23 +411,21 @@ public: } explicit AggregateFunctionSum(const DataTypes & argument_types_) - : IAggregateFunctionDataHelper>(argument_types_, {}) - , scale(0) + : IAggregateFunctionDataHelper>(argument_types_, {}, createResultType(0)) {} AggregateFunctionSum(const IDataType & data_type, const DataTypes & argument_types_) - : IAggregateFunctionDataHelper>(argument_types_, {}) - , scale(getDecimalScale(data_type)) + : IAggregateFunctionDataHelper>(argument_types_, {}, createResultType(getDecimalScale(data_type))) {} - DataTypePtr getReturnType() const override + static DataTypePtr createResultType(UInt32 scale_) { if constexpr (!is_decimal) return std::make_shared>(); else { using DataType = DataTypeDecimal; - return std::make_shared(DataType::maxPrecision(), scale); + return std::make_shared(DataType::maxPrecision(), scale_); } } @@ -548,7 +546,7 @@ public: for (const auto & argument_type : this->argument_types) can_be_compiled &= canBeNativeType(*argument_type); - auto return_type = getReturnType(); + auto return_type = this->getResultType(); can_be_compiled &= canBeNativeType(*return_type); return can_be_compiled; @@ -558,7 +556,7 @@ public: { llvm::IRBuilder<> & b = static_cast &>(builder); - auto * return_type = toNativeType(b, getReturnType()); 
+ auto * return_type = toNativeType(b, this->getResultType()); auto * aggregate_sum_ptr = aggregate_data_ptr; b.CreateStore(llvm::Constant::getNullValue(return_type), aggregate_sum_ptr); @@ -568,7 +566,7 @@ public: { llvm::IRBuilder<> & b = static_cast &>(builder); - auto * return_type = toNativeType(b, getReturnType()); + auto * return_type = toNativeType(b, this->getResultType()); auto * sum_value_ptr = aggregate_data_ptr; auto * sum_value = b.CreateLoad(return_type, sum_value_ptr); @@ -586,7 +584,7 @@ public: { llvm::IRBuilder<> & b = static_cast &>(builder); - auto * return_type = toNativeType(b, getReturnType()); + auto * return_type = toNativeType(b, this->getResultType()); auto * sum_value_dst_ptr = aggregate_data_dst_ptr; auto * sum_value_dst = b.CreateLoad(return_type, sum_value_dst_ptr); @@ -602,7 +600,7 @@ public: { llvm::IRBuilder<> & b = static_cast &>(builder); - auto * return_type = toNativeType(b, getReturnType()); + auto * return_type = toNativeType(b, this->getResultType()); auto * sum_value_ptr = aggregate_data_ptr; return b.CreateLoad(return_type, sum_value_ptr); @@ -611,8 +609,6 @@ public: #endif private: - UInt32 scale; - static constexpr auto & castColumnToResult(IColumn & to) { if constexpr (is_decimal) diff --git a/src/AggregateFunctions/AggregateFunctionSumCount.h b/src/AggregateFunctions/AggregateFunctionSumCount.h index f1a5d85bb6c..7058204ed74 100644 --- a/src/AggregateFunctions/AggregateFunctionSumCount.h +++ b/src/AggregateFunctions/AggregateFunctionSumCount.h @@ -14,12 +14,13 @@ public: using Base = AggregateFunctionAvg; explicit AggregateFunctionSumCount(const DataTypes & argument_types_, UInt32 num_scale_ = 0) - : Base(argument_types_, num_scale_), scale(num_scale_) {} + : Base(argument_types_, createResultType(num_scale_), num_scale_) + {} - DataTypePtr getReturnType() const override + static DataTypePtr createResultType(UInt32 num_scale_) { auto second_elem = std::make_shared(); - return std::make_shared(DataTypes{getReturnTypeFirstElement(), std::move(second_elem)}); + return std::make_shared(DataTypes{getReturnTypeFirstElement(num_scale_), std::move(second_elem)}); } void insertResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena *) const final @@ -43,9 +44,7 @@ public: #endif private: - UInt32 scale; - - auto getReturnTypeFirstElement() const + static auto getReturnTypeFirstElement(UInt32 num_scale_) { using FieldType = AvgFieldType; @@ -54,7 +53,7 @@ private: else { using DataType = DataTypeDecimal; - return std::make_shared(DataType::maxPrecision(), scale); + return std::make_shared(DataType::maxPrecision(), num_scale_); } } }; diff --git a/src/AggregateFunctions/AggregateFunctionSumMap.h b/src/AggregateFunctions/AggregateFunctionSumMap.h index 1e32be987ff..ee6b1525741 100644 --- a/src/AggregateFunctions/AggregateFunctionSumMap.h +++ b/src/AggregateFunctions/AggregateFunctionSumMap.h @@ -16,6 +16,7 @@ #include #include #include +#include #include #include #include @@ -80,7 +81,7 @@ public: AggregateFunctionMapBase(const DataTypePtr & keys_type_, const DataTypes & values_types_, const DataTypes & argument_types_) - : Base(argument_types_, {} /* parameters */) + : Base(argument_types_, {} /* parameters */, createResultType(keys_type_, values_types_, getName())) , keys_type(keys_type_) , keys_serialization(keys_type->getDefaultSerialization()) , values_types(values_types_) @@ -117,19 +118,22 @@ public: return 0; } - DataTypePtr getReturnType() const override + static DataTypePtr createResultType( + const DataTypePtr & keys_type_, + const 
DataTypes & values_types_, + const String & name_) { DataTypes types; - types.emplace_back(std::make_shared(keys_type)); + types.emplace_back(std::make_shared(keys_type_)); - for (const auto & value_type : values_types) + for (const auto & value_type : values_types_) { if constexpr (std::is_same_v) { if (!value_type->isSummable()) throw Exception{ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Values for {} cannot be summed, passed type {}", - getName(), value_type->getName()}; + name_, value_type->getName()}; } DataTypePtr result_type; @@ -139,7 +143,7 @@ public: if (value_type->onlyNull()) throw Exception{ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Cannot calculate {} of type {}", - getName(), value_type->getName()}; + name_, value_type->getName()}; // Overflow, meaning that the returned type is the same as // the input type. Nulls are skipped. @@ -153,7 +157,7 @@ public: if (!value_type_without_nullable->canBePromoted()) throw Exception{ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Values for {} are expected to be Numeric, Float or Decimal, passed type {}", - getName(), value_type->getName()}; + name_, value_type->getName()}; WhichDataType value_type_to_check(value_type_without_nullable); @@ -424,7 +428,10 @@ public: } bool keepKey(const T & key) const { return static_cast(*this).keepKey(key); } - String getName() const override { return static_cast(*this).getName(); } + String getName() const override { return getNameImpl(); } + +private: + static String getNameImpl() { return Derived::getNameImpl(); } }; template @@ -443,10 +450,10 @@ public: { // The constructor accepts parameters to have a uniform interface with // sumMapFiltered, but this function doesn't have any parameters. - assertNoParameters(getName(), params_); + assertNoParameters(getNameImpl(), params_); } - String getName() const override + static String getNameImpl() { if constexpr (overflow) { @@ -487,13 +494,13 @@ public: if (params_.size() != 1) throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH, "Aggregate function '{}' requires exactly one parameter " - "of Array type", getName()); + "of Array type", getNameImpl()); Array keys_to_keep_values; if (!params_.front().tryGet(keys_to_keep_values)) throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Aggregate function {} requires an Array as a parameter", - getName()); + getNameImpl()); keys_to_keep.reserve(keys_to_keep_values.size()); @@ -501,7 +508,7 @@ public: keys_to_keep.emplace(f.safeGet()); } - String getName() const override + static String getNameImpl() { return overflow ? "sumMapFilteredWithOverflow" : "sumMapFiltered"; } bool keepKey(const T & key) const { return keys_to_keep.count(key); } @@ -606,10 +613,10 @@ public: { // The constructor accepts parameters to have a uniform interface with // sumMapFiltered, but this function doesn't have any parameters. - assertNoParameters(getName(), params_); + assertNoParameters(getNameImpl(), params_); } - String getName() const override { return "minMap"; } + static String getNameImpl() { return "minMap"; } bool keepKey(const T &) const { return true; } }; @@ -630,10 +637,10 @@ public: { // The constructor accepts parameters to have a uniform interface with // sumMapFiltered, but this function doesn't have any parameters. 
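
In the `sumMap` family above, `getName()` is additionally turned into a static `getNameImpl()` and `createResultType()` receives the function name as a parameter: a static factory has no `this`, so it cannot call the virtual `getName()` when formatting error messages. The sketch below illustrates that constraint under the same assumption; the type aliases and validation are mocked stand-ins, not the real ClickHouse code.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

/// Stand-ins for DataTypePtr / DataTypes; illustrative only.
using DataTypePtr = std::shared_ptr<const std::string>;
using DataTypes = std::vector<DataTypePtr>;

/// The name needed for error messages is passed in explicitly, because a
/// static factory cannot reach a virtual getName() (the real patch can also
/// obtain it from a static getNameImpl()).
static DataTypePtr createResultType(const DataTypes & values_types, const std::string & name)
{
    if (values_types.empty())
        throw std::invalid_argument("Aggregate function " + name + " requires at least one value type");

    /// The real function builds a Tuple(Array(key), Array(value), ...);
    /// returning the first value type keeps this sketch self-contained.
    return values_types.front();
}

int main()
{
    DataTypes values{std::make_shared<const std::string>("UInt64")};
    return createResultType(values, "sumMap") ? 0 : 1;
}
```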
- assertNoParameters(getName(), params_); + assertNoParameters(getNameImpl(), params_); } - String getName() const override { return "maxMap"; } + static String getNameImpl() { return "maxMap"; } bool keepKey(const T &) const { return true; } }; diff --git a/src/AggregateFunctions/AggregateFunctionTTest.h b/src/AggregateFunctions/AggregateFunctionTTest.h index b72e7a3cdcb..749e711d4f7 100644 --- a/src/AggregateFunctions/AggregateFunctionTTest.h +++ b/src/AggregateFunctions/AggregateFunctionTTest.h @@ -46,7 +46,7 @@ private: Float64 confidence_level; public: AggregateFunctionTTest(const DataTypes & arguments, const Array & params) - : IAggregateFunctionDataHelper>({arguments}, params) + : IAggregateFunctionDataHelper>({arguments}, params, createResultType(!params.empty())) { if (!params.empty()) { @@ -71,9 +71,9 @@ public: return Data::name; } - DataTypePtr getReturnType() const override + static DataTypePtr createResultType(bool need_confidence_interval_) { - if (need_confidence_interval) + if (need_confidence_interval_) { DataTypes types { diff --git a/src/AggregateFunctions/AggregateFunctionTopK.cpp b/src/AggregateFunctions/AggregateFunctionTopK.cpp index 4ebc80aceb5..b93aa703503 100644 --- a/src/AggregateFunctions/AggregateFunctionTopK.cpp +++ b/src/AggregateFunctions/AggregateFunctionTopK.cpp @@ -31,15 +31,33 @@ namespace template class AggregateFunctionTopKDate : public AggregateFunctionTopK { +public: using AggregateFunctionTopK::AggregateFunctionTopK; - DataTypePtr getReturnType() const override { return std::make_shared(std::make_shared()); } + + AggregateFunctionTopKDate(UInt64 threshold_, UInt64 load_factor, const DataTypes & argument_types_, const Array & params) + : AggregateFunctionTopK( + threshold_, + load_factor, + argument_types_, + params, + std::make_shared(std::make_shared())) + {} }; template class AggregateFunctionTopKDateTime : public AggregateFunctionTopK { +public: using AggregateFunctionTopK::AggregateFunctionTopK; - DataTypePtr getReturnType() const override { return std::make_shared(std::make_shared()); } + + AggregateFunctionTopKDateTime(UInt64 threshold_, UInt64 load_factor, const DataTypes & argument_types_, const Array & params) + : AggregateFunctionTopK( + threshold_, + load_factor, + argument_types_, + params, + std::make_shared(std::make_shared())) + {} }; diff --git a/src/AggregateFunctions/AggregateFunctionTopK.h b/src/AggregateFunctions/AggregateFunctionTopK.h index 98774254695..f1e57608195 100644 --- a/src/AggregateFunctions/AggregateFunctionTopK.h +++ b/src/AggregateFunctions/AggregateFunctionTopK.h @@ -40,14 +40,20 @@ protected: public: AggregateFunctionTopK(UInt64 threshold_, UInt64 load_factor, const DataTypes & argument_types_, const Array & params) - : IAggregateFunctionDataHelper, AggregateFunctionTopK>(argument_types_, params) - , threshold(threshold_), reserved(load_factor * threshold) {} + : IAggregateFunctionDataHelper, AggregateFunctionTopK>(argument_types_, params, createResultType(argument_types_)) + , threshold(threshold_), reserved(load_factor * threshold) + {} + + AggregateFunctionTopK(UInt64 threshold_, UInt64 load_factor, const DataTypes & argument_types_, const Array & params, const DataTypePtr & result_type_) + : IAggregateFunctionDataHelper, AggregateFunctionTopK>(argument_types_, params, result_type_) + , threshold(threshold_), reserved(load_factor * threshold) + {} String getName() const override { return is_weighted ? 
"topKWeighted" : "topK"; } - DataTypePtr getReturnType() const override + static DataTypePtr createResultType(const DataTypes & argument_types_) { - return std::make_shared(this->argument_types[0]); + return std::make_shared(argument_types_[0]); } bool allocatesMemoryInArena() const override { return false; } @@ -126,21 +132,20 @@ private: UInt64 threshold; UInt64 reserved; - DataTypePtr & input_data_type; static void deserializeAndInsert(StringRef str, IColumn & data_to); public: AggregateFunctionTopKGeneric( UInt64 threshold_, UInt64 load_factor, const DataTypes & argument_types_, const Array & params) - : IAggregateFunctionDataHelper>(argument_types_, params) - , threshold(threshold_), reserved(load_factor * threshold), input_data_type(this->argument_types[0]) {} + : IAggregateFunctionDataHelper>(argument_types_, params, createResultType(argument_types_)) + , threshold(threshold_), reserved(load_factor * threshold) {} String getName() const override { return is_weighted ? "topKWeighted" : "topK"; } - DataTypePtr getReturnType() const override + static DataTypePtr createResultType(const DataTypes & argument_types_) { - return std::make_shared(input_data_type); + return std::make_shared(argument_types_[0]); } bool allocatesMemoryInArena() const override diff --git a/src/AggregateFunctions/AggregateFunctionUniq.h b/src/AggregateFunctions/AggregateFunctionUniq.h index 1a98bfc8456..c782b9314fd 100644 --- a/src/AggregateFunctions/AggregateFunctionUniq.h +++ b/src/AggregateFunctions/AggregateFunctionUniq.h @@ -358,17 +358,12 @@ private: public: explicit AggregateFunctionUniq(const DataTypes & argument_types_) - : IAggregateFunctionDataHelper>(argument_types_, {}) + : IAggregateFunctionDataHelper>(argument_types_, {}, std::make_shared()) { } String getName() const override { return Data::getName(); } - DataTypePtr getReturnType() const override - { - return std::make_shared(); - } - bool allocatesMemoryInArena() const override { return false; } /// ALWAYS_INLINE is required to have better code layout for uniqHLL12 function @@ -462,7 +457,7 @@ private: public: explicit AggregateFunctionUniqVariadic(const DataTypes & arguments) - : IAggregateFunctionDataHelper>(arguments, {}) + : IAggregateFunctionDataHelper>(arguments, {}, std::make_shared()) { if (argument_is_tuple) num_args = typeid_cast(*arguments[0]).getElements().size(); @@ -472,11 +467,6 @@ public: String getName() const override { return Data::getName(); } - DataTypePtr getReturnType() const override - { - return std::make_shared(); - } - bool allocatesMemoryInArena() const override { return false; } void add(AggregateDataPtr __restrict place, const IColumn ** columns, size_t row_num, Arena *) const override diff --git a/src/AggregateFunctions/AggregateFunctionUniqCombined.h b/src/AggregateFunctions/AggregateFunctionUniqCombined.h index 47b3081225b..d879e3b3dde 100644 --- a/src/AggregateFunctions/AggregateFunctionUniqCombined.h +++ b/src/AggregateFunctions/AggregateFunctionUniqCombined.h @@ -126,7 +126,8 @@ class AggregateFunctionUniqCombined final { public: AggregateFunctionUniqCombined(const DataTypes & argument_types_, const Array & params_) - : IAggregateFunctionDataHelper, AggregateFunctionUniqCombined>(argument_types_, params_) {} + : IAggregateFunctionDataHelper, AggregateFunctionUniqCombined>(argument_types_, params_, std::make_shared()) + {} String getName() const override { @@ -136,11 +137,6 @@ public: return "uniqCombined"; } - DataTypePtr getReturnType() const override - { - return std::make_shared(); - } - bool 
allocatesMemoryInArena() const override { return false; } void add(AggregateDataPtr __restrict place, const IColumn ** columns, size_t row_num, Arena *) const override @@ -192,7 +188,7 @@ private: public: explicit AggregateFunctionUniqCombinedVariadic(const DataTypes & arguments, const Array & params) : IAggregateFunctionDataHelper, - AggregateFunctionUniqCombinedVariadic>(arguments, params) + AggregateFunctionUniqCombinedVariadic>(arguments, params, std::make_shared()) { if (argument_is_tuple) num_args = typeid_cast(*arguments[0]).getElements().size(); @@ -208,11 +204,6 @@ public: return "uniqCombined"; } - DataTypePtr getReturnType() const override - { - return std::make_shared(); - } - bool allocatesMemoryInArena() const override { return false; } void add(AggregateDataPtr __restrict place, const IColumn ** columns, size_t row_num, Arena *) const override diff --git a/src/AggregateFunctions/AggregateFunctionUniqUpTo.h b/src/AggregateFunctions/AggregateFunctionUniqUpTo.h index 99f36b664d7..377f2580070 100644 --- a/src/AggregateFunctions/AggregateFunctionUniqUpTo.h +++ b/src/AggregateFunctions/AggregateFunctionUniqUpTo.h @@ -174,7 +174,7 @@ private: public: AggregateFunctionUniqUpTo(UInt8 threshold_, const DataTypes & argument_types_, const Array & params_) - : IAggregateFunctionDataHelper, AggregateFunctionUniqUpTo>(argument_types_, params_) + : IAggregateFunctionDataHelper, AggregateFunctionUniqUpTo>(argument_types_, params_, std::make_shared()) , threshold(threshold_) { } @@ -186,11 +186,6 @@ public: String getName() const override { return "uniqUpTo"; } - DataTypePtr getReturnType() const override - { - return std::make_shared(); - } - bool allocatesMemoryInArena() const override { return false; } /// ALWAYS_INLINE is required to have better code layout for uniqUpTo function @@ -235,7 +230,7 @@ private: public: AggregateFunctionUniqUpToVariadic(const DataTypes & arguments, const Array & params, UInt8 threshold_) - : IAggregateFunctionDataHelper, AggregateFunctionUniqUpToVariadic>(arguments, params) + : IAggregateFunctionDataHelper, AggregateFunctionUniqUpToVariadic>(arguments, params, std::make_shared()) , threshold(threshold_) { if (argument_is_tuple) @@ -251,11 +246,6 @@ public: String getName() const override { return "uniqUpTo"; } - DataTypePtr getReturnType() const override - { - return std::make_shared(); - } - bool allocatesMemoryInArena() const override { return false; } void add(AggregateDataPtr __restrict place, const IColumn ** columns, size_t row_num, Arena *) const override diff --git a/src/AggregateFunctions/AggregateFunctionWindowFunnel.h b/src/AggregateFunctions/AggregateFunctionWindowFunnel.h index 8dad9643da5..472f230a24c 100644 --- a/src/AggregateFunctions/AggregateFunctionWindowFunnel.h +++ b/src/AggregateFunctions/AggregateFunctionWindowFunnel.h @@ -221,7 +221,7 @@ public: } AggregateFunctionWindowFunnel(const DataTypes & arguments, const Array & params) - : IAggregateFunctionDataHelper>(arguments, params) + : IAggregateFunctionDataHelper>(arguments, params, std::make_shared()) { events_size = arguments.size() - 1; window = params.at(0).safeGet(); @@ -245,11 +245,6 @@ public: } } - DataTypePtr getReturnType() const override - { - return std::make_shared(); - } - bool allocatesMemoryInArena() const override { return false; } void add(AggregateDataPtr __restrict place, const IColumn ** columns, const size_t row_num, Arena *) const override diff --git a/src/AggregateFunctions/CrossTab.h b/src/AggregateFunctions/CrossTab.h index 1284c210886..5868292c83f 100644 --- 
a/src/AggregateFunctions/CrossTab.h +++ b/src/AggregateFunctions/CrossTab.h @@ -118,7 +118,7 @@ class AggregateFunctionCrossTab : public IAggregateFunctionDataHelper>({arguments}, {}) + : IAggregateFunctionDataHelper>({arguments}, {}, createResultType()) { } @@ -132,7 +132,7 @@ public: return false; } - DataTypePtr getReturnType() const override + static DataTypePtr createResultType() { return std::make_shared>(); } diff --git a/src/AggregateFunctions/IAggregateFunction.h b/src/AggregateFunctions/IAggregateFunction.h index ada00791e69..1df5b67b155 100644 --- a/src/AggregateFunctions/IAggregateFunction.h +++ b/src/AggregateFunctions/IAggregateFunction.h @@ -10,6 +10,7 @@ #include #include #include +#include #include "config.h" @@ -49,6 +50,7 @@ using ConstAggregateDataPtr = const char *; class IAggregateFunction; using AggregateFunctionPtr = std::shared_ptr; + struct AggregateFunctionProperties; /** Aggregate functions interface. @@ -59,18 +61,18 @@ struct AggregateFunctionProperties; * (which can be created in some memory pool), * and IAggregateFunction is the external interface for manipulating them. */ -class IAggregateFunction : public std::enable_shared_from_this +class IAggregateFunction : public std::enable_shared_from_this, public IResolvedFunction { public: - IAggregateFunction(const DataTypes & argument_types_, const Array & parameters_) - : argument_types(argument_types_), parameters(parameters_) {} + IAggregateFunction(const DataTypes & argument_types_, const Array & parameters_, const DataTypePtr & result_type_) + : result_type(result_type_) + , argument_types(argument_types_) + , parameters(parameters_) + {} /// Get main function name. virtual String getName() const = 0; - /// Get the result type. - virtual DataTypePtr getReturnType() const = 0; - /// Get the data type of internal state. By default it is AggregateFunction(name(params), argument_types...). virtual DataTypePtr getStateType() const; @@ -102,7 +104,7 @@ public: virtual size_t getDefaultVersion() const { return 0; } - virtual ~IAggregateFunction() = default; + ~IAggregateFunction() override = default; /** Data manipulating functions. */ @@ -348,8 +350,9 @@ public: */ virtual AggregateFunctionPtr getNestedFunction() const { return {}; } - const DataTypes & getArgumentTypes() const { return argument_types; } - const Array & getParameters() const { return parameters; } + const DataTypePtr & getResultType() const override { return result_type; } + const DataTypes & getArgumentTypes() const override { return argument_types; } + const Array & getParameters() const override { return parameters; } // Any aggregate function can be calculated over a window, but there are some // window functions such as rank() that require a different interface, e.g. @@ -398,6 +401,7 @@ public: #endif protected: + DataTypePtr result_type; DataTypes argument_types; Array parameters; }; @@ -414,8 +418,8 @@ private: } public: - IAggregateFunctionHelper(const DataTypes & argument_types_, const Array & parameters_) - : IAggregateFunction(argument_types_, parameters_) {} + IAggregateFunctionHelper(const DataTypes & argument_types_, const Array & parameters_, const DataTypePtr & result_type_) + : IAggregateFunction(argument_types_, parameters_, result_type_) {} AddFunc getAddressOfAddFunction() const override { return &addFree; } @@ -695,15 +699,15 @@ public: // Derived class can `override` this to flag that DateTime64 is not supported. 
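
The `IAggregateFunction.h` hunk above carries the core of the change: the aggregate-function interface now also implements `IResolvedFunction`, storing `result_type`, `argument_types` and `parameters` and exposing them through `getResultType()`, `getArgumentTypes()` and `getParameters()`. The standalone sketch below shows roughly what such a shared interface provides, namely a single handle over ordinary and aggregate functions; the declarations are simplified assumptions for illustration, not the actual headers.

```cpp
#include <iostream>
#include <memory>
#include <string>
#include <vector>

/// Stand-ins for DataTypePtr / DataTypes; illustrative only.
using DataTypePtr = std::shared_ptr<const std::string>;
using DataTypes = std::vector<DataTypePtr>;

/// Minimal stand-in for a common "resolved function" interface.
class IResolvedFunctionSketch
{
public:
    virtual ~IResolvedFunctionSketch() = default;
    virtual const DataTypePtr & getResultType() const = 0;
    virtual const DataTypes & getArgumentTypes() const = 0;
};

/// An aggregate function satisfies the interface by returning its stored members.
class AggregateFunctionSketch : public IResolvedFunctionSketch
{
public:
    AggregateFunctionSketch(DataTypes argument_types_, DataTypePtr result_type_)
        : argument_types(std::move(argument_types_)), result_type(std::move(result_type_)) {}

    const DataTypePtr & getResultType() const override { return result_type; }
    const DataTypes & getArgumentTypes() const override { return argument_types; }

private:
    DataTypes argument_types;
    DataTypePtr result_type;
};

int main()
{
    DataTypes args{std::make_shared<const std::string>("UInt8")};
    AggregateFunctionSketch count(args, std::make_shared<const std::string>("UInt64"));

    /// Callers can work against the shared interface without caring whether
    /// the function is ordinary or aggregate.
    const IResolvedFunctionSketch & resolved = count;
    std::cout << *resolved.getResultType() << '\n'; /// prints UInt64
}
```

This appears to be what later lets the Analyzer's `FunctionNode` keep one `IResolvedFunctionPtr` member instead of separate ordinary/aggregate pointers plus a cached result type.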
static constexpr bool DateTime64Supported = true; - IAggregateFunctionDataHelper(const DataTypes & argument_types_, const Array & parameters_) - : IAggregateFunctionHelper(argument_types_, parameters_) + IAggregateFunctionDataHelper(const DataTypes & argument_types_, const Array & parameters_, const DataTypePtr & result_type_) + : IAggregateFunctionHelper(argument_types_, parameters_, result_type_) { /// To prevent derived classes changing the destroy() without updating hasTrivialDestructor() to match it /// Enforce that either both of them are changed or none are - constexpr bool declares_destroy_and_hasTrivialDestructor = + constexpr bool declares_destroy_and_has_trivial_destructor = std::is_same_v == std::is_same_v; - static_assert(declares_destroy_and_hasTrivialDestructor, + static_assert(declares_destroy_and_has_trivial_destructor, "destroy() and hasTrivialDestructor() methods of an aggregate function must be either both overridden or not"); } diff --git a/src/Analyzer/FunctionNode.cpp b/src/Analyzer/FunctionNode.cpp index ad3959dfe9c..1b32cd5436d 100644 --- a/src/Analyzer/FunctionNode.cpp +++ b/src/Analyzer/FunctionNode.cpp @@ -2,6 +2,7 @@ #include #include +#include #include #include @@ -17,6 +18,11 @@ namespace DB { +namespace ErrorCodes +{ + extern const int LOGICAL_ERROR; +} + FunctionNode::FunctionNode(String function_name_) : IQueryTreeNode(children_size) , function_name(function_name_) @@ -25,25 +31,41 @@ FunctionNode::FunctionNode(String function_name_) children[arguments_child_index] = std::make_shared(); } -void FunctionNode::resolveAsFunction(FunctionOverloadResolverPtr function_value, DataTypePtr result_type_value) +ColumnsWithTypeAndName FunctionNode::getArgumentTypes() const { - aggregate_function = nullptr; + ColumnsWithTypeAndName argument_types; + for (const auto & arg : getArguments().getNodes()) + { + ColumnWithTypeAndName argument; + argument.type = arg->getResultType(); + if (auto * constant = arg->as()) + argument.column = argument.type->createColumnConst(1, constant->getValue()); + argument_types.push_back(argument); + } + return argument_types; +} + +void FunctionNode::resolveAsFunction(FunctionBasePtr function_value) +{ + function_name = function_value->getName(); function = std::move(function_value); - result_type = std::move(result_type_value); - function_name = function->getName(); + kind = FunctionKind::ORDINARY; } -void FunctionNode::resolveAsAggregateFunction(AggregateFunctionPtr aggregate_function_value, DataTypePtr result_type_value) +void FunctionNode::resolveAsAggregateFunction(AggregateFunctionPtr aggregate_function_value) { - function = nullptr; - aggregate_function = std::move(aggregate_function_value); - result_type = std::move(result_type_value); - function_name = aggregate_function->getName(); + function_name = aggregate_function_value->getName(); + function = std::move(aggregate_function_value); + kind = FunctionKind::AGGREGATE; } -void FunctionNode::resolveAsWindowFunction(AggregateFunctionPtr window_function_value, DataTypePtr result_type_value) +void FunctionNode::resolveAsWindowFunction(AggregateFunctionPtr window_function_value) { - resolveAsAggregateFunction(window_function_value, result_type_value); + if (!hasWindow()) + throw Exception(ErrorCodes::LOGICAL_ERROR, + "Trying to resolve FunctionNode without window definition as a window function {}", window_function_value->getName()); + resolveAsAggregateFunction(window_function_value); + kind = FunctionKind::WINDOW; } void FunctionNode::dumpTreeImpl(WriteBuffer & buffer, FormatState & 
format_state, size_t indent) const @@ -63,8 +85,8 @@ void FunctionNode::dumpTreeImpl(WriteBuffer & buffer, FormatState & format_state buffer << ", function_type: " << function_type; - if (result_type) - buffer << ", result_type: " + result_type->getName(); + if (function) + buffer << ", result_type: " + function->getResultType()->getName(); const auto & parameters = getParameters(); if (!parameters.getNodes().empty()) @@ -96,11 +118,19 @@ bool FunctionNode::isEqualImpl(const IQueryTreeNode & rhs) const isWindowFunction() != rhs_typed.isWindowFunction()) return false; - if (result_type && rhs_typed.result_type && !result_type->equals(*rhs_typed.getResultType())) + if (isResolved() != rhs_typed.isResolved()) return false; - else if (result_type && !rhs_typed.result_type) + if (!isResolved()) + return true; + + auto lhs_result_type = getResultType(); + auto rhs_result_type = rhs.getResultType(); + + if (lhs_result_type && rhs_result_type && !lhs_result_type->equals(*rhs_result_type)) return false; - else if (!result_type && rhs_typed.result_type) + else if (lhs_result_type && !rhs_result_type) + return false; + else if (!lhs_result_type && rhs_result_type) return false; return true; @@ -114,7 +144,10 @@ void FunctionNode::updateTreeHashImpl(HashState & hash_state) const hash_state.update(isAggregateFunction()); hash_state.update(isWindowFunction()); - if (result_type) + if (!isResolved()) + return; + + if (auto result_type = getResultType()) { auto result_type_name = result_type->getName(); hash_state.update(result_type_name.size()); @@ -130,8 +163,7 @@ QueryTreeNodePtr FunctionNode::cloneImpl() const * because ordinary functions or aggregate functions must be stateless. */ result_function->function = function; - result_function->aggregate_function = aggregate_function; - result_function->result_type = result_type; + result_function->kind = kind; return result_function; } diff --git a/src/Analyzer/FunctionNode.h b/src/Analyzer/FunctionNode.h index e746cf48581..501d439e55e 100644 --- a/src/Analyzer/FunctionNode.h +++ b/src/Analyzer/FunctionNode.h @@ -1,8 +1,12 @@ #pragma once +#include +#include #include #include #include +#include +#include namespace DB { @@ -15,6 +19,9 @@ namespace ErrorCodes class IFunctionOverloadResolver; using FunctionOverloadResolverPtr = std::shared_ptr; +class IFunctionBase; +using FunctionBasePtr = std::shared_ptr; + class IAggregateFunction; using AggregateFunctionPtr = std::shared_ptr; @@ -39,6 +46,14 @@ using AggregateFunctionPtr = std::shared_ptr; class FunctionNode; using FunctionNodePtr = std::shared_ptr; +enum class FunctionKind +{ + UNKNOWN, + ORDINARY, + AGGREGATE, + WINDOW, +}; + class FunctionNode final : public IQueryTreeNode { public: @@ -101,6 +116,8 @@ public: return children[arguments_child_index]; } + ColumnsWithTypeAndName getArgumentTypes() const; + /// Returns true if function node has window, false otherwise bool hasWindow() const { @@ -129,42 +146,46 @@ public: /** Get non aggregate function. * If function is not resolved nullptr returned. */ - const FunctionOverloadResolverPtr & getFunction() const + FunctionBasePtr getFunction() const { - return function; + if (kind != FunctionKind::ORDINARY) + return {}; + return std::reinterpret_pointer_cast(function); } /** Get aggregate function. * If function is not resolved nullptr returned. * If function is resolved as non aggregate function nullptr returned. 
*/ - const AggregateFunctionPtr & getAggregateFunction() const + AggregateFunctionPtr getAggregateFunction() const { - return aggregate_function; + if (kind == FunctionKind::UNKNOWN || kind == FunctionKind::ORDINARY) + return {}; + return std::reinterpret_pointer_cast(function); } /// Is function node resolved bool isResolved() const { - return result_type != nullptr && (function != nullptr || aggregate_function != nullptr); + return function != nullptr; } /// Is function node window function bool isWindowFunction() const { - return getWindowNode() != nullptr; + return hasWindow(); } /// Is function node aggregate function bool isAggregateFunction() const { - return aggregate_function != nullptr && !isWindowFunction(); + return kind == FunctionKind::AGGREGATE; } /// Is function node ordinary function bool isOrdinaryFunction() const { - return function != nullptr; + return kind == FunctionKind::ORDINARY; } /** Resolve function node as non aggregate function. @@ -173,19 +194,19 @@ public: * Assume we have `multiIf` function with single condition, it can be converted to `if` function. * Function name must be updated accordingly. */ - void resolveAsFunction(FunctionOverloadResolverPtr function_value, DataTypePtr result_type_value); + void resolveAsFunction(FunctionBasePtr function_value); /** Resolve function node as aggregate function. * It is important that function name is updated with resolved function name. * Main motivation for this is query tree optimizations. */ - void resolveAsAggregateFunction(AggregateFunctionPtr aggregate_function_value, DataTypePtr result_type_value); + void resolveAsAggregateFunction(AggregateFunctionPtr aggregate_function_value); /** Resolve function node as window function. * It is important that function name is updated with resolved function name. * Main motivation for this is query tree optimizations. 
*/ - void resolveAsWindowFunction(AggregateFunctionPtr window_function_value, DataTypePtr result_type_value); + void resolveAsWindowFunction(AggregateFunctionPtr window_function_value); QueryTreeNodeType getNodeType() const override { @@ -194,12 +215,11 @@ public: DataTypePtr getResultType() const override { - if (!result_type) + if (!function) throw Exception(ErrorCodes::UNSUPPORTED_METHOD, "Function node with name '{}' is not resolved", function_name); - - return result_type; + return function->getResultType(); } void dumpTreeImpl(WriteBuffer & buffer, FormatState & format_state, size_t indent) const override; @@ -215,9 +235,8 @@ protected: private: String function_name; - FunctionOverloadResolverPtr function; - AggregateFunctionPtr aggregate_function; - DataTypePtr result_type; + FunctionKind kind = FunctionKind::UNKNOWN; + IResolvedFunctionPtr function; static constexpr size_t parameters_child_index = 0; static constexpr size_t arguments_child_index = 1; diff --git a/src/Analyzer/Passes/AggregateFunctionsArithmericOperationsPass.cpp b/src/Analyzer/Passes/AggregateFunctionsArithmericOperationsPass.cpp index 9b59faacfe0..e4e99c6e947 100644 --- a/src/Analyzer/Passes/AggregateFunctionsArithmericOperationsPass.cpp +++ b/src/Analyzer/Passes/AggregateFunctionsArithmericOperationsPass.cpp @@ -147,7 +147,6 @@ public: private: static inline void resolveAggregateFunctionNode(FunctionNode & function_node, const String & aggregate_function_name) { - auto function_result_type = function_node.getResultType(); auto function_aggregate_function = function_node.getAggregateFunction(); AggregateFunctionProperties properties; @@ -156,7 +155,7 @@ private: function_aggregate_function->getParameters(), properties); - function_node.resolveAsAggregateFunction(std::move(aggregate_function), std::move(function_result_type)); + function_node.resolveAsAggregateFunction(std::move(aggregate_function)); } }; diff --git a/src/Analyzer/Passes/CountDistinctPass.cpp b/src/Analyzer/Passes/CountDistinctPass.cpp index 05c31ec28ba..0384055e484 100644 --- a/src/Analyzer/Passes/CountDistinctPass.cpp +++ b/src/Analyzer/Passes/CountDistinctPass.cpp @@ -71,7 +71,7 @@ public: auto result_type = function_node->getResultType(); AggregateFunctionProperties properties; auto aggregate_function = AggregateFunctionFactory::instance().get("count", {}, {}, properties); - function_node->resolveAsAggregateFunction(std::move(aggregate_function), std::move(result_type)); + function_node->resolveAsAggregateFunction(std::move(aggregate_function)); function_node->getArguments().getNodes().clear(); } }; diff --git a/src/Analyzer/Passes/CustomizeFunctionsPass.cpp b/src/Analyzer/Passes/CustomizeFunctionsPass.cpp index 629ab411a55..7eb4a040970 100644 --- a/src/Analyzer/Passes/CustomizeFunctionsPass.cpp +++ b/src/Analyzer/Passes/CustomizeFunctionsPass.cpp @@ -138,7 +138,6 @@ public: static inline void resolveAggregateOrWindowFunctionNode(FunctionNode & function_node, const String & aggregate_function_name) { - auto function_result_type = function_node.getResultType(); auto function_aggregate_function = function_node.getAggregateFunction(); AggregateFunctionProperties properties; @@ -148,16 +147,15 @@ public: properties); if (function_node.isAggregateFunction()) - function_node.resolveAsAggregateFunction(std::move(aggregate_function), std::move(function_result_type)); + function_node.resolveAsAggregateFunction(std::move(aggregate_function)); else if (function_node.isWindowFunction()) - function_node.resolveAsWindowFunction(std::move(aggregate_function), 
std::move(function_result_type)); + function_node.resolveAsWindowFunction(std::move(aggregate_function)); } inline void resolveOrdinaryFunctionNode(FunctionNode & function_node, const String & function_name) const { - auto function_result_type = function_node.getResultType(); auto function = FunctionFactory::instance().get(function_name, context); - function_node.resolveAsFunction(function, std::move(function_result_type)); + function_node.resolveAsFunction(function->build(function_node.getArgumentTypes())); } private: diff --git a/src/Analyzer/Passes/FunctionToSubcolumnsPass.cpp b/src/Analyzer/Passes/FunctionToSubcolumnsPass.cpp index b1ecfe2d8fc..0c5a450135f 100644 --- a/src/Analyzer/Passes/FunctionToSubcolumnsPass.cpp +++ b/src/Analyzer/Passes/FunctionToSubcolumnsPass.cpp @@ -78,11 +78,11 @@ public: column.name += ".size0"; column.type = std::make_shared(); - resolveOrdinaryFunctionNode(*function_node, "equals"); - function_arguments_nodes.clear(); function_arguments_nodes.push_back(std::make_shared(column, column_source)); function_arguments_nodes.push_back(std::make_shared(static_cast(0))); + + resolveOrdinaryFunctionNode(*function_node, "equals"); } else if (function_name == "notEmpty") { @@ -90,11 +90,11 @@ public: column.name += ".size0"; column.type = std::make_shared(); - resolveOrdinaryFunctionNode(*function_node, "notEquals"); - function_arguments_nodes.clear(); function_arguments_nodes.push_back(std::make_shared(column, column_source)); function_arguments_nodes.push_back(std::make_shared(static_cast(0))); + + resolveOrdinaryFunctionNode(*function_node, "notEquals"); } } else if (column_type.isNullable()) @@ -112,9 +112,9 @@ public: column.name += ".null"; column.type = std::make_shared(); - resolveOrdinaryFunctionNode(*function_node, "not"); - function_arguments_nodes = {std::make_shared(column, column_source)}; + + resolveOrdinaryFunctionNode(*function_node, "not"); } } else if (column_type.isMap()) @@ -182,9 +182,9 @@ public: column.type = data_type_map.getKeyType(); auto has_function_argument = std::make_shared(column, column_source); - resolveOrdinaryFunctionNode(*function_node, "has"); - function_arguments_nodes[0] = std::move(has_function_argument); + + resolveOrdinaryFunctionNode(*function_node, "has"); } } } @@ -192,9 +192,8 @@ public: private: inline void resolveOrdinaryFunctionNode(FunctionNode & function_node, const String & function_name) const { - auto function_result_type = function_node.getResultType(); auto function = FunctionFactory::instance().get(function_name, context); - function_node.resolveAsFunction(function, std::move(function_result_type)); + function_node.resolveAsFunction(function->build(function_node.getArgumentTypes())); } ContextPtr & context; diff --git a/src/Analyzer/Passes/FuseFunctionsPass.cpp b/src/Analyzer/Passes/FuseFunctionsPass.cpp index f7e703cdaa4..f354a7b1ec3 100644 --- a/src/Analyzer/Passes/FuseFunctionsPass.cpp +++ b/src/Analyzer/Passes/FuseFunctionsPass.cpp @@ -59,14 +59,13 @@ private: std::unordered_set names_to_collect; }; -QueryTreeNodePtr createResolvedFunction(const ContextPtr & context, const String & name, const DataTypePtr & result_type, QueryTreeNodes arguments) +QueryTreeNodePtr createResolvedFunction(const ContextPtr & context, const String & name, QueryTreeNodes arguments) { auto function_node = std::make_shared(name); auto function = FunctionFactory::instance().get(name, context); - function_node->resolveAsFunction(std::move(function), result_type); function_node->getArguments().getNodes() = std::move(arguments); - + 
function_node->resolveAsFunction(function->build(function_node->getArgumentTypes())); return function_node; } @@ -74,11 +73,6 @@ FunctionNodePtr createResolvedAggregateFunction(const String & name, const Query { auto function_node = std::make_shared(name); - AggregateFunctionProperties properties; - auto aggregate_function = AggregateFunctionFactory::instance().get(name, {argument->getResultType()}, parameters, properties); - function_node->resolveAsAggregateFunction(aggregate_function, aggregate_function->getReturnType()); - function_node->getArguments().getNodes() = { argument }; - if (!parameters.empty()) { QueryTreeNodes parameter_nodes; @@ -86,18 +80,27 @@ FunctionNodePtr createResolvedAggregateFunction(const String & name, const Query parameter_nodes.emplace_back(std::make_shared(param)); function_node->getParameters().getNodes() = std::move(parameter_nodes); } + function_node->getArguments().getNodes() = { argument }; + + AggregateFunctionProperties properties; + auto aggregate_function = AggregateFunctionFactory::instance().get( + name, + { argument->getResultType() }, + parameters, + properties); + function_node->resolveAsAggregateFunction(aggregate_function); return function_node; } -QueryTreeNodePtr createTupleElementFunction(const ContextPtr & context, const DataTypePtr & result_type, QueryTreeNodePtr argument, UInt64 index) +QueryTreeNodePtr createTupleElementFunction(const ContextPtr & context, QueryTreeNodePtr argument, UInt64 index) { - return createResolvedFunction(context, "tupleElement", result_type, {std::move(argument), std::make_shared(index)}); + return createResolvedFunction(context, "tupleElement", {argument, std::make_shared(index)}); } -QueryTreeNodePtr createArrayElementFunction(const ContextPtr & context, const DataTypePtr & result_type, QueryTreeNodePtr argument, UInt64 index) +QueryTreeNodePtr createArrayElementFunction(const ContextPtr & context, QueryTreeNodePtr argument, UInt64 index) { - return createResolvedFunction(context, "arrayElement", result_type, {std::move(argument), std::make_shared(index)}); + return createResolvedFunction(context, "arrayElement", {argument, std::make_shared(index)}); } void replaceWithSumCount(QueryTreeNodePtr & node, const FunctionNodePtr & sum_count_node, ContextPtr context) @@ -115,20 +118,20 @@ void replaceWithSumCount(QueryTreeNodePtr & node, const FunctionNodePtr & sum_co if (function_name == "sum") { assert(node->getResultType()->equals(*sum_count_result_type->getElement(0))); - node = createTupleElementFunction(context, node->getResultType(), sum_count_node, 1); + node = createTupleElementFunction(context, sum_count_node, 1); } else if (function_name == "count") { assert(node->getResultType()->equals(*sum_count_result_type->getElement(1))); - node = createTupleElementFunction(context, node->getResultType(), sum_count_node, 2); + node = createTupleElementFunction(context, sum_count_node, 2); } else if (function_name == "avg") { - auto sum_result = createTupleElementFunction(context, sum_count_result_type->getElement(0), sum_count_node, 1); - auto count_result = createTupleElementFunction(context, sum_count_result_type->getElement(1), sum_count_node, 2); + auto sum_result = createTupleElementFunction(context, sum_count_node, 1); + auto count_result = createTupleElementFunction(context, sum_count_node, 2); /// To avoid integer division by zero - auto count_float_result = createResolvedFunction(context, "toFloat64", std::make_shared(), {count_result}); - node = createResolvedFunction(context, "divide", 
node->getResultType(), {sum_result, count_float_result}); + auto count_float_result = createResolvedFunction(context, "toFloat64", {count_result}); + node = createResolvedFunction(context, "divide", {sum_result, count_float_result}); } else { @@ -238,7 +241,7 @@ void tryFuseQuantiles(QueryTreeNodePtr query_tree_node, ContextPtr context) for (size_t i = 0; i < nodes_set.size(); ++i) { size_t array_index = i + 1; - *nodes[i] = createArrayElementFunction(context, result_array_type->getNestedType(), quantiles_node, array_index); + *nodes[i] = createArrayElementFunction(context, quantiles_node, array_index); } } } diff --git a/src/Analyzer/Passes/IfChainToMultiIfPass.cpp b/src/Analyzer/Passes/IfChainToMultiIfPass.cpp index f400b11765e..020edfe4820 100644 --- a/src/Analyzer/Passes/IfChainToMultiIfPass.cpp +++ b/src/Analyzer/Passes/IfChainToMultiIfPass.cpp @@ -55,8 +55,8 @@ public: return; auto multi_if_function = std::make_shared("multiIf"); - multi_if_function->resolveAsFunction(multi_if_function_ptr, std::make_shared()); multi_if_function->getArguments().getNodes() = std::move(multi_if_arguments); + multi_if_function->resolveAsFunction(multi_if_function_ptr->build(multi_if_function->getArgumentTypes())); node = std::move(multi_if_function); } diff --git a/src/Analyzer/Passes/IfTransformStringsToEnumPass.cpp b/src/Analyzer/Passes/IfTransformStringsToEnumPass.cpp index 65120632c0c..776fe63c803 100644 --- a/src/Analyzer/Passes/IfTransformStringsToEnumPass.cpp +++ b/src/Analyzer/Passes/IfTransformStringsToEnumPass.cpp @@ -47,49 +47,64 @@ QueryTreeNodePtr createCastFunction(QueryTreeNodePtr from, DataTypePtr result_ty auto enum_literal_node = std::make_shared(std::move(enum_literal)); auto cast_function = FunctionFactory::instance().get("_CAST", std::move(context)); - QueryTreeNodes arguments{std::move(from), std::move(enum_literal_node)}; + QueryTreeNodes arguments{ std::move(from), std::move(enum_literal_node) }; auto function_node = std::make_shared("_CAST"); - function_node->resolveAsFunction(std::move(cast_function), std::move(result_type)); function_node->getArguments().getNodes() = std::move(arguments); + function_node->resolveAsFunction(cast_function->build(function_node->getArgumentTypes())); + return function_node; } /// if(arg1, arg2, arg3) will be transformed to if(arg1, _CAST(arg2, Enum...), _CAST(arg3, Enum...)) /// where Enum is generated based on the possible values stored in string_values void changeIfArguments( - QueryTreeNodePtr & first, QueryTreeNodePtr & second, const std::set & string_values, const ContextPtr & context) + FunctionNode & if_node, const std::set & string_values, const ContextPtr & context) { auto result_type = getEnumType(string_values); - first = createCastFunction(first, result_type, context); - second = createCastFunction(second, result_type, context); + auto & argument_nodes = if_node.getArguments().getNodes(); + + argument_nodes[1] = createCastFunction(argument_nodes[1], result_type, context); + argument_nodes[2] = createCastFunction(argument_nodes[2], result_type, context); + + auto if_resolver = FunctionFactory::instance().get("if", context); + + if_node.resolveAsFunction(if_resolver->build(if_node.getArgumentTypes())); } /// transform(value, array_from, array_to, default_value) will be transformed to transform(value, array_from, _CAST(array_to, Array(Enum...)), _CAST(default_value, Enum...)) /// where Enum is generated based on the possible values stored in string_values void changeTransformArguments( - QueryTreeNodePtr & array_to, - QueryTreeNodePtr & 
default_value, + FunctionNode & transform_node, const std::set & string_values, const ContextPtr & context) { auto result_type = getEnumType(string_values); + auto & arguments = transform_node.getArguments().getNodes(); + + auto & array_to = arguments[2]; + auto & default_value = arguments[3]; + array_to = createCastFunction(array_to, std::make_shared(result_type), context); default_value = createCastFunction(default_value, std::move(result_type), context); + + auto transform_resolver = FunctionFactory::instance().get("transform", context); + + transform_node.resolveAsFunction(transform_resolver->build(transform_node.getArgumentTypes())); } void wrapIntoToString(FunctionNode & function_node, QueryTreeNodePtr arg, ContextPtr context) { - assert(isString(function_node.getResultType())); - auto to_string_function = FunctionFactory::instance().get("toString", std::move(context)); - QueryTreeNodes arguments{std::move(arg)}; - - function_node.resolveAsFunction(std::move(to_string_function), std::make_shared()); + QueryTreeNodes arguments{ std::move(arg) }; function_node.getArguments().getNodes() = std::move(arguments); + + function_node.resolveAsFunction(to_string_function->build(function_node.getArgumentTypes())); + + assert(isString(function_node.getResultType())); } class ConvertStringsToEnumVisitor : public InDepthQueryTreeVisitor @@ -117,7 +132,8 @@ public: return; auto modified_if_node = function_node->clone(); - auto & argument_nodes = modified_if_node->as()->getArguments().getNodes(); + auto * function_if_node = modified_if_node->as(); + auto & argument_nodes = function_if_node->getArguments().getNodes(); const auto * first_literal = argument_nodes[1]->as(); const auto * second_literal = argument_nodes[2]->as(); @@ -132,7 +148,7 @@ public: string_values.insert(first_literal->getValue().get()); string_values.insert(second_literal->getValue().get()); - changeIfArguments(argument_nodes[1], argument_nodes[2], string_values, context); + changeIfArguments(*function_if_node, string_values, context); wrapIntoToString(*function_node, std::move(modified_if_node), context); return; } @@ -143,7 +159,8 @@ public: return; auto modified_transform_node = function_node->clone(); - auto & argument_nodes = modified_transform_node->as()->getArguments().getNodes(); + auto * function_modified_transform_node = modified_transform_node->as(); + auto & argument_nodes = function_modified_transform_node->getArguments().getNodes(); if (!isString(function_node->getResultType())) return; @@ -176,7 +193,7 @@ public: string_values.insert(literal_default->getValue().get()); - changeTransformArguments(argument_nodes[2], argument_nodes[3], string_values, context); + changeTransformArguments(*function_modified_transform_node, string_values, context); wrapIntoToString(*function_node, std::move(modified_transform_node), context); return; } diff --git a/src/Analyzer/Passes/MultiIfToIfPass.cpp b/src/Analyzer/Passes/MultiIfToIfPass.cpp index 6d2ebac33e6..7e13675bf98 100644 --- a/src/Analyzer/Passes/MultiIfToIfPass.cpp +++ b/src/Analyzer/Passes/MultiIfToIfPass.cpp @@ -27,7 +27,7 @@ public: return; auto result_type = function_node->getResultType(); - function_node->resolveAsFunction(if_function_ptr, std::move(result_type)); + function_node->resolveAsFunction(if_function_ptr->build(function_node->getArgumentTypes())); } private: diff --git a/src/Analyzer/Passes/NormalizeCountVariantsPass.cpp b/src/Analyzer/Passes/NormalizeCountVariantsPass.cpp index cd6aa4d76f4..3580b64497d 100644 --- 
a/src/Analyzer/Passes/NormalizeCountVariantsPass.cpp +++ b/src/Analyzer/Passes/NormalizeCountVariantsPass.cpp @@ -53,12 +53,10 @@ private: static inline void resolveAsCountAggregateFunction(FunctionNode & function_node) { - auto function_result_type = function_node.getResultType(); - AggregateFunctionProperties properties; auto aggregate_function = AggregateFunctionFactory::instance().get("count", {}, {}, properties); - function_node.resolveAsAggregateFunction(std::move(aggregate_function), std::move(function_result_type)); + function_node.resolveAsAggregateFunction(std::move(aggregate_function)); } }; diff --git a/src/Analyzer/Passes/QueryAnalysisPass.cpp b/src/Analyzer/Passes/QueryAnalysisPass.cpp index 6f56d6fca8e..c8a86a4f036 100644 --- a/src/Analyzer/Passes/QueryAnalysisPass.cpp +++ b/src/Analyzer/Passes/QueryAnalysisPass.cpp @@ -4302,7 +4302,7 @@ ProjectionNames QueryAnalyzer::resolveFunction(QueryTreeNodePtr & node, Identifi bool force_grouping_standard_compatibility = scope.context->getSettingsRef().force_grouping_standard_compatibility; auto grouping_function = std::make_shared(force_grouping_standard_compatibility); auto grouping_function_adaptor = std::make_shared(std::move(grouping_function)); - function_node.resolveAsFunction(std::move(grouping_function_adaptor), std::make_shared()); + function_node.resolveAsFunction(grouping_function_adaptor->build({})); return result_projection_names; } } @@ -4327,7 +4327,7 @@ ProjectionNames QueryAnalyzer::resolveFunction(QueryTreeNodePtr & node, Identifi AggregateFunctionProperties properties; auto aggregate_function = AggregateFunctionFactory::instance().get(function_name, argument_types, parameters, properties); - function_node.resolveAsWindowFunction(aggregate_function, aggregate_function->getReturnType()); + function_node.resolveAsWindowFunction(aggregate_function); bool window_node_is_identifier = function_node.getWindowNode()->getNodeType() == QueryTreeNodeType::IDENTIFIER; ProjectionName window_projection_name = resolveWindow(function_node.getWindowNode(), scope); @@ -4386,7 +4386,7 @@ ProjectionNames QueryAnalyzer::resolveFunction(QueryTreeNodePtr & node, Identifi AggregateFunctionProperties properties; auto aggregate_function = AggregateFunctionFactory::instance().get(function_name, argument_types, parameters, properties); - function_node.resolveAsAggregateFunction(aggregate_function, aggregate_function->getReturnType()); + function_node.resolveAsAggregateFunction(aggregate_function); return result_projection_names; } @@ -4563,6 +4563,8 @@ ProjectionNames QueryAnalyzer::resolveFunction(QueryTreeNodePtr & node, Identifi constant_value = std::make_shared(std::move(column_constant_value), result_type); } } + + function_node.resolveAsFunction(std::move(function_base)); } catch (Exception & e) { @@ -4570,8 +4572,6 @@ ProjectionNames QueryAnalyzer::resolveFunction(QueryTreeNodePtr & node, Identifi throw; } - function_node.resolveAsFunction(std::move(function), std::move(result_type)); - if (constant_value) node = std::make_shared(std::move(constant_value), node); diff --git a/src/Analyzer/Passes/SumIfToCountIfPass.cpp b/src/Analyzer/Passes/SumIfToCountIfPass.cpp index 91c277d35b3..1496d539d27 100644 --- a/src/Analyzer/Passes/SumIfToCountIfPass.cpp +++ b/src/Analyzer/Passes/SumIfToCountIfPass.cpp @@ -117,11 +117,12 @@ public: not_function_result_type = makeNullable(not_function_result_type); auto not_function = std::make_shared("not"); - not_function->resolveAsFunction(FunctionFactory::instance().get("not", context), 
std::move(not_function_result_type)); auto & not_function_arguments = not_function->getArguments().getNodes(); not_function_arguments.push_back(std::move(nested_if_function_arguments_nodes[0])); + not_function->resolveAsFunction(FunctionFactory::instance().get("not", context)->build(not_function->getArgumentTypes())); + function_node_arguments_nodes[0] = std::move(not_function); function_node_arguments_nodes.resize(1); @@ -139,8 +140,7 @@ private: function_node.getAggregateFunction()->getParameters(), properties); - auto function_result_type = function_node.getResultType(); - function_node.resolveAsAggregateFunction(std::move(aggregate_function), std::move(function_result_type)); + function_node.resolveAsAggregateFunction(std::move(aggregate_function)); } ContextPtr & context; diff --git a/src/Analyzer/Passes/UniqInjectiveFunctionsEliminationPass.cpp b/src/Analyzer/Passes/UniqInjectiveFunctionsEliminationPass.cpp index 1716c37228a..37bad70da57 100644 --- a/src/Analyzer/Passes/UniqInjectiveFunctionsEliminationPass.cpp +++ b/src/Analyzer/Passes/UniqInjectiveFunctionsEliminationPass.cpp @@ -76,7 +76,7 @@ public: properties); auto function_result_type = function_node->getResultType(); - function_node->resolveAsAggregateFunction(std::move(aggregate_function), std::move(function_result_type)); + function_node->resolveAsAggregateFunction(std::move(aggregate_function)); } }; diff --git a/src/Analyzer/QueryTreePassManager.cpp b/src/Analyzer/QueryTreePassManager.cpp index ca9d4e3d1e3..06a1fec4698 100644 --- a/src/Analyzer/QueryTreePassManager.cpp +++ b/src/Analyzer/QueryTreePassManager.cpp @@ -21,6 +21,7 @@ #include #include +#include #include #include @@ -44,6 +45,23 @@ namespace class ValidationChecker : public InDepthQueryTreeVisitor<ValidationChecker> { String pass_name; + + void visitColumn(ColumnNode * column) const + { + if (column->getColumnSourceOrNull() == nullptr) + throw Exception(ErrorCodes::LOGICAL_ERROR, + "Column {} {} query tree node does not have valid source node after running {} pass", + column->getColumnName(), column->getColumnType(), pass_name); + } + + void visitFunction(FunctionNode * function) const + { + if (!function->isResolved()) + throw Exception(ErrorCodes::LOGICAL_ERROR, + "Function {} is not resolved after running {} pass", + function->dumpTree(), pass_name); + } + public: explicit ValidationChecker(String pass_name_) : pass_name(std::move(pass_name_)) @@ -51,13 +69,10 @@ public: void visitImpl(QueryTreeNodePtr & node) const { - auto * column = node->as<ColumnNode>(); - if (!column) - return; - if (column->getColumnSourceOrNull() == nullptr) - throw Exception(ErrorCodes::LOGICAL_ERROR, - "Column {} {} query tree node does not have valid source node after running {} pass", - column->getColumnName(), column->getColumnType(), pass_name); + if (auto * column = node->as<ColumnNode>()) + return visitColumn(column); + else if (auto * function = node->as<FunctionNode>()) + return visitFunction(function); } }; #endif diff --git a/src/Analyzer/SortNode.cpp b/src/Analyzer/SortNode.cpp index 3f91724e9b7..da1c52ff0ef 100644 --- a/src/Analyzer/SortNode.cpp +++ b/src/Analyzer/SortNode.cpp @@ -91,7 +91,8 @@ bool SortNode::isEqualImpl(const IQueryTreeNode & rhs) const void SortNode::updateTreeHashImpl(HashState & hash_state) const { hash_state.update(sort_direction); - hash_state.update(nulls_sort_direction); + /// use some determined value if `nulls_sort_direction` is `nullopt` + hash_state.update(nulls_sort_direction.value_or(sort_direction)); hash_state.update(with_fill); if (collator) diff --git
a/src/Columns/ColumnAggregateFunction.cpp b/src/Columns/ColumnAggregateFunction.cpp index f51a0426199..58643f7a9b7 100644 --- a/src/Columns/ColumnAggregateFunction.cpp +++ b/src/Columns/ColumnAggregateFunction.cpp @@ -146,7 +146,7 @@ MutableColumnPtr ColumnAggregateFunction::convertToValues(MutableColumnPtr colum /// insertResultInto may invalidate states, so we must unshare ownership of them column_aggregate_func.ensureOwnership(); - MutableColumnPtr res = func->getReturnType()->createColumn(); + MutableColumnPtr res = func->getResultType()->createColumn(); res->reserve(data.size()); /// If there are references to states in final column, we must hold their ownership diff --git a/src/Columns/ColumnFunction.h b/src/Columns/ColumnFunction.h index 4781406c3b9..257bd1146fd 100644 --- a/src/Columns/ColumnFunction.h +++ b/src/Columns/ColumnFunction.h @@ -13,7 +13,7 @@ namespace ErrorCodes } class IFunctionBase; -using FunctionBasePtr = std::shared_ptr<IFunctionBase>; +using FunctionBasePtr = std::shared_ptr<const IFunctionBase>; /** A column containing a lambda expression. * Behaves like a constant-column. Contains an expression, but not input or output data. diff --git a/src/Common/BinStringDecodeHelper.h b/src/Common/BinStringDecodeHelper.h new file mode 100644 index 00000000000..513a4196b6f --- /dev/null +++ b/src/Common/BinStringDecodeHelper.h @@ -0,0 +1,76 @@ +#pragma once + +#include + +namespace DB +{ + +static void inline hexStringDecode(const char * pos, const char * end, char *& out, size_t word_size = 2) +{ + if ((end - pos) & 1) + { + *out = unhex(*pos); + ++out; + ++pos; + } + while (pos < end) + { + *out = unhex2(pos); + pos += word_size; + ++out; + } + *out = '\0'; + ++out; +} + +static void inline binStringDecode(const char * pos, const char * end, char *& out) +{ + if (pos == end) + { + *out = '\0'; + ++out; + return; + } + + UInt8 left = 0; + + /// end - pos is the length of the input. + /// (length & 7) is the number of leading bits left over, so that the remaining bit length is a multiple of 8 and can be split into bytes. + /// e.g. the length is 9 and the input is "101000001", + /// first left_cnt is 1 and left is 0; after consuming one bit, pos is 1 and left = 1, + /// then left_cnt is 0 and the remaining input is '01000001'.
+ for (UInt8 left_cnt = (end - pos) & 7; left_cnt > 0; --left_cnt) + { + left = left << 1; + if (*pos != '0') + left += 1; + ++pos; + } + + if (left != 0 || end - pos == 0) + { + *out = left; + ++out; + } + + assert((end - pos) % 8 == 0); + + while (end - pos != 0) + { + UInt8 c = 0; + for (UInt8 i = 0; i < 8; ++i) + { + c = c << 1; + if (*pos != '0') + c += 1; + ++pos; + } + *out = c; + ++out; + } + + *out = '\0'; + ++out; +} + +} diff --git a/src/Common/tests/gtest_rw_lock.cpp b/src/Common/tests/gtest_rw_lock.cpp index 6ba67a40445..57f446ca249 100644 --- a/src/Common/tests/gtest_rw_lock.cpp +++ b/src/Common/tests/gtest_rw_lock.cpp @@ -240,24 +240,52 @@ TEST(Common, RWLockPerfTestReaders) for (auto pool_size : pool_sizes) { - Stopwatch watch(CLOCK_MONOTONIC_COARSE); + Stopwatch watch(CLOCK_MONOTONIC_COARSE); - auto func = [&] () + auto func = [&] () + { + for (auto i = 0; i < cycles; ++i) { - for (auto i = 0; i < cycles; ++i) - { - auto lock = fifo_lock->getLock(RWLockImpl::Read, RWLockImpl::NO_QUERY); - } - }; + auto lock = fifo_lock->getLock(RWLockImpl::Read, RWLockImpl::NO_QUERY); + } + }; - std::list threads; - for (size_t thread = 0; thread < pool_size; ++thread) - threads.emplace_back(func); + std::list threads; + for (size_t thread = 0; thread < pool_size; ++thread) + threads.emplace_back(func); - for (auto & thread : threads) - thread.join(); + for (auto & thread : threads) + thread.join(); - auto total_time = watch.elapsedSeconds(); - std::cout << "Threads " << pool_size << ", total_time " << std::setprecision(2) << total_time << "\n"; + auto total_time = watch.elapsedSeconds(); + std::cout << "Threads " << pool_size << ", total_time " << std::setprecision(2) << total_time << "\n"; } } + +TEST(Common, RWLockNotUpgradeableWithNoQuery) +{ + updatePHDRCache(); + + static auto rw_lock = RWLockImpl::create(); + + std::thread read_thread([&] () + { + auto lock = rw_lock->getLock(RWLockImpl::Read, RWLockImpl::NO_QUERY, std::chrono::duration(50000)); + auto sleep_for = std::chrono::duration(5000); + std::this_thread::sleep_for(sleep_for); + }); + + { + auto sleep_for = std::chrono::duration(500); + std::this_thread::sleep_for(sleep_for); + + Stopwatch watch(CLOCK_MONOTONIC_COARSE); + auto get_lock = rw_lock->getLock(RWLockImpl::Write, RWLockImpl::NO_QUERY, std::chrono::duration(50000)); + + EXPECT_NE(get_lock.get(), nullptr); + /// It took some time + EXPECT_GT(watch.elapsedMilliseconds(), 3000); + } + + read_thread.join(); +} diff --git a/src/Core/IResolvedFunction.h b/src/Core/IResolvedFunction.h new file mode 100644 index 00000000000..64c69f597c7 --- /dev/null +++ b/src/Core/IResolvedFunction.h @@ -0,0 +1,29 @@ +#pragma once + +#include +#include + +namespace DB +{ +class IDataType; + +using DataTypePtr = std::shared_ptr; +using DataTypes = std::vector; + +struct Array; + +class IResolvedFunction +{ +public: + virtual const DataTypePtr & getResultType() const = 0; + + virtual const DataTypes & getArgumentTypes() const = 0; + + virtual const Array & getParameters() const = 0; + + virtual ~IResolvedFunction() = default; +}; + +using IResolvedFunctionPtr = std::shared_ptr; + +} diff --git a/src/Core/Settings.h b/src/Core/Settings.h index 0cbdcdf35fe..450304b2abd 100644 --- a/src/Core/Settings.h +++ b/src/Core/Settings.h @@ -620,6 +620,7 @@ static constexpr UInt64 operator""_GiB(unsigned long long value) M(Bool, enable_filesystem_cache_on_lower_level, true, "If read buffer supports caching inside threadpool, allow it to do it, otherwise cache outside ot threadpool. 
Do not use this setting, it is needed for testing", 0) \ M(Bool, skip_download_if_exceeds_query_cache, true, "Skip download from remote filesystem if exceeds query cache size", 0) \ M(UInt64, max_query_cache_size, (128UL * 1024 * 1024 * 1024), "Max remote filesystem cache size that can be used by a single query", 0) \ + M(Bool, throw_on_error_from_cache_on_write_operations, false, "Ignore error from cache when caching on write operations (INSERT, merges)", 0) \ \ M(Bool, load_marks_asynchronously, false, "Load MergeTree marks asynchronously", 0) \ \ diff --git a/src/Daemon/BaseDaemon.cpp b/src/Daemon/BaseDaemon.cpp index 604a882bccc..6cd952bfa83 100644 --- a/src/Daemon/BaseDaemon.cpp +++ b/src/Daemon/BaseDaemon.cpp @@ -1025,9 +1025,6 @@ void BaseDaemon::setupWatchdog() #if defined(OS_LINUX) if (0 != prctl(PR_SET_PDEATHSIG, SIGKILL)) logger().warning("Cannot do prctl to ask termination with parent."); - - if (getppid() == 1) - throw Poco::Exception("Parent watchdog process has exited."); #endif { diff --git a/src/DataTypes/DataTypeAggregateFunction.h b/src/DataTypes/DataTypeAggregateFunction.h index 4a92e6c5703..2d712d9c686 100644 --- a/src/DataTypes/DataTypeAggregateFunction.h +++ b/src/DataTypes/DataTypeAggregateFunction.h @@ -30,9 +30,9 @@ private: public: static constexpr bool is_parametric = true; - DataTypeAggregateFunction(const AggregateFunctionPtr & function_, const DataTypes & argument_types_, + DataTypeAggregateFunction(AggregateFunctionPtr function_, const DataTypes & argument_types_, const Array & parameters_, std::optional version_ = std::nullopt) - : function(function_) + : function(std::move(function_)) , argument_types(argument_types_) , parameters(parameters_) , version(version_) @@ -51,7 +51,7 @@ public: bool canBeInsideNullable() const override { return false; } - DataTypePtr getReturnType() const { return function->getReturnType(); } + DataTypePtr getReturnType() const { return function->getResultType(); } DataTypePtr getReturnTypeToPredict() const { return function->getReturnTypeToPredict(); } DataTypes getArgumentsDataTypes() const { return argument_types; } diff --git a/src/DataTypes/DataTypeCustomSimpleAggregateFunction.cpp b/src/DataTypes/DataTypeCustomSimpleAggregateFunction.cpp index efcab212094..c12f9de5a95 100644 --- a/src/DataTypes/DataTypeCustomSimpleAggregateFunction.cpp +++ b/src/DataTypes/DataTypeCustomSimpleAggregateFunction.cpp @@ -131,9 +131,9 @@ static std::pair create(const ASTPtr & argum DataTypePtr storage_type = DataTypeFactory::instance().get(argument_types[0]->getName()); - if (!function->getReturnType()->equals(*removeLowCardinality(storage_type))) + if (!function->getResultType()->equals(*removeLowCardinality(storage_type))) { - throw Exception("Incompatible data types between aggregate function '" + function->getName() + "' which returns " + function->getReturnType()->getName() + " and column storage type " + storage_type->getName(), + throw Exception("Incompatible data types between aggregate function '" + function->getName() + "' which returns " + function->getResultType()->getName() + " and column storage type " + storage_type->getName(), ErrorCodes::BAD_ARGUMENTS); } diff --git a/src/Databases/DatabaseOrdinary.cpp b/src/Databases/DatabaseOrdinary.cpp index 01c6e5c8d8c..87f91856c1b 100644 --- a/src/Databases/DatabaseOrdinary.cpp +++ b/src/Databases/DatabaseOrdinary.cpp @@ -256,6 +256,9 @@ void DatabaseOrdinary::startupTables(ThreadPool & thread_pool, LoadingStrictness auto startup_one_table = [&](const StoragePtr & table) { + /// Since 
startup() method can use physical paths on disk we don't allow any exclusive actions (rename, drop so on) + /// until startup finished. + auto table_lock_holder = table->lockForShare(RWLockImpl::NO_QUERY, getContext()->getSettingsRef().lock_acquire_timeout); table->startup(); logAboutProgress(log, ++tables_processed, total_tables, watch); }; diff --git a/src/Disks/IO/CachedOnDiskWriteBufferFromFile.cpp b/src/Disks/IO/CachedOnDiskWriteBufferFromFile.cpp index 994bb743c5f..823e4125f06 100644 --- a/src/Disks/IO/CachedOnDiskWriteBufferFromFile.cpp +++ b/src/Disks/IO/CachedOnDiskWriteBufferFromFile.cpp @@ -44,10 +44,10 @@ FileSegmentRangeWriter::FileSegmentRangeWriter( const String & source_path_) : cache(cache_) , key(key_) + , log(&Poco::Logger::get("FileSegmentRangeWriter")) , cache_log(cache_log_) , query_id(query_id_) , source_path(source_path_) - , current_file_segment_it(file_segments_holder.file_segments.end()) { } @@ -56,69 +56,68 @@ bool FileSegmentRangeWriter::write(const char * data, size_t size, size_t offset if (finalized) return false; + if (expected_write_offset != offset) + { + throw Exception( + ErrorCodes::LOGICAL_ERROR, + "Cannot write file segment at offset {}, because expected write offset is: {}", + offset, expected_write_offset); + } + auto & file_segments = file_segments_holder.file_segments; - if (current_file_segment_it == file_segments.end()) + if (file_segments.empty() || file_segments.back()->isDownloaded()) { - current_file_segment_it = allocateFileSegment(current_file_segment_write_offset, is_persistent); - } - else - { - auto file_segment = *current_file_segment_it; - assert(file_segment->getCurrentWriteOffset() == current_file_segment_write_offset); - - if (current_file_segment_write_offset != offset) - { - throw Exception( - ErrorCodes::LOGICAL_ERROR, - "Cannot write file segment at offset {}, because current write offset is: {}", - offset, current_file_segment_write_offset); - } - - if (file_segment->range().size() == file_segment->getDownloadedSize()) - { - completeFileSegment(*file_segment); - current_file_segment_it = allocateFileSegment(current_file_segment_write_offset, is_persistent); - } + allocateFileSegment(expected_write_offset, is_persistent); } - auto & file_segment = *current_file_segment_it; - - auto downloader = file_segment->getOrSetDownloader(); - if (downloader != FileSegment::getCallerId()) - throw Exception(ErrorCodes::LOGICAL_ERROR, "Failed to set a downloader. 
({})", file_segment->getInfoForLog()); + auto & file_segment = file_segments.back(); SCOPE_EXIT({ - if (file_segment->isDownloader()) - file_segment->completePartAndResetDownloader(); + if (file_segments.back()->isDownloader()) + file_segments.back()->completePartAndResetDownloader(); }); - bool reserved = file_segment->reserve(size); - if (!reserved) + while (size > 0) { - file_segment->completeWithState(FileSegment::State::PARTIALLY_DOWNLOADED_NO_CONTINUATION); - appendFilesystemCacheLog(*file_segment); + size_t available_size = file_segment->range().size() - file_segment->getDownloadedSize(); + if (available_size == 0) + { + completeFileSegment(*file_segment); + file_segment = allocateFileSegment(expected_write_offset, is_persistent); + continue; + } - LOG_DEBUG( - &Poco::Logger::get("FileSegmentRangeWriter"), - "Unsuccessful space reservation attempt (size: {}, file segment info: {}", - size, file_segment->getInfoForLog()); + if (!file_segment->isDownloader() + && file_segment->getOrSetDownloader() != FileSegment::getCallerId()) + { + throw Exception(ErrorCodes::LOGICAL_ERROR, + "Failed to set a downloader. ({})", file_segment->getInfoForLog()); + } - return false; - } + size_t size_to_write = std::min(available_size, size); - try - { - file_segment->write(data, size, offset); - } - catch (...) - { + bool reserved = file_segment->reserve(size_to_write); + if (!reserved) + { + file_segment->completeWithState(FileSegment::State::PARTIALLY_DOWNLOADED_NO_CONTINUATION); + appendFilesystemCacheLog(*file_segment); + + LOG_DEBUG( + log, "Failed to reserve space in cache (size: {}, file segment info: {}", + size, file_segment->getInfoForLog()); + + return false; + } + + file_segment->write(data, size_to_write, offset); file_segment->completePartAndResetDownloader(); - throw; - } - file_segment->completePartAndResetDownloader(); - current_file_segment_write_offset += size; + size -= size_to_write; + expected_write_offset += size_to_write; + offset += size_to_write; + data += size_to_write; + } return true; } @@ -129,10 +128,10 @@ void FileSegmentRangeWriter::finalize() return; auto & file_segments = file_segments_holder.file_segments; - if (file_segments.empty() || current_file_segment_it == file_segments.end()) + if (file_segments.empty()) return; - completeFileSegment(**current_file_segment_it); + completeFileSegment(*file_segments.back()); finalized = true; } @@ -149,7 +148,7 @@ FileSegmentRangeWriter::~FileSegmentRangeWriter() } } -FileSegments::iterator FileSegmentRangeWriter::allocateFileSegment(size_t offset, bool is_persistent) +FileSegmentPtr & FileSegmentRangeWriter::allocateFileSegment(size_t offset, bool is_persistent) { /** * Allocate a new file segment starting `offset`. @@ -168,7 +167,8 @@ FileSegments::iterator FileSegmentRangeWriter::allocateFileSegment(size_t offset auto file_segment = cache->createFileSegmentForDownload( key, offset, cache->max_file_segment_size, create_settings, cache_lock); - return file_segments_holder.add(std::move(file_segment)); + auto & file_segments = file_segments_holder.file_segments; + return *file_segments.insert(file_segments.end(), file_segment); } void FileSegmentRangeWriter::appendFilesystemCacheLog(const FileSegment & file_segment) @@ -199,7 +199,7 @@ void FileSegmentRangeWriter::appendFilesystemCacheLog(const FileSegment & file_s void FileSegmentRangeWriter::completeFileSegment(FileSegment & file_segment) { /// File segment can be detached if space reservation failed. 
- if (file_segment.isDetached()) + if (file_segment.isDetached() || file_segment.isCompleted()) return; file_segment.completeWithoutState(); @@ -223,6 +223,7 @@ CachedOnDiskWriteBufferFromFile::CachedOnDiskWriteBufferFromFile( , is_persistent_cache_file(is_persistent_cache_file_) , query_id(query_id_) , enable_cache_log(!query_id_.empty() && settings_.enable_filesystem_cache_log) + , throw_on_error_from_cache(settings_.throw_on_error_from_cache) { } @@ -246,11 +247,11 @@ void CachedOnDiskWriteBufferFromFile::nextImpl() } /// Write data to cache. - cacheData(working_buffer.begin(), size); + cacheData(working_buffer.begin(), size, throw_on_error_from_cache); current_download_offset += size; } -void CachedOnDiskWriteBufferFromFile::cacheData(char * data, size_t size) +void CachedOnDiskWriteBufferFromFile::cacheData(char * data, size_t size, bool throw_on_error) { if (cache_in_error_state_or_disabled) return; @@ -285,11 +286,17 @@ void CachedOnDiskWriteBufferFromFile::cacheData(char * data, size_t size) return; } + if (throw_on_error) + throw; + tryLogCurrentException(__PRETTY_FUNCTION__); return; } catch (...) { + if (throw_on_error) + throw; + tryLogCurrentException(__PRETTY_FUNCTION__); return; } diff --git a/src/Disks/IO/CachedOnDiskWriteBufferFromFile.h b/src/Disks/IO/CachedOnDiskWriteBufferFromFile.h index cec7305ab1b..280005734c0 100644 --- a/src/Disks/IO/CachedOnDiskWriteBufferFromFile.h +++ b/src/Disks/IO/CachedOnDiskWriteBufferFromFile.h @@ -39,7 +39,7 @@ public: ~FileSegmentRangeWriter(); private: - FileSegments::iterator allocateFileSegment(size_t offset, bool is_persistent); + FileSegmentPtr & allocateFileSegment(size_t offset, bool is_persistent); void appendFilesystemCacheLog(const FileSegment & file_segment); @@ -48,14 +48,14 @@ private: FileCache * cache; FileSegment::Key key; + Poco::Logger * log; std::shared_ptr<FilesystemCacheLog> cache_log; String query_id; String source_path; FileSegmentsHolder file_segments_holder{}; - FileSegments::iterator current_file_segment_it; - size_t current_file_segment_write_offset = 0; + size_t expected_write_offset = 0; bool finalized = false; }; @@ -81,7 +81,7 @@ public: void finalizeImpl() override; private: - void cacheData(char * data, size_t size); + void cacheData(char * data, size_t size, bool throw_on_error); Poco::Logger * log; @@ -95,6 +95,7 @@ private: bool enable_cache_log; + bool throw_on_error_from_cache; bool cache_in_error_state_or_disabled = false; std::unique_ptr<FileSegmentRangeWriter> cache_writer; diff --git a/src/Formats/BSONTypes.h b/src/Formats/BSONTypes.h index 2d20cdae698..14a3e9decca 100644 --- a/src/Formats/BSONTypes.h +++ b/src/Formats/BSONTypes.h @@ -7,6 +7,8 @@ namespace DB { static const uint8_t BSON_DOCUMENT_END = 0x00; +static const size_t BSON_OBJECT_ID_SIZE = 12; +static const size_t BSON_DB_POINTER_SIZE = 12; using BSONSizeT = uint32_t; static const BSONSizeT MAX_BSON_SIZE = std::numeric_limits<BSONSizeT>::max(); diff --git a/src/Formats/CapnProtoUtils.cpp b/src/Formats/CapnProtoUtils.cpp index a7ff065aca5..fb5e7c06542 100644 --- a/src/Formats/CapnProtoUtils.cpp +++ b/src/Formats/CapnProtoUtils.cpp @@ -32,6 +32,16 @@ namespace ErrorCodes extern const int BAD_ARGUMENTS; } +std::pair<String, String> splitCapnProtoFieldName(const String & name) +{ + const auto * begin = name.data(); + const auto * end = name.data() + name.size(); + const auto * it = find_first_symbols<'_', '.'>(begin, end); + String first = String(begin, it); + String second = it == end ?
"" : String(it + 1, end); + return {first, second}; +} + capnp::StructSchema CapnProtoSchemaParser::getMessageSchema(const FormatSchemaInfo & schema_info) { capnp::ParsedSchema schema; @@ -201,9 +211,9 @@ static bool checkEnums(const capnp::Type & capnp_type, const DataTypePtr column_ return result; } -static bool checkCapnProtoType(const capnp::Type & capnp_type, const DataTypePtr & data_type, FormatSettings::EnumComparingMode mode, String & error_message); +static bool checkCapnProtoType(const capnp::Type & capnp_type, const DataTypePtr & data_type, FormatSettings::EnumComparingMode mode, String & error_message, const String & column_name); -static bool checkNullableType(const capnp::Type & capnp_type, const DataTypePtr & data_type, FormatSettings::EnumComparingMode mode, String & error_message) +static bool checkNullableType(const capnp::Type & capnp_type, const DataTypePtr & data_type, FormatSettings::EnumComparingMode mode, String & error_message, const String & column_name) { if (!capnp_type.isStruct()) return false; @@ -222,9 +232,9 @@ static bool checkNullableType(const capnp::Type & capnp_type, const DataTypePtr auto nested_type = assert_cast(data_type.get())->getNestedType(); if (first.getType().isVoid()) - return checkCapnProtoType(second.getType(), nested_type, mode, error_message); + return checkCapnProtoType(second.getType(), nested_type, mode, error_message, column_name); if (second.getType().isVoid()) - return checkCapnProtoType(first.getType(), nested_type, mode, error_message); + return checkCapnProtoType(first.getType(), nested_type, mode, error_message, column_name); return false; } @@ -260,7 +270,7 @@ static bool checkTupleType(const capnp::Type & capnp_type, const DataTypePtr & d { KJ_IF_MAYBE(field, struct_schema.findFieldByName(name)) { - if (!checkCapnProtoType(field->getType(), nested_types[tuple_data_type->getPositionByName(name)], mode, error_message)) + if (!checkCapnProtoType(field->getType(), nested_types[tuple_data_type->getPositionByName(name)], mode, error_message, name)) return false; } else @@ -273,16 +283,28 @@ static bool checkTupleType(const capnp::Type & capnp_type, const DataTypePtr & d return true; } -static bool checkArrayType(const capnp::Type & capnp_type, const DataTypePtr & data_type, FormatSettings::EnumComparingMode mode, String & error_message) +static bool checkArrayType(const capnp::Type & capnp_type, const DataTypePtr & data_type, FormatSettings::EnumComparingMode mode, String & error_message, const String & column_name) { if (!capnp_type.isList()) return false; auto list_schema = capnp_type.asList(); auto nested_type = assert_cast(data_type.get())->getNestedType(); - return checkCapnProtoType(list_schema.getElementType(), nested_type, mode, error_message); + + auto [field_name, nested_name] = splitCapnProtoFieldName(column_name); + if (!nested_name.empty() && list_schema.getElementType().isStruct()) + { + auto struct_schema = list_schema.getElementType().asStruct(); + KJ_IF_MAYBE(field, struct_schema.findFieldByName(nested_name)) + return checkCapnProtoType(field->getType(), nested_type, mode, error_message, nested_name); + + error_message += "Element type of List {} doesn't contain field with name " + nested_name; + return false; + } + + return checkCapnProtoType(list_schema.getElementType(), nested_type, mode, error_message, column_name); } -static bool checkCapnProtoType(const capnp::Type & capnp_type, const DataTypePtr & data_type, FormatSettings::EnumComparingMode mode, String & error_message) +static bool checkCapnProtoType(const 
capnp::Type & capnp_type, const DataTypePtr & data_type, FormatSettings::EnumComparingMode mode, String & error_message, const String & column_name) { switch (data_type->getTypeId()) { @@ -301,9 +323,11 @@ static bool checkCapnProtoType(const capnp::Type & capnp_type, const DataTypePtr case TypeIndex::Int16: return capnp_type.isInt16(); case TypeIndex::Date32: [[fallthrough]]; + case TypeIndex::Decimal32: [[fallthrough]]; case TypeIndex::Int32: return capnp_type.isInt32(); case TypeIndex::DateTime64: [[fallthrough]]; + case TypeIndex::Decimal64: [[fallthrough]]; case TypeIndex::Int64: return capnp_type.isInt64(); case TypeIndex::Float32: @@ -318,15 +342,15 @@ static bool checkCapnProtoType(const capnp::Type & capnp_type, const DataTypePtr return checkTupleType(capnp_type, data_type, mode, error_message); case TypeIndex::Nullable: { - auto result = checkNullableType(capnp_type, data_type, mode, error_message); + auto result = checkNullableType(capnp_type, data_type, mode, error_message, column_name); if (!result) error_message += "Nullable can be represented only as a named union of type Void and nested type"; return result; } case TypeIndex::Array: - return checkArrayType(capnp_type, data_type, mode, error_message); + return checkArrayType(capnp_type, data_type, mode, error_message, column_name); case TypeIndex::LowCardinality: - return checkCapnProtoType(capnp_type, assert_cast(data_type.get())->getDictionaryType(), mode, error_message); + return checkCapnProtoType(capnp_type, assert_cast(data_type.get())->getDictionaryType(), mode, error_message, column_name); case TypeIndex::FixedString: [[fallthrough]]; case TypeIndex::String: return capnp_type.isText() || capnp_type.isData(); @@ -335,19 +359,9 @@ static bool checkCapnProtoType(const capnp::Type & capnp_type, const DataTypePtr } } -static std::pair splitFieldName(const String & name) -{ - const auto * begin = name.data(); - const auto * end = name.data() + name.size(); - const auto * it = find_first_symbols<'_', '.'>(begin, end); - String first = String(begin, it); - String second = it == end ? "" : String(it + 1, end); - return {first, second}; -} - capnp::DynamicValue::Reader getReaderByColumnName(const capnp::DynamicStruct::Reader & struct_reader, const String & name) { - auto [field_name, nested_name] = splitFieldName(name); + auto [field_name, nested_name] = splitCapnProtoFieldName(name); KJ_IF_MAYBE(field, struct_reader.getSchema().findFieldByName(field_name)) { capnp::DynamicValue::Reader field_reader; @@ -363,6 +377,20 @@ capnp::DynamicValue::Reader getReaderByColumnName(const capnp::DynamicStruct::Re if (nested_name.empty()) return field_reader; + /// Support reading Nested as List of Structs. 
+ if (field_reader.getType() == capnp::DynamicValue::LIST) + { + auto list_schema = field->getType().asList(); + if (!list_schema.getElementType().isStruct()) + throw Exception(ErrorCodes::CAPN_PROTO_BAD_CAST, "Element type of List {} is not a struct", field_name); + + auto struct_schema = list_schema.getElementType().asStruct(); + KJ_IF_MAYBE(nested_field, struct_schema.findFieldByName(nested_name)) + return field_reader; + + throw Exception(ErrorCodes::THERE_IS_NO_COLUMN, "Element type of List {} doesn't contain field with name \"{}\"", field_name, nested_name); + } + if (field_reader.getType() != capnp::DynamicValue::STRUCT) throw Exception(ErrorCodes::CAPN_PROTO_BAD_CAST, "Field {} is not a struct", field_name); @@ -374,13 +402,28 @@ capnp::DynamicValue::Reader getReaderByColumnName(const capnp::DynamicStruct::Re std::pair getStructBuilderAndFieldByColumnName(capnp::DynamicStruct::Builder struct_builder, const String & name) { - auto [field_name, nested_name] = splitFieldName(name); + auto [field_name, nested_name] = splitCapnProtoFieldName(name); KJ_IF_MAYBE(field, struct_builder.getSchema().findFieldByName(field_name)) { if (nested_name.empty()) return {struct_builder, *field}; auto field_builder = struct_builder.get(*field); + + /// Support reading Nested as List of Structs. + if (field_builder.getType() == capnp::DynamicValue::LIST) + { + auto list_schema = field->getType().asList(); + if (!list_schema.getElementType().isStruct()) + throw Exception(ErrorCodes::CAPN_PROTO_BAD_CAST, "Element type of List {} is not a struct", field_name); + + auto struct_schema = list_schema.getElementType().asStruct(); + KJ_IF_MAYBE(nested_field, struct_schema.findFieldByName(nested_name)) + return {struct_builder, *field}; + + throw Exception(ErrorCodes::THERE_IS_NO_COLUMN, "Element type of List {} doesn't contain field with name \"{}\"", field_name, nested_name); + } + if (field_builder.getType() != capnp::DynamicValue::STRUCT) throw Exception(ErrorCodes::CAPN_PROTO_BAD_CAST, "Field {} is not a struct", field_name); @@ -390,13 +433,27 @@ std::pair getStructBu throw Exception(ErrorCodes::THERE_IS_NO_COLUMN, "Capnproto struct doesn't contain field with name {}", field_name); } -static capnp::StructSchema::Field getFieldByName(const capnp::StructSchema & schema, const String & name) +static std::pair getFieldByName(const capnp::StructSchema & schema, const String & name) { - auto [field_name, nested_name] = splitFieldName(name); + auto [field_name, nested_name] = splitCapnProtoFieldName(name); KJ_IF_MAYBE(field, schema.findFieldByName(field_name)) { if (nested_name.empty()) - return *field; + return {*field, name}; + + /// Support reading Nested as List of Structs. 
+ if (field->getType().isList()) + { + auto list_schema = field->getType().asList(); + if (!list_schema.getElementType().isStruct()) + throw Exception(ErrorCodes::CAPN_PROTO_BAD_CAST, "Element type of List {} is not a struct", field_name); + + auto struct_schema = list_schema.getElementType().asStruct(); + KJ_IF_MAYBE(nested_field, struct_schema.findFieldByName(nested_name)) + return {*field, name}; + + throw Exception(ErrorCodes::THERE_IS_NO_COLUMN, "Element type of List {} doesn't contain field with name \"{}\"", field_name, nested_name); + } if (!field->getType().isStruct()) throw Exception(ErrorCodes::CAPN_PROTO_BAD_CAST, "Field {} is not a struct", field_name); @@ -416,8 +473,8 @@ void checkCapnProtoSchemaStructure(const capnp::StructSchema & schema, const Blo String additional_error_message; for (auto & [name, type] : names_and_types) { - auto field = getFieldByName(schema, name); - if (!checkCapnProtoType(field.getType(), type, mode, additional_error_message)) + auto [field, field_name] = getFieldByName(schema, name); + if (!checkCapnProtoType(field.getType(), type, mode, additional_error_message, field_name)) { auto e = Exception( ErrorCodes::CAPN_PROTO_BAD_CAST, diff --git a/src/Formats/CapnProtoUtils.h b/src/Formats/CapnProtoUtils.h index 102c3a2e306..2d8cdb418d7 100644 --- a/src/Formats/CapnProtoUtils.h +++ b/src/Formats/CapnProtoUtils.h @@ -30,6 +30,8 @@ public: capnp::StructSchema getMessageSchema(const FormatSchemaInfo & schema_info); }; +std::pair splitCapnProtoFieldName(const String & name); + bool compareEnumNames(const String & first, const String & second, FormatSettings::EnumComparingMode mode); std::pair getStructBuilderAndFieldByColumnName(capnp::DynamicStruct::Builder struct_builder, const String & name); diff --git a/src/Formats/ProtobufSerializer.cpp b/src/Formats/ProtobufSerializer.cpp index 2f56c4242e5..48332deedfb 100644 --- a/src/Formats/ProtobufSerializer.cpp +++ b/src/Formats/ProtobufSerializer.cpp @@ -1736,7 +1736,7 @@ namespace } const std::shared_ptr aggregate_function_data_type; - const AggregateFunctionPtr aggregate_function; + AggregateFunctionPtr aggregate_function; String text_buffer; }; diff --git a/src/Functions/FunctionDateOrDateTimeAddInterval.h b/src/Functions/FunctionDateOrDateTimeAddInterval.h index 2259cc71f07..3002e330f0c 100644 --- a/src/Functions/FunctionDateOrDateTimeAddInterval.h +++ b/src/Functions/FunctionDateOrDateTimeAddInterval.h @@ -685,37 +685,27 @@ public: } else if constexpr (std::is_same_v) { - if (typeid_cast(arguments[0].type.get())) + static constexpr auto target_scale = std::invoke( + []() -> std::optional + { + if constexpr (std::is_base_of_v) + return 9; + else if constexpr (std::is_base_of_v) + return 6; + else if constexpr (std::is_base_of_v) + return 3; + + return {}; + }); + + auto timezone = extractTimeZoneNameFromFunctionArguments(arguments, 2, 0); + if (const auto* datetime64_type = typeid_cast(arguments[0].type.get())) { - const auto & datetime64_type = assert_cast(*arguments[0].type); - - auto from_scale = datetime64_type.getScale(); - auto scale = from_scale; - - if (std::is_same_v) - scale = 9; - else if (std::is_same_v) - scale = 6; - else if (std::is_same_v) - scale = 3; - - scale = std::max(scale, from_scale); - - return std::make_shared(scale, extractTimeZoneNameFromFunctionArguments(arguments, 2, 0)); + const auto from_scale = datetime64_type->getScale(); + return std::make_shared(std::max(from_scale, target_scale.value_or(from_scale)), std::move(timezone)); } - else - { - auto scale = 
DataTypeDateTime64::default_scale; - if (std::is_same_v) - scale = 9; - else if (std::is_same_v) - scale = 6; - else if (std::is_same_v) - scale = 3; - - return std::make_shared(scale, extractTimeZoneNameFromFunctionArguments(arguments, 2, 0)); - } + return std::make_shared(target_scale.value_or(DataTypeDateTime64::default_scale), std::move(timezone)); } throw Exception(ErrorCodes::LOGICAL_ERROR, "Unexpected result type in datetime add interval function"); diff --git a/src/Functions/FunctionsBinaryRepresentation.cpp b/src/Functions/FunctionsBinaryRepresentation.cpp index f71f05bbf34..b0bdbc2130c 100644 --- a/src/Functions/FunctionsBinaryRepresentation.cpp +++ b/src/Functions/FunctionsBinaryRepresentation.cpp @@ -4,7 +4,7 @@ #include #include #include -#include +#include #include #include #include @@ -126,20 +126,7 @@ struct UnhexImpl static void decode(const char * pos, const char * end, char *& out) { - if ((end - pos) & 1) - { - *out = unhex(*pos); - ++out; - ++pos; - } - while (pos < end) - { - *out = unhex2(pos); - pos += word_size; - ++out; - } - *out = '\0'; - ++out; + hexStringDecode(pos, end, out, word_size); } }; @@ -233,52 +220,7 @@ struct UnbinImpl static void decode(const char * pos, const char * end, char *& out) { - if (pos == end) - { - *out = '\0'; - ++out; - return; - } - - UInt8 left = 0; - - /// end - pos is the length of input. - /// (length & 7) to make remain bits length mod 8 is zero to split. - /// e.g. the length is 9 and the input is "101000001", - /// first left_cnt is 1, left is 0, right shift, pos is 1, left = 1 - /// then, left_cnt is 0, remain input is '01000001'. - for (UInt8 left_cnt = (end - pos) & 7; left_cnt > 0; --left_cnt) - { - left = left << 1; - if (*pos != '0') - left += 1; - ++pos; - } - - if (left != 0 || end - pos == 0) - { - *out = left; - ++out; - } - - assert((end - pos) % 8 == 0); - - while (end - pos != 0) - { - UInt8 c = 0; - for (UInt8 i = 0; i < 8; ++i) - { - c = c << 1; - if (*pos != '0') - c += 1; - ++pos; - } - *out = c; - ++out; - } - - *out = '\0'; - ++out; + binStringDecode(pos, end, out); } }; diff --git a/src/Functions/IFunction.h b/src/Functions/IFunction.h index fc1a353a873..e82b98f0084 100644 --- a/src/Functions/IFunction.h +++ b/src/Functions/IFunction.h @@ -3,6 +3,8 @@ #include #include #include +#include +#include #include #include "config.h" @@ -122,11 +124,11 @@ using Values = std::vector; /** Function with known arguments and return type (when the specific overload was chosen). * It is also the point where all function-specific properties are known. */ -class IFunctionBase +class IFunctionBase : public IResolvedFunction { public: - virtual ~IFunctionBase() = default; + ~IFunctionBase() override = default; virtual ColumnPtr execute( /// NOLINT const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t input_rows_count, bool dry_run = false) const @@ -137,8 +139,10 @@ public: /// Get the main function name. virtual String getName() const = 0; - virtual const DataTypes & getArgumentTypes() const = 0; - virtual const DataTypePtr & getResultType() const = 0; + const Array & getParameters() const final + { + throw Exception(ErrorCodes::NOT_IMPLEMENTED, "IFunctionBase doesn't support getParameters method"); + } /// Do preparations and return executable. /// sample_columns should contain data types of arguments and values of constants, if relevant. 
@@ -281,7 +285,7 @@ public: }; -using FunctionBasePtr = std::shared_ptr; +using FunctionBasePtr = std::shared_ptr; /** Creates IFunctionBase from argument types list (chooses one function overload). diff --git a/src/Functions/IFunctionAdaptors.h b/src/Functions/IFunctionAdaptors.h index dbcc07af57a..eb2350d9b5e 100644 --- a/src/Functions/IFunctionAdaptors.h +++ b/src/Functions/IFunctionAdaptors.h @@ -51,6 +51,8 @@ public: const DataTypes & getArgumentTypes() const override { return arguments; } const DataTypePtr & getResultType() const override { return result_type; } + const FunctionPtr & getFunction() const { return function; } + #if USE_EMBEDDED_COMPILER bool isCompilable() const override { return function->isCompilable(getArgumentTypes()); } diff --git a/src/Functions/array/arrayJoin.cpp b/src/Functions/array/arrayJoin.cpp index 1dbe4cebb14..41f19fae6bf 100644 --- a/src/Functions/array/arrayJoin.cpp +++ b/src/Functions/array/arrayJoin.cpp @@ -2,6 +2,7 @@ #include #include #include +#include namespace DB @@ -52,11 +53,11 @@ public: DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override { - const DataTypeArray * arr = checkAndGetDataType(arguments[0].get()); + const auto & arr = getArrayJoinDataType(arguments[0]); if (!arr) - throw Exception("Argument for function " + getName() + " must be Array.", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); - + throw Exception("Argument for function " + getName() + " must be Array or Map", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); return arr->getNestedType(); + } ColumnPtr executeImpl(const ColumnsWithTypeAndName &, const DataTypePtr &, size_t /*input_rows_count*/) const override diff --git a/src/Functions/array/arrayReduce.cpp b/src/Functions/array/arrayReduce.cpp index c93e67d4b1c..e7ed8577049 100644 --- a/src/Functions/array/arrayReduce.cpp +++ b/src/Functions/array/arrayReduce.cpp @@ -104,7 +104,7 @@ DataTypePtr FunctionArrayReduce::getReturnTypeImpl(const ColumnsWithTypeAndName aggregate_function = AggregateFunctionFactory::instance().get(aggregate_function_name, argument_types, params_row, properties); } - return aggregate_function->getReturnType(); + return aggregate_function->getResultType(); } diff --git a/src/Functions/array/arrayReduceInRanges.cpp b/src/Functions/array/arrayReduceInRanges.cpp index 11d5e03eb3d..2cceea4ddba 100644 --- a/src/Functions/array/arrayReduceInRanges.cpp +++ b/src/Functions/array/arrayReduceInRanges.cpp @@ -122,7 +122,7 @@ DataTypePtr FunctionArrayReduceInRanges::getReturnTypeImpl(const ColumnsWithType aggregate_function = AggregateFunctionFactory::instance().get(aggregate_function_name, argument_types, params_row, properties); } - return std::make_shared(aggregate_function->getReturnType()); + return std::make_shared(aggregate_function->getResultType()); } diff --git a/src/Functions/formatDateTime.cpp b/src/Functions/formatDateTime.cpp index 4db04d61d84..4c24239a06c 100644 --- a/src/Functions/formatDateTime.cpp +++ b/src/Functions/formatDateTime.cpp @@ -48,7 +48,6 @@ template <> struct ActionValueTypeMap { using ActionValueTyp template <> struct ActionValueTypeMap { using ActionValueType = UInt16; }; template <> struct ActionValueTypeMap { using ActionValueType = Int32; }; template <> struct ActionValueTypeMap { using ActionValueType = UInt32; }; -// TODO(vnemkov): to add sub-second format instruction, make that DateTime64 and do some math in Action. 
template <> struct ActionValueTypeMap { using ActionValueType = Int64; }; @@ -113,16 +112,16 @@ private: class Action { public: - using Func = void (*)(char *, Time, const DateLUTImpl &); + using Func = void (*)(char *, Time, UInt64, UInt32, const DateLUTImpl &); Func func; size_t shift; explicit Action(Func func_, size_t shift_ = 0) : func(func_), shift(shift_) {} - void perform(char *& target, Time source, const DateLUTImpl & timezone) + void perform(char *& target, Time source, UInt64 fractional_second, UInt32 scale, const DateLUTImpl & timezone) { - func(target, source, timezone); + func(target, source, fractional_second, scale, timezone); target += shift; } @@ -148,30 +147,30 @@ private: } public: - static void noop(char *, Time, const DateLUTImpl &) + static void noop(char *, Time, UInt64 , UInt32 , const DateLUTImpl &) { } - static void century(char * target, Time source, const DateLUTImpl & timezone) + static void century(char * target, Time source, UInt64 /*fractional_second*/, UInt32 /*scale*/, const DateLUTImpl & timezone) { auto year = ToYearImpl::execute(source, timezone); auto century = year / 100; writeNumber2(target, century); } - static void dayOfMonth(char * target, Time source, const DateLUTImpl & timezone) + static void dayOfMonth(char * target, Time source, UInt64 /*fractional_second*/, UInt32 /*scale*/, const DateLUTImpl & timezone) { writeNumber2(target, ToDayOfMonthImpl::execute(source, timezone)); } - static void americanDate(char * target, Time source, const DateLUTImpl & timezone) + static void americanDate(char * target, Time source, UInt64 /*fractional_second*/, UInt32 /*scale*/, const DateLUTImpl & timezone) { writeNumber2(target, ToMonthImpl::execute(source, timezone)); writeNumber2(target + 3, ToDayOfMonthImpl::execute(source, timezone)); writeNumber2(target + 6, ToYearImpl::execute(source, timezone) % 100); } - static void dayOfMonthSpacePadded(char * target, Time source, const DateLUTImpl & timezone) + static void dayOfMonthSpacePadded(char * target, Time source, UInt64 /*fractional_second*/, UInt32 /*scale*/, const DateLUTImpl & timezone) { auto day = ToDayOfMonthImpl::execute(source, timezone); if (day < 10) @@ -180,101 +179,107 @@ private: writeNumber2(target, day); } - static void ISO8601Date(char * target, Time source, const DateLUTImpl & timezone) // NOLINT + static void ISO8601Date(char * target, Time source, UInt64 /*fractional_second*/, UInt32 /*scale*/, const DateLUTImpl & timezone) // NOLINT { writeNumber4(target, ToYearImpl::execute(source, timezone)); writeNumber2(target + 5, ToMonthImpl::execute(source, timezone)); writeNumber2(target + 8, ToDayOfMonthImpl::execute(source, timezone)); } - static void dayOfYear(char * target, Time source, const DateLUTImpl & timezone) + static void dayOfYear(char * target, Time source, UInt64 /*fractional_second*/, UInt32 /*scale*/, const DateLUTImpl & timezone) { writeNumber3(target, ToDayOfYearImpl::execute(source, timezone)); } - static void month(char * target, Time source, const DateLUTImpl & timezone) + static void month(char * target, Time source, UInt64 /*fractional_second*/, UInt32 /*scale*/, const DateLUTImpl & timezone) { writeNumber2(target, ToMonthImpl::execute(source, timezone)); } - static void dayOfWeek(char * target, Time source, const DateLUTImpl & timezone) + static void dayOfWeek(char * target, Time source, UInt64 /*fractional_second*/, UInt32 /*scale*/, const DateLUTImpl & timezone) { *target += ToDayOfWeekImpl::execute(source, timezone); } - static void dayOfWeek0To6(char * target, Time 
source, const DateLUTImpl & timezone) + static void dayOfWeek0To6(char * target, Time source, UInt64 /*fractional_second*/, UInt32 /*scale*/, const DateLUTImpl & timezone) { auto day = ToDayOfWeekImpl::execute(source, timezone); *target += (day == 7 ? 0 : day); } - static void ISO8601Week(char * target, Time source, const DateLUTImpl & timezone) // NOLINT + static void ISO8601Week(char * target, Time source, UInt64 /*fractional_second*/, UInt32 /*scale*/, const DateLUTImpl & timezone) // NOLINT { writeNumber2(target, ToISOWeekImpl::execute(source, timezone)); } - static void ISO8601Year2(char * target, Time source, const DateLUTImpl & timezone) // NOLINT + static void ISO8601Year2(char * target, Time source, UInt64 /*fractional_second*/, UInt32 /*scale*/, const DateLUTImpl & timezone) // NOLINT { writeNumber2(target, ToISOYearImpl::execute(source, timezone) % 100); } - static void ISO8601Year4(char * target, Time source, const DateLUTImpl & timezone) // NOLINT + static void ISO8601Year4(char * target, Time source, UInt64 /*fractional_second*/, UInt32 /*scale*/, const DateLUTImpl & timezone) // NOLINT { writeNumber4(target, ToISOYearImpl::execute(source, timezone)); } - static void year2(char * target, Time source, const DateLUTImpl & timezone) + static void year2(char * target, Time source, UInt64 /*fractional_second*/, UInt32 /*scale*/, const DateLUTImpl & timezone) { writeNumber2(target, ToYearImpl::execute(source, timezone) % 100); } - static void year4(char * target, Time source, const DateLUTImpl & timezone) + static void year4(char * target, Time source, UInt64 /*fractional_second*/, UInt32 /*scale*/, const DateLUTImpl & timezone) { writeNumber4(target, ToYearImpl::execute(source, timezone)); } - static void hour24(char * target, Time source, const DateLUTImpl & timezone) + static void hour24(char * target, Time source, UInt64 /*fractional_second*/, UInt32 /*scale*/, const DateLUTImpl & timezone) { writeNumber2(target, ToHourImpl::execute(source, timezone)); } - static void hour12(char * target, Time source, const DateLUTImpl & timezone) + static void hour12(char * target, Time source, UInt64 /*fractional_second*/, UInt32 /*scale*/, const DateLUTImpl & timezone) { auto x = ToHourImpl::execute(source, timezone); writeNumber2(target, x == 0 ? 12 : (x > 12 ? 
x - 12 : x)); } - static void minute(char * target, Time source, const DateLUTImpl & timezone) + static void minute(char * target, Time source, UInt64 /*fractional_second*/, UInt32 /*scale*/, const DateLUTImpl & timezone) { writeNumber2(target, ToMinuteImpl::execute(source, timezone)); } - static void AMPM(char * target, Time source, const DateLUTImpl & timezone) // NOLINT + static void AMPM(char * target, Time source, UInt64 /*fractional_second*/, UInt32 /*scale*/, const DateLUTImpl & timezone) // NOLINT { auto hour = ToHourImpl::execute(source, timezone); if (hour >= 12) *target = 'P'; } - static void hhmm24(char * target, Time source, const DateLUTImpl & timezone) + static void hhmm24(char * target, Time source, UInt64 /*fractional_second*/, UInt32 /*scale*/, const DateLUTImpl & timezone) { writeNumber2(target, ToHourImpl::execute(source, timezone)); writeNumber2(target + 3, ToMinuteImpl::execute(source, timezone)); } - static void second(char * target, Time source, const DateLUTImpl & timezone) + static void second(char * target, Time source, UInt64 /*fractional_second*/, UInt32 /*scale*/, const DateLUTImpl & timezone) { writeNumber2(target, ToSecondImpl::execute(source, timezone)); } - static void ISO8601Time(char * target, Time source, const DateLUTImpl & timezone) // NOLINT + static void fractionalSecond(char * target, Time /*source*/, UInt64 fractional_second, UInt32 scale, const DateLUTImpl & /*timezone*/) + { + for (Int64 i = scale, value = fractional_second; i > 0; --i, value /= 10) + target[i - 1] += value % 10; + } + + static void ISO8601Time(char * target, Time source, UInt64 /*fractional_second*/, UInt32 /*scale*/, const DateLUTImpl & timezone) // NOLINT { writeNumber2(target, ToHourImpl::execute(source, timezone)); writeNumber2(target + 3, ToMinuteImpl::execute(source, timezone)); writeNumber2(target + 6, ToSecondImpl::execute(source, timezone)); } - static void timezoneOffset(char * target, Time source, const DateLUTImpl & timezone) + static void timezoneOffset(char * target, Time source, UInt64 /*fractional_second*/, UInt32 /*scale*/, const DateLUTImpl & timezone) { auto offset = TimezoneOffsetImpl::execute(source, timezone); if (offset < 0) @@ -287,7 +292,7 @@ private: writeNumber2(target + 3, offset % 3600 / 60); } - static void quarter(char * target, Time source, const DateLUTImpl & timezone) + static void quarter(char * target, Time source, UInt64 /*fractional_second*/, UInt32 /*scale*/, const DateLUTImpl & timezone) { *target += ToQuarterImpl::execute(source, timezone); } @@ -426,9 +431,15 @@ public: String pattern = pattern_column->getValue(); + UInt32 scale [[maybe_unused]] = 0; + if constexpr (std::is_same_v) + { + scale = times->getScale(); + } + using T = typename ActionValueTypeMap::ActionValueType; std::vector> instructions; - String pattern_to_fill = parsePattern(pattern, instructions); + String pattern_to_fill = parsePattern(pattern, instructions, scale); size_t result_size = pattern_to_fill.size(); const DateLUTImpl * time_zone_tmp = nullptr; @@ -444,12 +455,6 @@ public: const DateLUTImpl & time_zone = *time_zone_tmp; const auto & vec = times->getData(); - UInt32 scale [[maybe_unused]] = 0; - if constexpr (std::is_same_v) - { - scale = times->getScale(); - } - auto col_res = ColumnString::create(); auto & dst_data = col_res->getChars(); auto & dst_offsets = col_res->getOffsets(); @@ -484,16 +489,16 @@ public: { if constexpr (std::is_same_v) { + const auto c = DecimalUtils::split(vec[i], scale); for (auto & instruction : instructions) { - const auto c = 
DecimalUtils::split(vec[i], scale); - instruction.perform(pos, static_cast(c.whole), time_zone); + instruction.perform(pos, static_cast(c.whole), c.fractional, scale, time_zone); } } else { for (auto & instruction : instructions) - instruction.perform(pos, static_cast(vec[i]), time_zone); + instruction.perform(pos, static_cast(vec[i]), 0, 0, time_zone); } dst_offsets[i] = pos - begin; @@ -504,7 +509,7 @@ public: } template - String parsePattern(const String & pattern, std::vector> & instructions) const + String parsePattern(const String & pattern, std::vector> & instructions, UInt32 scale) const { String result; @@ -573,6 +578,16 @@ public: result.append(" 0"); break; + // Fractional seconds + case 'f': + { + /// If the time data type has no fractional part, then we print '0' as the fractional part. + const auto actual_scale = std::max(1, scale); + instructions.emplace_back(&Action::fractionalSecond, actual_scale); + result.append(actual_scale, '0'); + break; + } + // Short YYYY-MM-DD date, equivalent to %Y-%m-%d 2001-08-23 case 'F': instructions.emplace_back(&Action::ISO8601Date, 10); diff --git a/src/Functions/in.cpp b/src/Functions/in.cpp index 5773e823a80..1de8371cf90 100644 --- a/src/Functions/in.cpp +++ b/src/Functions/in.cpp @@ -17,6 +17,7 @@ namespace DB namespace ErrorCodes { extern const int ILLEGAL_COLUMN; + extern const int LOGICAL_ERROR; } namespace @@ -94,6 +95,8 @@ public: { if constexpr (ignore_set) return ColumnUInt8::create(input_rows_count, 0u); + if (input_rows_count == 0) + return ColumnUInt8::create(); /// Second argument must be ColumnSet. ColumnPtr column_set_ptr = arguments[1].column; @@ -135,12 +138,16 @@ public: /// Replace single LowCardinality column to it's dictionary if possible. ColumnPtr lc_indexes = nullptr; + bool is_const = false; if (columns_of_key_columns.size() == 1) { auto & arg = columns_of_key_columns.at(0); const auto * col = arg.column.get(); if (const auto * const_col = typeid_cast(col)) + { col = &const_col->getDataColumn(); + is_const = true; + } if (const auto * lc = typeid_cast(col)) { @@ -153,7 +160,13 @@ public: auto res = set->execute(columns_of_key_columns, negative); if (lc_indexes) - return res->index(*lc_indexes, 0); + res = res->index(*lc_indexes, 0); + + if (is_const) + res = ColumnUInt8::create(input_rows_count, res->getUInt(0)); + + if (res->size() != input_rows_count) + throw Exception(ErrorCodes::LOGICAL_ERROR, "Output size is different from input size, expect {}, get {}", input_rows_count, res->size()); return res; } diff --git a/src/Functions/initializeAggregation.cpp b/src/Functions/initializeAggregation.cpp index 08352553b9c..b782cd04f75 100644 --- a/src/Functions/initializeAggregation.cpp +++ b/src/Functions/initializeAggregation.cpp @@ -87,7 +87,7 @@ DataTypePtr FunctionInitializeAggregation::getReturnTypeImpl(const ColumnsWithTy aggregate_function = AggregateFunctionFactory::instance().get(aggregate_function_name, argument_types, params_row, properties); } - return aggregate_function->getReturnType(); + return aggregate_function->getResultType(); } diff --git a/src/Functions/runningAccumulate.cpp b/src/Functions/runningAccumulate.cpp index 336c45e49cb..436637fbe56 100644 --- a/src/Functions/runningAccumulate.cpp +++ b/src/Functions/runningAccumulate.cpp @@ -102,7 +102,7 @@ public: /// Will pass empty arena if agg_func does not allocate memory in arena std::unique_ptr arena = agg_func.allocatesMemoryInArena() ? 
std::make_unique() : nullptr; - auto result_column_ptr = agg_func.getReturnType()->createColumn(); + auto result_column_ptr = agg_func.getResultType()->createColumn(); IColumn & result_column = *result_column_ptr; result_column.reserve(column_with_states->size()); diff --git a/src/IO/WriteSettings.h b/src/IO/WriteSettings.h index a1f5b23fb97..764d6c8992b 100644 --- a/src/IO/WriteSettings.h +++ b/src/IO/WriteSettings.h @@ -15,6 +15,8 @@ struct WriteSettings bool enable_filesystem_cache_on_write_operations = false; bool enable_filesystem_cache_log = false; bool is_file_cache_persistent = false; + bool throw_on_error_from_cache = false; + bool s3_allow_parallel_part_upload = true; /// Monitoring diff --git a/src/Interpreters/ActionsDAG.cpp b/src/Interpreters/ActionsDAG.cpp index 02704f7fc78..3b4d2dd1dd4 100644 --- a/src/Interpreters/ActionsDAG.cpp +++ b/src/Interpreters/ActionsDAG.cpp @@ -9,6 +9,7 @@ #include #include #include +#include #include #include #include @@ -47,8 +48,6 @@ void ActionsDAG::Node::toTree(JSONBuilder::JSONMap & map) const if (function_base) map.add("Function", function_base->getName()); - else if (function_builder) - map.add("Function", function_builder->getName()); if (type == ActionType::FUNCTION) map.add("Compiled", is_function_compiled); @@ -141,7 +140,7 @@ const ActionsDAG::Node & ActionsDAG::addAlias(const Node & child, std::string al const ActionsDAG::Node & ActionsDAG::addArrayJoin(const Node & child, std::string result_name) { - const DataTypeArray * array_type = typeid_cast(child.result_type.get()); + const auto & array_type = getArrayJoinDataType(child.result_type); if (!array_type) throw Exception("ARRAY JOIN requires array argument", ErrorCodes::TYPE_MISMATCH); @@ -166,7 +165,6 @@ const ActionsDAG::Node & ActionsDAG::addFunction( Node node; node.type = ActionType::FUNCTION; - node.function_builder = function; node.children = std::move(children); bool all_const = true; @@ -238,6 +236,86 @@ const ActionsDAG::Node & ActionsDAG::addFunction( return addNode(std::move(node)); } +const ActionsDAG::Node & ActionsDAG::addFunction( + const FunctionBasePtr & function_base, + NodeRawConstPtrs children, + std::string result_name) +{ + size_t num_arguments = children.size(); + + Node node; + node.type = ActionType::FUNCTION; + node.children = std::move(children); + + bool all_const = true; + ColumnsWithTypeAndName arguments(num_arguments); + + for (size_t i = 0; i < num_arguments; ++i) + { + const auto & child = *node.children[i]; + + ColumnWithTypeAndName argument; + argument.column = child.column; + argument.type = child.result_type; + argument.name = child.result_name; + + if (!argument.column || !isColumnConst(*argument.column)) + all_const = false; + + arguments[i] = std::move(argument); + } + + node.function_base = function_base; + node.result_type = node.function_base->getResultType(); + node.function = node.function_base->prepare(arguments); + node.is_deterministic = node.function_base->isDeterministic(); + + /// If all arguments are constants, and function is suitable to be executed in 'prepare' stage - execute function. + if (node.function_base->isSuitableForConstantFolding()) + { + ColumnPtr column; + + if (all_const) + { + size_t num_rows = arguments.empty() ? 
0 : arguments.front().column->size(); + column = node.function->execute(arguments, node.result_type, num_rows, true); + } + else + { + column = node.function_base->getConstantResultForNonConstArguments(arguments, node.result_type); + } + + /// If the result is not a constant, just in case, we will consider the result as unknown. + if (column && isColumnConst(*column)) + { + /// All constant (literal) columns in block are added with size 1. + /// But if there was no columns in block before executing a function, the result has size 0. + /// Change the size to 1. + + if (column->empty()) + column = column->cloneResized(1); + + node.column = std::move(column); + } + } + + if (result_name.empty()) + { + result_name = function_base->getName() + "("; + for (size_t i = 0; i < num_arguments; ++i) + { + if (i) + result_name += ", "; + result_name += node.children[i]->result_name; + } + result_name += ")"; + } + + node.result_name = std::move(result_name); + + return addNode(std::move(node)); +} + const ActionsDAG::Node & ActionsDAG::findInOutputs(const std::string & name) const { if (const auto * node = tryFindInOutputs(name)) @@ -463,11 +541,10 @@ static ColumnWithTypeAndName executeActionForHeader(const ActionsDAG::Node * nod auto key = arguments.at(0); key.column = key.column->convertToFullColumnIfConst(); - const ColumnArray * array = typeid_cast(key.column.get()); + const auto * array = getArrayJoinColumnRawPtr(key.column); if (!array) throw Exception(ErrorCodes::TYPE_MISMATCH, - "ARRAY JOIN of not array: {}", node->result_name); - + "ARRAY JOIN of not array nor map: {}", node->result_name); res_column.column = array->getDataPtr()->cloneEmpty(); break; } @@ -1954,8 +2031,7 @@ ActionsDAGPtr ActionsDAG::cloneActionsForFilterPushDown( FunctionOverloadResolverPtr func_builder_cast = CastInternalOverloadResolver::createImpl(); - predicate->function_builder = func_builder_cast; - predicate->function_base = predicate->function_builder->build(arguments); + predicate->function_base = func_builder_cast->build(arguments); predicate->function = predicate->function_base->prepare(arguments); } } @@ -1966,7 +2042,9 @@ ActionsDAGPtr ActionsDAG::cloneActionsForFilterPushDown( predicate->children.swap(new_children); auto arguments = prepareFunctionArguments(predicate->children); - predicate->function_base = predicate->function_builder->build(arguments); + FunctionOverloadResolverPtr func_builder_and = std::make_unique(std::make_shared()); + + predicate->function_base = func_builder_and->build(arguments); predicate->function = predicate->function_base->prepare(arguments); } } @@ -2171,7 +2249,7 @@ ActionsDAGPtr ActionsDAG::buildFilterActionsDAG( for (const auto & child : node->children) function_children.push_back(node_to_result_node.find(child)->second); - result_node = &result_dag->addFunction(node->function_builder, std::move(function_children), {}); + result_node = &result_dag->addFunction(node->function_base, std::move(function_children), {}); break; } } diff --git a/src/Interpreters/ActionsDAG.h b/src/Interpreters/ActionsDAG.h index a532dd0c436..f574757abac 100644 --- a/src/Interpreters/ActionsDAG.h +++ b/src/Interpreters/ActionsDAG.h @@ -17,7 +17,7 @@ class IExecutableFunction; using ExecutableFunctionPtr = std::shared_ptr; class IFunctionBase; -using FunctionBasePtr = std::shared_ptr; +using FunctionBasePtr = std::shared_ptr; class IFunctionOverloadResolver; using FunctionOverloadResolverPtr = std::shared_ptr; @@ -74,7 +74,6 @@ public: std::string result_name; DataTypePtr result_type; - 
FunctionOverloadResolverPtr function_builder; /// Can be used to get function signature or properties like monotonicity. FunctionBasePtr function_base; /// Prepared function which is used in function execution. @@ -139,6 +138,10 @@ public: const FunctionOverloadResolverPtr & function, NodeRawConstPtrs children, std::string result_name); + const Node & addFunction( + const FunctionBasePtr & function_base, + NodeRawConstPtrs children, + std::string result_name); /// Find first column by name in output nodes. This search is linear. const Node & findInOutputs(const std::string & name) const; diff --git a/src/Interpreters/AggregateDescription.cpp b/src/Interpreters/AggregateDescription.cpp index b0f51ea7c90..787e0a503f8 100644 --- a/src/Interpreters/AggregateDescription.cpp +++ b/src/Interpreters/AggregateDescription.cpp @@ -53,7 +53,7 @@ void AggregateDescription::explain(WriteBuffer & out, size_t indent) const out << type->getName(); } - out << ") → " << function->getReturnType()->getName() << "\n"; + out << ") → " << function->getResultType()->getName() << "\n"; } else out << prefix << " Function: nullptr\n"; @@ -109,7 +109,7 @@ void AggregateDescription::explain(JSONBuilder::JSONMap & map) const args_array->add(type->getName()); function_map->add("Argument Types", std::move(args_array)); - function_map->add("Result Type", function->getReturnType()->getName()); + function_map->add("Result Type", function->getResultType()->getName()); map.add("Function", std::move(function_map)); } diff --git a/src/Interpreters/AggregationUtils.cpp b/src/Interpreters/AggregationUtils.cpp index 4e870e8152b..157590e6f44 100644 --- a/src/Interpreters/AggregationUtils.cpp +++ b/src/Interpreters/AggregationUtils.cpp @@ -45,7 +45,7 @@ OutputBlockColumns prepareOutputBlockColumns( } else { - final_aggregate_columns[i] = aggregate_functions[i]->getReturnType()->createColumn(); + final_aggregate_columns[i] = aggregate_functions[i]->getResultType()->createColumn(); final_aggregate_columns[i]->reserve(rows); if (aggregate_functions[i]->isState()) diff --git a/src/Interpreters/Aggregator.cpp b/src/Interpreters/Aggregator.cpp index 14113514f1e..f3caf43b1fd 100644 --- a/src/Interpreters/Aggregator.cpp +++ b/src/Interpreters/Aggregator.cpp @@ -448,7 +448,7 @@ Block Aggregator::Params::getHeader( { auto & elem = res.getByName(aggregate.column_name); - elem.type = aggregate.function->getReturnType(); + elem.type = aggregate.function->getResultType(); elem.column = elem.type->createColumn(); } } @@ -467,7 +467,7 @@ Block Aggregator::Params::getHeader( DataTypePtr type; if (final) - type = aggregate.function->getReturnType(); + type = aggregate.function->getResultType(); else type = std::make_shared(aggregate.function, argument_types, aggregate.parameters); diff --git a/src/Interpreters/ArrayJoinAction.cpp b/src/Interpreters/ArrayJoinAction.cpp index 51aaa5fb169..ba54f1a324e 100644 --- a/src/Interpreters/ArrayJoinAction.cpp +++ b/src/Interpreters/ArrayJoinAction.cpp @@ -1,6 +1,8 @@ #include -#include #include +#include +#include +#include #include #include #include @@ -16,6 +18,46 @@ namespace ErrorCodes extern const int TYPE_MISMATCH; } +std::shared_ptr getArrayJoinDataType(DataTypePtr type) +{ + if (const auto * array_type = typeid_cast(type.get())) + return std::shared_ptr{type, array_type}; + else if (const auto * map_type = typeid_cast(type.get())) + { + const auto & nested_type = map_type->getNestedType(); + const auto * nested_array_type = typeid_cast(nested_type.get()); + return std::shared_ptr{nested_type, 
nested_array_type}; + } + else + return nullptr; +} + +ColumnPtr getArrayJoinColumn(const ColumnPtr & column) +{ + if (typeid_cast(column.get())) + return column; + else if (const auto * map = typeid_cast(column.get())) + return map->getNestedColumnPtr(); + else + return nullptr; +} + +const ColumnArray * getArrayJoinColumnRawPtr(const ColumnPtr & column) +{ + if (const auto & col_arr = getArrayJoinColumn(column)) + return typeid_cast(col_arr.get()); + return nullptr; +} + +ColumnWithTypeAndName convertArrayJoinColumn(const ColumnWithTypeAndName & src_col) +{ + ColumnWithTypeAndName array_col; + array_col.name = src_col.name; + array_col.type = getArrayJoinDataType(src_col.type); + array_col.column = getArrayJoinColumn(src_col.column->convertToFullColumnIfConst()); + return array_col; +} + ArrayJoinAction::ArrayJoinAction(const NameSet & array_joined_columns_, bool array_join_is_left, ContextPtr context) : columns(array_joined_columns_) , is_left(array_join_is_left) @@ -28,13 +70,12 @@ ArrayJoinAction::ArrayJoinAction(const NameSet & array_joined_columns_, bool arr { function_length = FunctionFactory::instance().get("length", context); function_greatest = FunctionFactory::instance().get("greatest", context); - function_arrayResize = FunctionFactory::instance().get("arrayResize", context); + function_array_resize = FunctionFactory::instance().get("arrayResize", context); } else if (is_left) function_builder = FunctionFactory::instance().get("emptyArrayToSingle", context); } - void ArrayJoinAction::prepare(ColumnsWithTypeAndName & sample) const { for (auto & current : sample) @@ -42,11 +83,13 @@ void ArrayJoinAction::prepare(ColumnsWithTypeAndName & sample) const if (!columns.contains(current.name)) continue; - const DataTypeArray * array_type = typeid_cast(&*current.type); - if (!array_type) - throw Exception("ARRAY JOIN requires array argument", ErrorCodes::TYPE_MISMATCH); - current.type = array_type->getNestedType(); - current.column = nullptr; + if (const auto & type = getArrayJoinDataType(current.type)) + { + current.column = nullptr; + current.type = type->getNestedType(); + } + else + throw Exception("ARRAY JOIN requires array or map argument", ErrorCodes::TYPE_MISMATCH); } } @@ -55,10 +98,10 @@ void ArrayJoinAction::execute(Block & block) if (columns.empty()) throw Exception("No arrays to join", ErrorCodes::LOGICAL_ERROR); - ColumnPtr any_array_ptr = block.getByName(*columns.begin()).column->convertToFullColumnIfConst(); - const ColumnArray * any_array = typeid_cast(&*any_array_ptr); + ColumnPtr any_array_map_ptr = block.getByName(*columns.begin()).column->convertToFullColumnIfConst(); + const auto * any_array = getArrayJoinColumnRawPtr(any_array_map_ptr); if (!any_array) - throw Exception("ARRAY JOIN of not array: " + *columns.begin(), ErrorCodes::TYPE_MISMATCH); + throw Exception("ARRAY JOIN requires array or map argument", ErrorCodes::TYPE_MISMATCH); /// If LEFT ARRAY JOIN, then we create columns in which empty arrays are replaced by arrays with one element - the default value. 
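The ArrayJoinAction changes around here generalize ARRAY JOIN from plain Array columns to Map columns by unwrapping the map's nested Array(Tuple(key, value)). A query of roughly this shape should therefore be accepted instead of failing with TYPE_MISMATCH; the per-row tuple result is assumed from the nested type used in this hunk, not verified against a build:

    SELECT kv
    FROM (SELECT map('a', 1, 'b', 2) AS m)
    ARRAY JOIN m AS kv;
    -- expected: two rows, ('a', 1) and ('b', 2)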
std::map non_empty_array_columns; @@ -78,7 +121,8 @@ void ArrayJoinAction::execute(Block & block) { auto & src_col = block.getByName(name); - ColumnsWithTypeAndName tmp_block{src_col}; //, {{}, uint64, {}}}; + ColumnWithTypeAndName array_col = convertArrayJoinColumn(src_col); + ColumnsWithTypeAndName tmp_block{array_col}; //, {{}, uint64, {}}}; auto len_col = function_length->build(tmp_block)->execute(tmp_block, uint64, rows); ColumnsWithTypeAndName tmp_block2{column_of_max_length, {len_col, uint64, {}}}; @@ -89,28 +133,35 @@ void ArrayJoinAction::execute(Block & block) { auto & src_col = block.getByName(name); - ColumnsWithTypeAndName tmp_block{src_col, column_of_max_length}; - src_col.column = function_arrayResize->build(tmp_block)->execute(tmp_block, src_col.type, rows); - any_array_ptr = src_col.column->convertToFullColumnIfConst(); + ColumnWithTypeAndName array_col = convertArrayJoinColumn(src_col); + ColumnsWithTypeAndName tmp_block{array_col, column_of_max_length}; + array_col.column = function_array_resize->build(tmp_block)->execute(tmp_block, array_col.type, rows); + + src_col = std::move(array_col); + any_array_map_ptr = src_col.column->convertToFullColumnIfConst(); } - any_array = typeid_cast(&*any_array_ptr); + any_array = getArrayJoinColumnRawPtr(any_array_map_ptr); + if (!any_array) + throw Exception("ARRAY JOIN requires array or map argument", ErrorCodes::TYPE_MISMATCH); } else if (is_left) { for (const auto & name : columns) { - auto src_col = block.getByName(name); - - ColumnsWithTypeAndName tmp_block{src_col}; - - non_empty_array_columns[name] = function_builder->build(tmp_block)->execute(tmp_block, src_col.type, src_col.column->size()); + const auto & src_col = block.getByName(name); + ColumnWithTypeAndName array_col = convertArrayJoinColumn(src_col); + ColumnsWithTypeAndName tmp_block{array_col}; + non_empty_array_columns[name] = function_builder->build(tmp_block)->execute(tmp_block, array_col.type, array_col.column->size()); } - any_array_ptr = non_empty_array_columns.begin()->second->convertToFullColumnIfConst(); - any_array = &typeid_cast(*any_array_ptr); + any_array_map_ptr = non_empty_array_columns.begin()->second->convertToFullColumnIfConst(); + any_array = getArrayJoinColumnRawPtr(any_array_map_ptr); + if (!any_array) + throw Exception("ARRAY JOIN requires array or map argument", ErrorCodes::TYPE_MISMATCH); } + size_t num_columns = block.columns(); for (size_t i = 0; i < num_columns; ++i) { @@ -118,18 +169,30 @@ void ArrayJoinAction::execute(Block & block) if (columns.contains(current.name)) { - if (!typeid_cast(&*current.type)) - throw Exception("ARRAY JOIN of not array: " + current.name, ErrorCodes::TYPE_MISMATCH); + if (const auto & type = getArrayJoinDataType(current.type)) + { + ColumnPtr array_ptr; + if (typeid_cast(current.type.get())) + { + array_ptr = (is_left && !is_unaligned) ? non_empty_array_columns[current.name] : current.column; + array_ptr = array_ptr->convertToFullColumnIfConst(); + } + else + { + ColumnPtr map_ptr = current.column->convertToFullColumnIfConst(); + const ColumnMap & map = typeid_cast(*map_ptr); + array_ptr = (is_left && !is_unaligned) ? non_empty_array_columns[current.name] : map.getNestedColumnPtr(); + } - ColumnPtr array_ptr = (is_left && !is_unaligned) ? 
non_empty_array_columns[current.name] : current.column; - array_ptr = array_ptr->convertToFullColumnIfConst(); + const ColumnArray & array = typeid_cast(*array_ptr); + if (!is_unaligned && !array.hasEqualOffsets(*any_array)) + throw Exception("Sizes of ARRAY-JOIN-ed arrays do not match", ErrorCodes::SIZES_OF_ARRAYS_DOESNT_MATCH); - const ColumnArray & array = typeid_cast(*array_ptr); - if (!is_unaligned && !array.hasEqualOffsets(typeid_cast(*any_array_ptr))) - throw Exception("Sizes of ARRAY-JOIN-ed arrays do not match", ErrorCodes::SIZES_OF_ARRAYS_DOESNT_MATCH); - - current.column = typeid_cast(*array_ptr).getDataPtr(); - current.type = typeid_cast(*current.type).getNestedType(); + current.column = typeid_cast(*array_ptr).getDataPtr(); + current.type = type->getNestedType(); + } + else + throw Exception("ARRAY JOIN of not array nor map: " + current.name, ErrorCodes::TYPE_MISMATCH); } else { diff --git a/src/Interpreters/ArrayJoinAction.h b/src/Interpreters/ArrayJoinAction.h index 975bf25a953..3baabd797d7 100644 --- a/src/Interpreters/ArrayJoinAction.h +++ b/src/Interpreters/ArrayJoinAction.h @@ -11,6 +11,15 @@ namespace DB class IFunctionOverloadResolver; using FunctionOverloadResolverPtr = std::shared_ptr; +class DataTypeArray; +class ColumnArray; +std::shared_ptr getArrayJoinDataType(DataTypePtr type); +const ColumnArray * getArrayJoinColumnRawPtr(const ColumnPtr & column); + +/// If input array join column has map type, convert it to array type. +/// Otherwise do nothing. +ColumnWithTypeAndName convertArrayJoinColumn(const ColumnWithTypeAndName & src_col); + class ArrayJoinAction { public: @@ -21,7 +30,7 @@ public: /// For unaligned [LEFT] ARRAY JOIN FunctionOverloadResolverPtr function_length; FunctionOverloadResolverPtr function_greatest; - FunctionOverloadResolverPtr function_arrayResize; + FunctionOverloadResolverPtr function_array_resize; /// For LEFT ARRAY JOIN. FunctionOverloadResolverPtr function_builder; diff --git a/src/Interpreters/Cache/FileCache.cpp b/src/Interpreters/Cache/FileCache.cpp index db95b161a4f..2551e236f7b 100644 --- a/src/Interpreters/Cache/FileCache.cpp +++ b/src/Interpreters/Cache/FileCache.cpp @@ -18,7 +18,6 @@ namespace DB { namespace ErrorCodes { - extern const int REMOTE_FS_OBJECT_CACHE_ERROR; extern const int LOGICAL_ERROR; } @@ -98,7 +97,7 @@ void FileCache::assertInitialized(std::lock_guard & /* cache_lock */ if (initialization_exception) std::rethrow_exception(initialization_exception); else - throw Exception(ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, "Cache not initialized"); + throw Exception(ErrorCodes::LOGICAL_ERROR, "Cache not initialized"); } } @@ -541,12 +540,12 @@ FileSegmentPtr FileCache::createFileSegmentForDownload( #endif if (size > max_file_segment_size) - throw Exception(ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, "Requested size exceeds max file segment size"); + throw Exception(ErrorCodes::LOGICAL_ERROR, "Requested size exceeds max file segment size"); auto * cell = getCell(key, offset, cache_lock); if (cell) throw Exception( - ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, + ErrorCodes::LOGICAL_ERROR, "Cache cell already exists for key `{}` and offset {}", key.toString(), offset); @@ -738,7 +737,7 @@ bool FileCache::tryReserveForMainList( auto * cell = getCell(entry_key, entry_offset, cache_lock); if (!cell) throw Exception( - ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, + ErrorCodes::LOGICAL_ERROR, "Cache became inconsistent. Key: {}, offset: {}", key.toString(), offset); @@ -964,7 +963,7 @@ void FileCache::remove( catch (...) 
{ throw Exception( - ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, + ErrorCodes::LOGICAL_ERROR, "Removal of cached file failed. Key: {}, offset: {}, path: {}, error: {}", key.toString(), offset, cache_file_path, getCurrentExceptionMessage(false)); } @@ -981,7 +980,7 @@ void FileCache::loadCacheInfoIntoMemory(std::lock_guard & cache_lock /// cache_base_path / key_prefix / key / offset if (!files.empty()) throw Exception( - ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, + ErrorCodes::LOGICAL_ERROR, "Cache initialization is partially made. " "This can be a result of a failed first attempt to initialize cache. " "Please, check log for error messages"); @@ -1214,7 +1213,7 @@ FileCache::FileSegmentCell::FileSegmentCell( } default: throw Exception( - ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, + ErrorCodes::LOGICAL_ERROR, "Can create cell with either EMPTY, DOWNLOADED, DOWNLOADING state, got: {}", FileSegment::stateToString(file_segment->download_state)); } diff --git a/src/Interpreters/Cache/FileSegment.cpp b/src/Interpreters/Cache/FileSegment.cpp index 177c6aecf7c..9c48f16d15e 100644 --- a/src/Interpreters/Cache/FileSegment.cpp +++ b/src/Interpreters/Cache/FileSegment.cpp @@ -19,7 +19,6 @@ namespace DB namespace ErrorCodes { - extern const int REMOTE_FS_OBJECT_CACHE_ERROR; extern const int LOGICAL_ERROR; } @@ -66,7 +65,7 @@ FileSegment::FileSegment( default: { throw Exception( - ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, + ErrorCodes::LOGICAL_ERROR, "Can only create cell with either EMPTY, DOWNLOADED or SKIP_CACHE state"); } } @@ -278,7 +277,7 @@ void FileSegment::resetRemoteFileReader() void FileSegment::write(const char * from, size_t size, size_t offset) { if (!size) - throw Exception(ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, "Writing zero size is not allowed"); + throw Exception(ErrorCodes::LOGICAL_ERROR, "Writing zero size is not allowed"); { std::unique_lock segment_lock(mutex); @@ -294,7 +293,7 @@ void FileSegment::write(const char * from, size_t size, size_t offset) size_t first_non_downloaded_offset = getFirstNonDownloadedOffsetUnlocked(segment_lock); if (offset != first_non_downloaded_offset) throw Exception( - ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, + ErrorCodes::LOGICAL_ERROR, "Attempt to write {} bytes to offset: {}, but current write offset is {}", size, offset, first_non_downloaded_offset); @@ -304,7 +303,7 @@ void FileSegment::write(const char * from, size_t size, size_t offset) if (free_reserved_size < size) throw Exception( - ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, + ErrorCodes::LOGICAL_ERROR, "Not enough space is reserved. 
Available: {}, expected: {}", free_reserved_size, size); if (current_downloaded_size == range().size()) @@ -364,7 +363,7 @@ FileSegment::State FileSegment::wait() return download_state; if (download_state == State::EMPTY) - throw Exception(ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, "Cannot wait on a file segment with empty state"); + throw Exception(ErrorCodes::LOGICAL_ERROR, "Cannot wait on a file segment with empty state"); if (download_state == State::DOWNLOADING) { @@ -382,7 +381,7 @@ FileSegment::State FileSegment::wait() bool FileSegment::reserve(size_t size_to_reserve) { if (!size_to_reserve) - throw Exception(ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, "Zero space reservation is not allowed"); + throw Exception(ErrorCodes::LOGICAL_ERROR, "Zero space reservation is not allowed"); size_t expected_downloaded_size; @@ -396,7 +395,7 @@ bool FileSegment::reserve(size_t size_to_reserve) if (expected_downloaded_size + size_to_reserve > range().size()) throw Exception( - ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, + ErrorCodes::LOGICAL_ERROR, "Attempt to reserve space too much space ({}) for file segment with range: {} (downloaded size: {})", size_to_reserve, range().toString(), downloaded_size); @@ -434,9 +433,6 @@ void FileSegment::setDownloadedUnlocked([[maybe_unused]] std::unique_lockfinalize(); @@ -498,7 +494,7 @@ void FileSegment::completeWithState(State state) { cv.notify_all(); throw Exception( - ErrorCodes::REMOTE_FS_OBJECT_CACHE_ERROR, + ErrorCodes::LOGICAL_ERROR, "Cannot complete file segment with state: {}", stateToString(state)); } @@ -559,8 +555,7 @@ void FileSegment::completeBasedOnCurrentState(std::lock_guard & cach { if (is_last_holder) cache->remove(key(), offset(), cache_lock, segment_lock); - - return; + break; } case State::DOWNLOADED: { @@ -613,6 +608,7 @@ void FileSegment::completeBasedOnCurrentState(std::lock_guard & cach } } + is_completed = true; LOG_TEST(log, "Completed file segment: {}", getInfoForLogUnlocked(segment_lock)); } @@ -748,6 +744,12 @@ bool FileSegment::isDetached() const return is_detached; } +bool FileSegment::isCompleted() const +{ + std::unique_lock segment_lock(mutex); + return is_completed; +} + void FileSegment::detach(std::lock_guard & /* cache_lock */, std::unique_lock & segment_lock) { if (is_detached) diff --git a/src/Interpreters/Cache/FileSegment.h b/src/Interpreters/Cache/FileSegment.h index 8f9c0097d77..df2e54c4d78 100644 --- a/src/Interpreters/Cache/FileSegment.h +++ b/src/Interpreters/Cache/FileSegment.h @@ -181,6 +181,8 @@ public: bool isDetached() const; + bool isCompleted() const; + void assertCorrectness() const; /** @@ -294,6 +296,7 @@ private: /// "detached" file segment means that it is not owned by cache ("detached" from cache). /// In general case, all file segments are owned by cache. 
bool is_detached = false; + bool is_completed = false; bool is_downloaded{false}; @@ -317,11 +320,6 @@ struct FileSegmentsHolder : private boost::noncopyable String toString(); - FileSegments::iterator add(FileSegmentPtr && file_segment) - { - return file_segments.insert(file_segments.end(), file_segment); - } - FileSegments file_segments{}; }; diff --git a/src/Interpreters/Context.cpp b/src/Interpreters/Context.cpp index c124eb1c881..9e56aba9c0a 100644 --- a/src/Interpreters/Context.cpp +++ b/src/Interpreters/Context.cpp @@ -3743,6 +3743,8 @@ WriteSettings Context::getWriteSettings() const res.enable_filesystem_cache_on_write_operations = settings.enable_filesystem_cache_on_write_operations; res.enable_filesystem_cache_log = settings.enable_filesystem_cache_log; + res.throw_on_error_from_cache = settings.throw_on_error_from_cache_on_write_operations; + res.s3_allow_parallel_part_upload = settings.s3_allow_parallel_part_upload; res.remote_throttler = getRemoteWriteThrottler(); diff --git a/src/Interpreters/ExpressionActions.cpp b/src/Interpreters/ExpressionActions.cpp index 9b38072b5af..d89be9f3e2e 100644 --- a/src/Interpreters/ExpressionActions.cpp +++ b/src/Interpreters/ExpressionActions.cpp @@ -620,9 +620,9 @@ static void executeAction(const ExpressionActions::Action & action, ExecutionCon array_join_key.column = array_join_key.column->convertToFullColumnIfConst(); - const ColumnArray * array = typeid_cast(array_join_key.column.get()); + const auto * array = getArrayJoinColumnRawPtr(array_join_key.column); if (!array) - throw Exception("ARRAY JOIN of not array: " + action.node->result_name, ErrorCodes::TYPE_MISMATCH); + throw Exception("ARRAY JOIN of not array nor map: " + action.node->result_name, ErrorCodes::TYPE_MISMATCH); for (auto & column : columns) if (column.column) @@ -635,7 +635,7 @@ static void executeAction(const ExpressionActions::Action & action, ExecutionCon auto & res_column = columns[action.result_position]; res_column.column = array->getDataPtr(); - res_column.type = assert_cast(*array_join_key.type).getNestedType(); + res_column.type = getArrayJoinDataType(array_join_key.type)->getNestedType(); res_column.name = action.node->result_name; num_rows = res_column.column->size(); @@ -1008,7 +1008,7 @@ ExpressionActionsChain::ArrayJoinStep::ArrayJoinStep(ArrayJoinActionPtr array_jo if (array_join->columns.contains(column.name)) { - const auto * array = typeid_cast(column.type.get()); + const auto & array = getArrayJoinDataType(column.type); column.type = array->getNestedType(); /// Arrays are materialized column.column = nullptr; diff --git a/src/Interpreters/ExpressionAnalyzer.cpp b/src/Interpreters/ExpressionAnalyzer.cpp index bc93abff534..a3db464fbbb 100644 --- a/src/Interpreters/ExpressionAnalyzer.cpp +++ b/src/Interpreters/ExpressionAnalyzer.cpp @@ -425,7 +425,7 @@ void ExpressionAnalyzer::analyzeAggregation(ActionsDAGPtr & temp_actions) aggregated_columns = temp_actions->getNamesAndTypesList(); for (const auto & desc : aggregate_descriptions) - aggregated_columns.emplace_back(desc.column_name, desc.function->getReturnType()); + aggregated_columns.emplace_back(desc.column_name, desc.function->getResultType()); } @@ -2074,7 +2074,7 @@ ExpressionAnalysisResult::ExpressionAnalysisResult( for (const auto & f : w.window_functions) { query_analyzer.columns_after_window.push_back( - {f.column_name, f.aggregate_function->getReturnType()}); + {f.column_name, f.aggregate_function->getResultType()}); } } diff --git a/src/Interpreters/ExpressionJIT.cpp 
b/src/Interpreters/ExpressionJIT.cpp index 3a2c2e333a9..dfc88e97052 100644 --- a/src/Interpreters/ExpressionJIT.cpp +++ b/src/Interpreters/ExpressionJIT.cpp @@ -263,7 +263,7 @@ public: return result; } - static void applyFunction(IFunctionBase & function, Field & value) + static void applyFunction(const IFunctionBase & function, Field & value) { const auto & type = function.getArgumentTypes().at(0); ColumnsWithTypeAndName args{{type->createColumnConst(1, value), type, "x" }}; @@ -338,7 +338,7 @@ static bool isCompilableFunction(const ActionsDAG::Node & node, const std::unord if (node.type != ActionsDAG::ActionType::FUNCTION) return false; - auto & function = *node.function_base; + const auto & function = *node.function_base; IFunction::ShortCircuitSettings settings; if (function.isShortCircuit(settings, node.children.size())) diff --git a/src/Interpreters/JIT/compileFunction.cpp b/src/Interpreters/JIT/compileFunction.cpp index e12b4894eb0..8bf0eb25b60 100644 --- a/src/Interpreters/JIT/compileFunction.cpp +++ b/src/Interpreters/JIT/compileFunction.cpp @@ -403,7 +403,7 @@ static void compileInsertAggregatesIntoResultColumns(llvm::Module & module, cons std::vector columns(functions.size()); for (size_t i = 0; i < functions.size(); ++i) { - auto return_type = functions[i].function->getReturnType(); + auto return_type = functions[i].function->getResultType(); auto * data = b.CreateLoad(column_type, b.CreateConstInBoundsGEP1_64(column_type, columns_arg, i)); auto * column_data_type = toNativeType(b, removeNullable(return_type)); diff --git a/src/Interpreters/MutationsInterpreter.cpp b/src/Interpreters/MutationsInterpreter.cpp index 26b8bce1f4a..1578e454049 100644 --- a/src/Interpreters/MutationsInterpreter.cpp +++ b/src/Interpreters/MutationsInterpreter.cpp @@ -220,8 +220,13 @@ bool isStorageTouchedByMutations( if (all_commands_can_be_skipped) return false; + /// We must read with one thread because it guarantees that + /// output stream will be sorted after reading from MergeTree parts. + /// Disable all settings that can enable reading with several streams. 
context_copy->setSetting("max_streams_to_max_threads_ratio", 1); context_copy->setSetting("max_threads", 1); + context_copy->setSetting("allow_asynchronous_read_from_io_pool_for_merge_tree", false); + context_copy->setSetting("max_streams_for_merge_tree_reading", Field(0)); ASTPtr select_query = prepareQueryAffectedAST(commands, storage, context_copy); diff --git a/src/Interpreters/Set.h b/src/Interpreters/Set.h index 44f543ce222..bafb0dcea7a 100644 --- a/src/Interpreters/Set.h +++ b/src/Interpreters/Set.h @@ -18,7 +18,7 @@ struct Range; class Context; class IFunctionBase; -using FunctionBasePtr = std::shared_ptr; +using FunctionBasePtr = std::shared_ptr; class Chunk; diff --git a/src/Parsers/ExpressionElementParsers.cpp b/src/Parsers/ExpressionElementParsers.cpp index 74d14292459..01c730adf37 100644 --- a/src/Parsers/ExpressionElementParsers.cpp +++ b/src/Parsers/ExpressionElementParsers.cpp @@ -8,6 +8,7 @@ #include #include #include +#include #include #include @@ -986,6 +987,38 @@ bool ParserUnsignedInteger::parseImpl(Pos & pos, ASTPtr & node, Expected & expec return true; } +inline static bool makeStringLiteral(IParser::Pos & pos, ASTPtr & node, String str) +{ + auto literal = std::make_shared(str); + literal->begin = pos; + literal->end = ++pos; + node = literal; + return true; +} + +inline static bool makeHexOrBinStringLiteral(IParser::Pos & pos, ASTPtr & node, bool hex, size_t word_size) +{ + const char * str_begin = pos->begin + 2; + const char * str_end = pos->end - 1; + if (str_begin == str_end) + return makeStringLiteral(pos, node, ""); + + PODArray res; + res.resize((pos->size() + word_size) / word_size + 1); + char * res_begin = reinterpret_cast(res.data()); + char * res_pos = res_begin; + + if (hex) + { + hexStringDecode(str_begin, str_end, res_pos); + } + else + { + binStringDecode(str_begin, str_end, res_pos); + } + + return makeStringLiteral(pos, node, String(reinterpret_cast(res.data()), (res_pos - res_begin - 1))); +} bool ParserStringLiteral::parseImpl(Pos & pos, ASTPtr & node, Expected & expected) { @@ -996,6 +1029,18 @@ bool ParserStringLiteral::parseImpl(Pos & pos, ASTPtr & node, Expected & expecte if (pos->type == TokenType::StringLiteral) { + if (*pos->begin == 'x' || *pos->begin == 'X') + { + constexpr size_t word_size = 2; + return makeHexOrBinStringLiteral(pos, node, true, word_size); + } + + if (*pos->begin == 'b' || *pos->begin == 'B') + { + constexpr size_t word_size = 8; + return makeHexOrBinStringLiteral(pos, node, false, word_size); + } + ReadBufferFromMemory in(pos->begin, pos->size()); try @@ -1022,11 +1067,7 @@ bool ParserStringLiteral::parseImpl(Pos & pos, ASTPtr & node, Expected & expecte s = String(pos->begin + heredoc_size, pos->size() - heredoc_size * 2); } - auto literal = std::make_shared(s); - literal->begin = pos; - literal->end = ++pos; - node = literal; - return true; + return makeStringLiteral(pos, node, s); } template @@ -1128,36 +1169,42 @@ class ICollection { public: virtual ~ICollection() = default; - virtual bool parse(IParser::Pos & pos, Collections & collections, ASTPtr & node, Expected & expected) = 0; + virtual bool parse(IParser::Pos & pos, Collections & collections, ASTPtr & node, Expected & expected, bool allow_map) = 0; }; template class CommonCollection : public ICollection { public: - bool parse(IParser::Pos & pos, Collections & collections, ASTPtr & node, Expected & expected) override; + explicit CommonCollection(const IParser::Pos & pos) : begin(pos) {} + + bool parse(IParser::Pos & pos, Collections & collections, ASTPtr & 
node, Expected & expected, bool allow_map) override; private: Container container; + IParser::Pos begin; }; class MapCollection : public ICollection { public: - bool parse(IParser::Pos & pos, Collections & collections, ASTPtr & node, Expected & expected) override; + explicit MapCollection(const IParser::Pos & pos) : begin(pos) {} + + bool parse(IParser::Pos & pos, Collections & collections, ASTPtr & node, Expected & expected, bool allow_map) override; private: Map container; + IParser::Pos begin; }; -bool parseAllCollectionsStart(IParser::Pos & pos, Collections & collections, Expected & /*expected*/) +bool parseAllCollectionsStart(IParser::Pos & pos, Collections & collections, Expected & /*expected*/, bool allow_map) { - if (pos->type == TokenType::OpeningCurlyBrace) - collections.push_back(std::make_unique()); + if (allow_map && pos->type == TokenType::OpeningCurlyBrace) + collections.push_back(std::make_unique(pos)); else if (pos->type == TokenType::OpeningRoundBracket) - collections.push_back(std::make_unique>()); + collections.push_back(std::make_unique>(pos)); else if (pos->type == TokenType::OpeningSquareBracket) - collections.push_back(std::make_unique>()); + collections.push_back(std::make_unique>(pos)); else return false; @@ -1166,7 +1213,7 @@ bool parseAllCollectionsStart(IParser::Pos & pos, Collections & collections, Exp } template -bool CommonCollection::parse(IParser::Pos & pos, Collections & collections, ASTPtr & node, Expected & expected) +bool CommonCollection::parse(IParser::Pos & pos, Collections & collections, ASTPtr & node, Expected & expected, bool allow_map) { if (node) { @@ -1183,23 +1230,27 @@ bool CommonCollection::parse(IParser::Pos & pos, Collectio { if (end_p.ignore(pos, expected)) { - node = std::make_shared(std::move(container)); + auto result = std::make_shared(std::move(container)); + result->begin = begin; + result->end = pos; + + node = std::move(result); break; } if (!container.empty() && !comma_p.ignore(pos, expected)) - return false; + return false; if (literal_p.parse(pos, literal, expected)) container.push_back(std::move(literal->as().value)); else - return parseAllCollectionsStart(pos, collections, expected); + return parseAllCollectionsStart(pos, collections, expected, allow_map); } return true; } -bool MapCollection::parse(IParser::Pos & pos, Collections & collections, ASTPtr & node, Expected & expected) +bool MapCollection::parse(IParser::Pos & pos, Collections & collections, ASTPtr & node, Expected & expected, bool allow_map) { if (node) { @@ -1217,7 +1268,11 @@ bool MapCollection::parse(IParser::Pos & pos, Collections & collections, ASTPtr { if (end_p.ignore(pos, expected)) { - node = std::make_shared(std::move(container)); + auto result = std::make_shared(std::move(container)); + result->begin = begin; + result->end = pos; + + node = std::move(result); break; } @@ -1235,7 +1290,7 @@ bool MapCollection::parse(IParser::Pos & pos, Collections & collections, ASTPtr if (literal_p.parse(pos, literal, expected)) container.push_back(std::move(literal->as().value)); else - return parseAllCollectionsStart(pos, collections, expected); + return parseAllCollectionsStart(pos, collections, expected, allow_map); } return true; @@ -1248,12 +1303,12 @@ bool ParserAllCollectionsOfLiterals::parseImpl(Pos & pos, ASTPtr & node, Expecte { Collections collections; - if (!parseAllCollectionsStart(pos, collections, expected)) + if (!parseAllCollectionsStart(pos, collections, expected, allow_map)) return false; while (!collections.empty()) { - if 
(!collections.back()->parse(pos, collections, node, expected)) + if (!collections.back()->parse(pos, collections, node, expected, allow_map)) return false; if (node) diff --git a/src/Parsers/ExpressionElementParsers.h b/src/Parsers/ExpressionElementParsers.h index 8e328db976b..cc88faf2653 100644 --- a/src/Parsers/ExpressionElementParsers.h +++ b/src/Parsers/ExpressionElementParsers.h @@ -307,9 +307,14 @@ protected: class ParserAllCollectionsOfLiterals : public IParserBase { public: + explicit ParserAllCollectionsOfLiterals(bool allow_map_ = true) : allow_map(allow_map_) {} + protected: const char * getName() const override { return "combination of maps, arrays, tuples"; } bool parseImpl(Pos & pos, ASTPtr & node, Expected & expected) override; + +private: + bool allow_map; }; diff --git a/src/Parsers/Lexer.cpp b/src/Parsers/Lexer.cpp index 6bd27ee62ae..be67807ad8f 100644 --- a/src/Parsers/Lexer.cpp +++ b/src/Parsers/Lexer.cpp @@ -1,3 +1,4 @@ +#include #include #include #include @@ -44,6 +45,36 @@ Token quotedString(const char *& pos, const char * const token_begin, const char } } +Token quotedHexOrBinString(const char *& pos, const char * const token_begin, const char * const end) +{ + constexpr char quote = '\''; + + assert(pos[1] == quote); + + bool hex = (*pos == 'x' || *pos == 'X'); + + pos += 2; + + if (hex) + { + while (pos < end && isHexDigit(*pos)) + ++pos; + } + else + { + pos = find_first_not_symbols<'0', '1'>(pos, end); + } + + if (pos >= end || *pos != quote) + { + pos = end; + return Token(TokenType::ErrorSingleQuoteIsNotClosed, token_begin, end); + } + + ++pos; + return Token(TokenType::StringLiteral, token_begin, pos); +} + } @@ -420,6 +451,12 @@ Token Lexer::nextTokenImpl() return Token(TokenType::DollarSign, token_begin, ++pos); } } + + if (pos + 2 < end && pos[1] == '\'' && (*pos == 'x' || *pos == 'b' || *pos == 'X' || *pos == 'B')) + { + return quotedHexOrBinString(pos, token_begin, end); + } + if (isWordCharASCII(*pos) || *pos == '$') { ++pos; diff --git a/src/Planner/Planner.cpp b/src/Planner/Planner.cpp index d88766f3656..6b2de30722c 100644 --- a/src/Planner/Planner.cpp +++ b/src/Planner/Planner.cpp @@ -349,8 +349,8 @@ void Planner::buildQueryPlanIfNeeded() { auto function_node = std::make_shared("and"); auto and_function = FunctionFactory::instance().get("and", query_context); - function_node->resolveAsFunction(std::move(and_function), std::make_shared()); function_node->getArguments().getNodes() = {query_node.getPrewhere(), query_node.getWhere()}; + function_node->resolveAsFunction(and_function->build(function_node->getArgumentTypes())); query_node.getWhere() = std::move(function_node); query_node.getPrewhere() = {}; } diff --git a/src/Planner/PlannerActionsVisitor.cpp b/src/Planner/PlannerActionsVisitor.cpp index aa1b61e5559..95edd93dd9f 100644 --- a/src/Planner/PlannerActionsVisitor.cpp +++ b/src/Planner/PlannerActionsVisitor.cpp @@ -121,7 +121,8 @@ public: return node; } - const ActionsDAG::Node * addFunctionIfNecessary(const std::string & node_name, ActionsDAG::NodeRawConstPtrs children, FunctionOverloadResolverPtr function) + template + const ActionsDAG::Node * addFunctionIfNecessary(const std::string & node_name, ActionsDAG::NodeRawConstPtrs children, FunctionOrOverloadResolver function) { auto it = node_name_to_node.find(node_name); if (it != node_name_to_node.end()) @@ -325,6 +326,7 @@ PlannerActionsVisitorImpl::NodeNameAndNodeMinLevel PlannerActionsVisitorImpl::vi lambda_actions, captured_column_names, lambda_arguments_names_and_types, result_type, 
lambda_expression_node_name); actions_stack.pop_back(); + // TODO: Pass IFunctionBase here not FunctionCaptureOverloadResolver. actions_stack[level].addFunctionIfNecessary(lambda_node_name, std::move(lambda_children), std::move(function_capture)); size_t actions_stack_size = actions_stack.size(); diff --git a/src/Planner/PlannerAggregation.cpp b/src/Planner/PlannerAggregation.cpp index a1a8b54426a..05e7b5418e3 100644 --- a/src/Planner/PlannerAggregation.cpp +++ b/src/Planner/PlannerAggregation.cpp @@ -101,14 +101,14 @@ public: { auto grouping_ordinary_function = std::make_shared(arguments_indexes, force_grouping_standard_compatibility); auto grouping_ordinary_function_adaptor = std::make_shared(std::move(grouping_ordinary_function)); - function_node->resolveAsFunction(std::move(grouping_ordinary_function_adaptor), std::make_shared()); + function_node->resolveAsFunction(grouping_ordinary_function_adaptor->build({})); break; } case GroupByKind::ROLLUP: { auto grouping_rollup_function = std::make_shared(arguments_indexes, aggregation_keys_size, force_grouping_standard_compatibility); auto grouping_rollup_function_adaptor = std::make_shared(std::move(grouping_rollup_function)); - function_node->resolveAsFunction(std::move(grouping_rollup_function_adaptor), std::make_shared()); + function_node->resolveAsFunction(grouping_rollup_function_adaptor->build({})); function_node->getArguments().getNodes().push_back(std::move(grouping_set_argument_column)); break; } @@ -116,7 +116,7 @@ public: { auto grouping_cube_function = std::make_shared(arguments_indexes, aggregation_keys_size, force_grouping_standard_compatibility); auto grouping_cube_function_adaptor = std::make_shared(std::move(grouping_cube_function)); - function_node->resolveAsFunction(std::move(grouping_cube_function_adaptor), std::make_shared()); + function_node->resolveAsFunction(grouping_cube_function_adaptor->build({})); function_node->getArguments().getNodes().push_back(std::move(grouping_set_argument_column)); break; } @@ -124,7 +124,7 @@ public: { auto grouping_grouping_sets_function = std::make_shared(arguments_indexes, grouping_sets_keys_indices, force_grouping_standard_compatibility); auto grouping_grouping_sets_function_adaptor = std::make_shared(std::move(grouping_grouping_sets_function)); - function_node->resolveAsFunction(std::move(grouping_grouping_sets_function_adaptor), std::make_shared()); + function_node->resolveAsFunction(grouping_grouping_sets_function_adaptor->build({})); function_node->getArguments().getNodes().push_back(std::move(grouping_set_argument_column)); break; } diff --git a/src/Planner/PlannerExpressionAnalysis.cpp b/src/Planner/PlannerExpressionAnalysis.cpp index 9db268512be..91a04b090fc 100644 --- a/src/Planner/PlannerExpressionAnalysis.cpp +++ b/src/Planner/PlannerExpressionAnalysis.cpp @@ -65,7 +65,7 @@ std::optional analyzeAggregation(QueryTreeNodePtr & q ColumnsWithTypeAndName aggregates_columns; aggregates_columns.reserve(aggregates_descriptions.size()); for (auto & aggregate_description : aggregates_descriptions) - aggregates_columns.emplace_back(nullptr, aggregate_description.function->getReturnType(), aggregate_description.column_name); + aggregates_columns.emplace_back(nullptr, aggregate_description.function->getResultType(), aggregate_description.column_name); Names aggregation_keys; @@ -284,7 +284,7 @@ std::optional analyzeWindow(QueryTreeNodePtr & query_tree, for (auto & window_description : window_descriptions) for (auto & window_function : window_description.window_functions) - 
window_functions_additional_columns.emplace_back(nullptr, window_function.aggregate_function->getReturnType(), window_function.column_name); + window_functions_additional_columns.emplace_back(nullptr, window_function.aggregate_function->getResultType(), window_function.column_name); auto before_window_step = std::make_unique(before_window_actions, ActionsChainStep::AvailableOutputColumnsStrategy::ALL_NODES, diff --git a/src/Processors/Formats/Impl/BSONEachRowRowInputFormat.cpp b/src/Processors/Formats/Impl/BSONEachRowRowInputFormat.cpp index 4d2ac6a5420..fd0c553538f 100644 --- a/src/Processors/Formats/Impl/BSONEachRowRowInputFormat.cpp +++ b/src/Processors/Formats/Impl/BSONEachRowRowInputFormat.cpp @@ -18,6 +18,7 @@ #include #include +#include #include #include #include @@ -282,7 +283,7 @@ static void readAndInsertString(ReadBuffer & in, IColumn & column, BSONType bson } else if (bson_type == BSONType::OBJECT_ID) { - readAndInsertStringImpl(in, column, 12); + readAndInsertStringImpl(in, column, BSON_OBJECT_ID_SIZE); } else { @@ -664,7 +665,7 @@ static void skipBSONField(ReadBuffer & in, BSONType type) } case BSONType::OBJECT_ID: { - in.ignore(12); + in.ignore(BSON_OBJECT_ID_SIZE); break; } case BSONType::REGEXP: @@ -677,7 +678,7 @@ static void skipBSONField(ReadBuffer & in, BSONType type) { BSONSizeT size; readBinary(size, in); - in.ignore(size + 12); + in.ignore(size + BSON_DB_POINTER_SIZE); break; } case BSONType::JAVA_SCRIPT_CODE_W_SCOPE: @@ -796,7 +797,6 @@ DataTypePtr BSONEachRowSchemaReader::getDataTypeFromBSONField(BSONType type, boo } case BSONType::SYMBOL: [[fallthrough]]; case BSONType::JAVA_SCRIPT_CODE: [[fallthrough]]; - case BSONType::OBJECT_ID: [[fallthrough]]; case BSONType::STRING: { BSONSizeT size; @@ -804,6 +804,11 @@ DataTypePtr BSONEachRowSchemaReader::getDataTypeFromBSONField(BSONType type, boo in.ignore(size); return std::make_shared(); } + case BSONType::OBJECT_ID:; + { + in.ignore(BSON_OBJECT_ID_SIZE); + return makeNullable(std::make_shared(BSON_OBJECT_ID_SIZE)); + } case BSONType::DOCUMENT: { auto nested_names_and_types = getDataTypesFromBSONDocument(false); @@ -954,6 +959,7 @@ void registerInputFormatBSONEachRow(FormatFactory & factory) "BSONEachRow", [](ReadBuffer & buf, const Block & sample, IRowInputFormat::Params params, const FormatSettings & settings) { return std::make_shared(buf, sample, std::move(params), settings); }); + factory.registerFileExtension("bson", "BSONEachRow"); } void registerFileSegmentationEngineBSONEachRow(FormatFactory & factory) diff --git a/src/Processors/Formats/Impl/CapnProtoRowInputFormat.cpp b/src/Processors/Formats/Impl/CapnProtoRowInputFormat.cpp index 08d2cac743a..58ace9cfca5 100644 --- a/src/Processors/Formats/Impl/CapnProtoRowInputFormat.cpp +++ b/src/Processors/Formats/Impl/CapnProtoRowInputFormat.cpp @@ -99,6 +99,12 @@ static void insertSignedInteger(IColumn & column, const DataTypePtr & column_typ case TypeIndex::DateTime64: assert_cast &>(column).insertValue(value); break; + case TypeIndex::Decimal32: + assert_cast &>(column).insertValue(static_cast(value)); + break; + case TypeIndex::Decimal64: + assert_cast &>(column).insertValue(value); + break; default: throw Exception(ErrorCodes::LOGICAL_ERROR, "Column type is not a signed integer."); } @@ -178,14 +184,14 @@ static void insertEnum(IColumn & column, const DataTypePtr & column_type, const } } -static void insertValue(IColumn & column, const DataTypePtr & column_type, const capnp::DynamicValue::Reader & value, FormatSettings::EnumComparingMode enum_comparing_mode) 
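The BSONEachRow changes above replace the bare 12 with a named BSON_OBJECT_ID_SIZE constant and, in schema inference, map OBJECT_ID to a nullable fixed-width string of that size. The point is simply that a BSON ObjectId is always 12 raw bytes. A small illustrative reader (std::istream-based; the real code works on ClickHouse's ReadBuffer, and readObjectIdHex is a made-up name):

    #include <array>
    #include <cstddef>
    #include <cstdio>
    #include <istream>
    #include <sstream>
    #include <string>

    constexpr size_t BSON_OBJECT_ID_SIZE = 12;  // fixed by the BSON spec

    // Reads one ObjectId (12 raw bytes) and renders it in the usual 24-char hex form.
    std::string readObjectIdHex(std::istream & in)
    {
        std::array<unsigned char, BSON_OBJECT_ID_SIZE> raw{};
        in.read(reinterpret_cast<char *>(raw.data()), raw.size());

        std::string hex;
        hex.reserve(BSON_OBJECT_ID_SIZE * 2);
        for (unsigned char byte : raw)
        {
            char buf[3];
            std::snprintf(buf, sizeof(buf), "%02x", byte);
            hex += buf;
        }
        return hex;
    }

    int main()
    {
        std::istringstream in(std::string(BSON_OBJECT_ID_SIZE, '\x2a'));
        std::printf("%s\n", readObjectIdHex(in).c_str());  // 2a2a2a2a2a2a2a2a2a2a2a2a
    }

Skipping a field of this type is just in.ignore(BSON_OBJECT_ID_SIZE), which is exactly what the patched skipBSONField does.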
+static void insertValue(IColumn & column, const DataTypePtr & column_type, const String & column_name, const capnp::DynamicValue::Reader & value, FormatSettings::EnumComparingMode enum_comparing_mode) { if (column_type->lowCardinality()) { auto & lc_column = assert_cast(column); auto tmp_column = lc_column.getDictionary().getNestedColumn()->cloneEmpty(); auto dict_type = assert_cast(column_type.get())->getDictionaryType(); - insertValue(*tmp_column, dict_type, value, enum_comparing_mode); + insertValue(*tmp_column, dict_type, column_name, value, enum_comparing_mode); lc_column.insertFromFullColumn(*tmp_column, 0); return; } @@ -226,7 +232,7 @@ static void insertValue(IColumn & column, const DataTypePtr & column_type, const auto & nested_column = column_array.getData(); auto nested_type = assert_cast(column_type.get())->getNestedType(); for (const auto & nested_value : list_value) - insertValue(nested_column, nested_type, nested_value, enum_comparing_mode); + insertValue(nested_column, nested_type, column_name, nested_value, enum_comparing_mode); break; } case capnp::DynamicValue::Type::STRUCT: @@ -243,11 +249,11 @@ static void insertValue(IColumn & column, const DataTypePtr & column_type, const auto & nested_column = nullable_column.getNestedColumn(); auto nested_type = assert_cast(column_type.get())->getNestedType(); auto nested_value = struct_value.get(field); - insertValue(nested_column, nested_type, nested_value, enum_comparing_mode); + insertValue(nested_column, nested_type, column_name, nested_value, enum_comparing_mode); nullable_column.getNullMapData().push_back(0); } } - else + else if (isTuple(column_type)) { auto & tuple_column = assert_cast(column); const auto * tuple_type = assert_cast(column_type.get()); @@ -255,9 +261,16 @@ static void insertValue(IColumn & column, const DataTypePtr & column_type, const insertValue( tuple_column.getColumn(i), tuple_type->getElements()[i], + tuple_type->getElementNames()[i], struct_value.get(tuple_type->getElementNames()[i]), enum_comparing_mode); } + else + { + /// It can be nested column from Nested type. 
+ auto [field_name, nested_name] = splitCapnProtoFieldName(column_name); + insertValue(column, column_type, nested_name, struct_value.get(nested_name), enum_comparing_mode); + } break; } default: @@ -278,7 +291,7 @@ bool CapnProtoRowInputFormat::readRow(MutableColumns & columns, RowReadExtension for (size_t i = 0; i != columns.size(); ++i) { auto value = getReaderByColumnName(root_reader, column_names[i]); - insertValue(*columns[i], column_types[i], value, format_settings.capn_proto.enum_comparing_mode); + insertValue(*columns[i], column_types[i], column_names[i], value, format_settings.capn_proto.enum_comparing_mode); } } catch (const kj::Exception & e) diff --git a/src/Processors/Formats/Impl/CapnProtoRowOutputFormat.cpp b/src/Processors/Formats/Impl/CapnProtoRowOutputFormat.cpp index 654917b6357..bcf362d1e0b 100644 --- a/src/Processors/Formats/Impl/CapnProtoRowOutputFormat.cpp +++ b/src/Processors/Formats/Impl/CapnProtoRowOutputFormat.cpp @@ -92,6 +92,7 @@ static std::optional convertToDynamicValue( const ColumnPtr & column, const DataTypePtr & data_type, size_t row_num, + const String & column_name, capnp::DynamicValue::Builder builder, FormatSettings::EnumComparingMode enum_comparing_mode, std::vector> & temporary_text_data_storage) @@ -103,15 +104,12 @@ static std::optional convertToDynamicValue( const auto * lc_column = assert_cast(column.get()); const auto & dict_type = assert_cast(data_type.get())->getDictionaryType(); size_t index = lc_column->getIndexAt(row_num); - return convertToDynamicValue(lc_column->getDictionary().getNestedColumn(), dict_type, index, builder, enum_comparing_mode, temporary_text_data_storage); + return convertToDynamicValue(lc_column->getDictionary().getNestedColumn(), dict_type, index, column_name, builder, enum_comparing_mode, temporary_text_data_storage); } switch (builder.getType()) { case capnp::DynamicValue::Type::INT: - /// We allow output DateTime64 as Int64. - if (WhichDataType(data_type).isDateTime64()) - return capnp::DynamicValue::Reader(assert_cast *>(column.get())->getElement(row_num)); return capnp::DynamicValue::Reader(column->getInt(row_num)); case capnp::DynamicValue::Type::UINT: return capnp::DynamicValue::Reader(column->getUInt(row_num)); @@ -150,7 +148,7 @@ static std::optional convertToDynamicValue( { auto struct_builder = builder.as(); auto nested_struct_schema = struct_builder.getSchema(); - /// Struct can be represent Tuple or Naullable (named union with two fields) + /// Struct can represent Tuple, Nullable (named union with two fields) or single column when it contains one nested column. 
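Both the Cap'n Proto reader and writer now carry the ClickHouse column name down the recursion so that a flattened Nested column such as n.a can be matched to the struct field n and its inner field a. A standalone sketch of that split, assuming splitCapnProtoFieldName separates the name at the first dot (splitFieldName below is a local stand-in, not the real helper):

    #include <iostream>
    #include <string_view>
    #include <utility>

    // Splits a flattened Nested column name such as "n.a" into the outer struct
    // field ("n") and the inner field ("a"). Names without a dot come back with
    // an empty inner part. Illustrative only; the real helper lives elsewhere in
    // the CapnProto format sources.
    std::pair<std::string_view, std::string_view> splitFieldName(std::string_view name)
    {
        auto pos = name.find('.');
        if (pos == std::string_view::npos)
            return {name, {}};
        return {name.substr(0, pos), name.substr(pos + 1)};
    }

    int main()
    {
        auto [outer, inner] = splitFieldName("n.a");
        std::cout << outer << " / " << inner << '\n';  // prints: n / a
    }

With the two halves in hand, the struct branch can recurse into struct_value.get(nested_name) for a single flattened column instead of requiring the ClickHouse type to be a full Tuple.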
if (data_type->isNullable()) { const auto * nullable_type = assert_cast(data_type.get()); @@ -167,12 +165,12 @@ static std::optional convertToDynamicValue( struct_builder.clear(value_field); const auto & nested_column = nullable_column->getNestedColumnPtr(); auto value_builder = initStructFieldBuilder(nested_column, row_num, struct_builder, value_field); - auto value = convertToDynamicValue(nested_column, nullable_type->getNestedType(), row_num, value_builder, enum_comparing_mode, temporary_text_data_storage); + auto value = convertToDynamicValue(nested_column, nullable_type->getNestedType(), row_num, column_name, value_builder, enum_comparing_mode, temporary_text_data_storage); if (value) struct_builder.set(value_field, *value); } } - else + else if (isTuple(data_type)) { const auto * tuple_data_type = assert_cast(data_type.get()); auto nested_types = tuple_data_type->getElements(); @@ -182,11 +180,21 @@ static std::optional convertToDynamicValue( auto pos = tuple_data_type->getPositionByName(name); auto field_builder = initStructFieldBuilder(nested_columns[pos], row_num, struct_builder, nested_struct_schema.getFieldByName(name)); - auto value = convertToDynamicValue(nested_columns[pos], nested_types[pos], row_num, field_builder, enum_comparing_mode, temporary_text_data_storage); + auto value = convertToDynamicValue(nested_columns[pos], nested_types[pos], row_num, column_name, field_builder, enum_comparing_mode, temporary_text_data_storage); if (value) struct_builder.set(name, *value); } } + else + { + /// It can be nested column from Nested type. + auto [field_name, nested_name] = splitCapnProtoFieldName(column_name); + auto nested_field = nested_struct_schema.getFieldByName(nested_name); + auto field_builder = initStructFieldBuilder(column, row_num, struct_builder, nested_field); + auto value = convertToDynamicValue(column, data_type, row_num, nested_name, field_builder, enum_comparing_mode, temporary_text_data_storage); + if (value) + struct_builder.set(nested_field, *value); + } return std::nullopt; } case capnp::DynamicValue::Type::LIST: @@ -213,7 +221,7 @@ static std::optional convertToDynamicValue( else value_builder = list_builder[i]; - auto value = convertToDynamicValue(nested_column, nested_type, offset + i, value_builder, enum_comparing_mode, temporary_text_data_storage); + auto value = convertToDynamicValue(nested_column, nested_type, offset + i, column_name, value_builder, enum_comparing_mode, temporary_text_data_storage); if (value) list_builder.set(i, *value); } @@ -231,11 +239,19 @@ void CapnProtoRowOutputFormat::write(const Columns & columns, size_t row_num) /// See comment in convertToDynamicValue() for more details. std::vector> temporary_text_data_storage; capnp::DynamicStruct::Builder root = message.initRoot(schema); + + /// Some columns can share same field builder. For example when we have + /// column with Nested type that was flattened into several columns. 
+ std::unordered_map field_builders; for (size_t i = 0; i != columns.size(); ++i) { auto [struct_builder, field] = getStructBuilderAndFieldByColumnName(root, column_names[i]); - auto field_builder = initStructFieldBuilder(columns[i], row_num, struct_builder, field); - auto value = convertToDynamicValue(columns[i], column_types[i], row_num, field_builder, format_settings.capn_proto.enum_comparing_mode, temporary_text_data_storage); + if (!field_builders.contains(field.getIndex())) + { + auto field_builder = initStructFieldBuilder(columns[i], row_num, struct_builder, field); + field_builders[field.getIndex()] = field_builder; + } + auto value = convertToDynamicValue(columns[i], column_types[i], row_num, column_names[i], field_builders[field.getIndex()], format_settings.capn_proto.enum_comparing_mode, temporary_text_data_storage); if (value) struct_builder.set(field, *value); } diff --git a/src/Processors/Merges/Algorithms/SummingSortedAlgorithm.cpp b/src/Processors/Merges/Algorithms/SummingSortedAlgorithm.cpp index c79c667a988..ee3177e132f 100644 --- a/src/Processors/Merges/Algorithms/SummingSortedAlgorithm.cpp +++ b/src/Processors/Merges/Algorithms/SummingSortedAlgorithm.cpp @@ -382,7 +382,7 @@ static MutableColumns getMergedDataColumns( for (const auto & desc : def.columns_to_aggregate) { // Wrap aggregated columns in a tuple to match function signature - if (!desc.is_agg_func_type && !desc.is_simple_agg_func_type && isTuple(desc.function->getReturnType())) + if (!desc.is_agg_func_type && !desc.is_simple_agg_func_type && isTuple(desc.function->getResultType())) { size_t tuple_size = desc.column_numbers.size(); MutableColumns tuple_columns(tuple_size); @@ -439,7 +439,7 @@ static void postprocessChunk( auto column = std::move(columns[next_column]); ++next_column; - if (!desc.is_agg_func_type && !desc.is_simple_agg_func_type && isTuple(desc.function->getReturnType())) + if (!desc.is_agg_func_type && !desc.is_simple_agg_func_type && isTuple(desc.function->getResultType())) { /// Unpack tuple into block. size_t tuple_size = desc.column_numbers.size(); diff --git a/src/Processors/QueryPlan/WindowStep.cpp b/src/Processors/QueryPlan/WindowStep.cpp index b67b394b57b..92e9948c4c7 100644 --- a/src/Processors/QueryPlan/WindowStep.cpp +++ b/src/Processors/QueryPlan/WindowStep.cpp @@ -35,7 +35,7 @@ static Block addWindowFunctionResultColumns(const Block & block, { ColumnWithTypeAndName column_with_type; column_with_type.name = f.column_name; - column_with_type.type = f.aggregate_function->getReturnType(); + column_with_type.type = f.aggregate_function->getResultType(); column_with_type.column = column_with_type.type->createColumn(); result.insert(column_with_type); diff --git a/src/Processors/Transforms/WindowTransform.cpp b/src/Processors/Transforms/WindowTransform.cpp index 4d3eb1f0bbd..287f46017cb 100644 --- a/src/Processors/Transforms/WindowTransform.cpp +++ b/src/Processors/Transforms/WindowTransform.cpp @@ -1067,7 +1067,7 @@ void WindowTransform::appendChunk(Chunk & chunk) // Initialize output columns. 
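In the CapnProtoRowOutputFormat::write hunk above, the new field_builders map keyed by field.getIndex() makes the flattened columns of one Nested column (say n.a and n.b) write into a single, lazily created struct-field builder instead of re-initializing it per column. The shape of that memoization, with a toy FieldBuilder in place of the Cap'n Proto builder (all names here are illustrative):

    #include <iostream>
    #include <string>
    #include <unordered_map>
    #include <vector>

    // Stand-in for a Cap'n Proto struct-field builder: it just records what was written.
    struct FieldBuilder { std::vector<std::string> written; };

    int main()
    {
        // Two flattened columns target the same outgoing struct field (index 0).
        struct Column { std::string name; unsigned field_index; std::string value; };
        std::vector<Column> columns = {{"n.a", 0, "1"}, {"n.b", 0, "x"}, {"id", 1, "42"}};

        std::unordered_map<unsigned, FieldBuilder> field_builders;
        for (const auto & column : columns)
        {
            // The (potentially expensive) builder initialization happens once per field;
            // later columns for the same field reuse it instead of clobbering earlier data.
            if (!field_builders.contains(column.field_index))
                field_builders.emplace(column.field_index, FieldBuilder{});

            field_builders[column.field_index].written.push_back(column.name + "=" + column.value);
        }

        for (const auto & [index, builder] : field_builders)
            std::cout << "field " << index << " received " << builder.written.size() << " values\n";
    }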
for (auto & ws : workspaces) { - block.output_columns.push_back(ws.aggregate_function->getReturnType() + block.output_columns.push_back(ws.aggregate_function->getResultType() ->createColumn()); block.output_columns.back()->reserve(block.rows); } @@ -1441,8 +1441,8 @@ struct WindowFunction { std::string name; - WindowFunction(const std::string & name_, const DataTypes & argument_types_, const Array & parameters_) - : IAggregateFunctionHelper(argument_types_, parameters_) + WindowFunction(const std::string & name_, const DataTypes & argument_types_, const Array & parameters_, const DataTypePtr & result_type_) + : IAggregateFunctionHelper(argument_types_, parameters_, result_type_) , name(name_) {} @@ -1472,12 +1472,9 @@ struct WindowFunctionRank final : public WindowFunction { WindowFunctionRank(const std::string & name_, const DataTypes & argument_types_, const Array & parameters_) - : WindowFunction(name_, argument_types_, parameters_) + : WindowFunction(name_, argument_types_, parameters_, std::make_shared()) {} - DataTypePtr getReturnType() const override - { return std::make_shared(); } - bool allocatesMemoryInArena() const override { return false; } void windowInsertResultInto(const WindowTransform * transform, @@ -1494,12 +1491,9 @@ struct WindowFunctionDenseRank final : public WindowFunction { WindowFunctionDenseRank(const std::string & name_, const DataTypes & argument_types_, const Array & parameters_) - : WindowFunction(name_, argument_types_, parameters_) + : WindowFunction(name_, argument_types_, parameters_, std::make_shared()) {} - DataTypePtr getReturnType() const override - { return std::make_shared(); } - bool allocatesMemoryInArena() const override { return false; } void windowInsertResultInto(const WindowTransform * transform, @@ -1560,8 +1554,8 @@ template struct StatefulWindowFunction : public WindowFunction { StatefulWindowFunction(const std::string & name_, - const DataTypes & argument_types_, const Array & parameters_) - : WindowFunction(name_, argument_types_, parameters_) + const DataTypes & argument_types_, const Array & parameters_, const DataTypePtr & result_type_) + : WindowFunction(name_, argument_types_, parameters_, result_type_) { } @@ -1607,7 +1601,7 @@ struct WindowFunctionExponentialTimeDecayedSum final : public StatefulWindowFunc WindowFunctionExponentialTimeDecayedSum(const std::string & name_, const DataTypes & argument_types_, const Array & parameters_) - : StatefulWindowFunction(name_, argument_types_, parameters_) + : StatefulWindowFunction(name_, argument_types_, parameters_, std::make_shared()) { if (parameters_.size() != 1) { @@ -1639,11 +1633,6 @@ struct WindowFunctionExponentialTimeDecayedSum final : public StatefulWindowFunc } } - DataTypePtr getReturnType() const override - { - return std::make_shared(); - } - bool allocatesMemoryInArena() const override { return false; } void windowInsertResultInto(const WindowTransform * transform, @@ -1705,7 +1694,7 @@ struct WindowFunctionExponentialTimeDecayedMax final : public WindowFunction WindowFunctionExponentialTimeDecayedMax(const std::string & name_, const DataTypes & argument_types_, const Array & parameters_) - : WindowFunction(name_, argument_types_, parameters_) + : WindowFunction(name_, argument_types_, parameters_, std::make_shared()) { if (parameters_.size() != 1) { @@ -1737,11 +1726,6 @@ struct WindowFunctionExponentialTimeDecayedMax final : public WindowFunction } } - DataTypePtr getReturnType() const override - { - return std::make_shared(); - } - bool allocatesMemoryInArena() const 
override { return false; } void windowInsertResultInto(const WindowTransform * transform, @@ -1781,7 +1765,7 @@ struct WindowFunctionExponentialTimeDecayedCount final : public StatefulWindowFu WindowFunctionExponentialTimeDecayedCount(const std::string & name_, const DataTypes & argument_types_, const Array & parameters_) - : StatefulWindowFunction(name_, argument_types_, parameters_) + : StatefulWindowFunction(name_, argument_types_, parameters_, std::make_shared()) { if (parameters_.size() != 1) { @@ -1805,11 +1789,6 @@ struct WindowFunctionExponentialTimeDecayedCount final : public StatefulWindowFu } } - DataTypePtr getReturnType() const override - { - return std::make_shared(); - } - bool allocatesMemoryInArena() const override { return false; } void windowInsertResultInto(const WindowTransform * transform, @@ -1868,7 +1847,7 @@ struct WindowFunctionExponentialTimeDecayedAvg final : public StatefulWindowFunc WindowFunctionExponentialTimeDecayedAvg(const std::string & name_, const DataTypes & argument_types_, const Array & parameters_) - : StatefulWindowFunction(name_, argument_types_, parameters_) + : StatefulWindowFunction(name_, argument_types_, parameters_, std::make_shared()) { if (parameters_.size() != 1) { @@ -1900,11 +1879,6 @@ struct WindowFunctionExponentialTimeDecayedAvg final : public StatefulWindowFunc } } - DataTypePtr getReturnType() const override - { - return std::make_shared(); - } - bool allocatesMemoryInArena() const override { return false; } void windowInsertResultInto(const WindowTransform * transform, @@ -1980,12 +1954,9 @@ struct WindowFunctionRowNumber final : public WindowFunction { WindowFunctionRowNumber(const std::string & name_, const DataTypes & argument_types_, const Array & parameters_) - : WindowFunction(name_, argument_types_, parameters_) + : WindowFunction(name_, argument_types_, parameters_, std::make_shared()) {} - DataTypePtr getReturnType() const override - { return std::make_shared(); } - bool allocatesMemoryInArena() const override { return false; } void windowInsertResultInto(const WindowTransform * transform, @@ -2004,7 +1975,7 @@ struct WindowFunctionLagLeadInFrame final : public WindowFunction { WindowFunctionLagLeadInFrame(const std::string & name_, const DataTypes & argument_types_, const Array & parameters_) - : WindowFunction(name_, argument_types_, parameters_) + : WindowFunction(name_, argument_types_, parameters_, createResultType(argument_types_, name_)) { if (!parameters.empty()) { @@ -2012,12 +1983,6 @@ struct WindowFunctionLagLeadInFrame final : public WindowFunction "Function {} cannot be parameterized", name_); } - if (argument_types.empty()) - { - throw Exception(ErrorCodes::BAD_ARGUMENTS, - "Function {} takes at least one argument", name_); - } - if (argument_types.size() == 1) { return; @@ -2060,7 +2025,16 @@ struct WindowFunctionLagLeadInFrame final : public WindowFunction } } - DataTypePtr getReturnType() const override { return argument_types[0]; } + static DataTypePtr createResultType(const DataTypes & argument_types_, const std::string & name_) + { + if (argument_types_.empty()) + { + throw Exception(ErrorCodes::BAD_ARGUMENTS, + "Function {} takes at least one argument", name_); + } + + return argument_types_[0]; + } bool allocatesMemoryInArena() const override { return false; } @@ -2125,7 +2099,7 @@ struct WindowFunctionNthValue final : public WindowFunction { WindowFunctionNthValue(const std::string & name_, const DataTypes & argument_types_, const Array & parameters_) - : WindowFunction(name_, argument_types_, 
parameters_) + : WindowFunction(name_, argument_types_, parameters_, createResultType(name_, argument_types_)) { if (!parameters.empty()) { @@ -2133,12 +2107,6 @@ struct WindowFunctionNthValue final : public WindowFunction "Function {} cannot be parameterized", name_); } - if (argument_types.size() != 2) - { - throw Exception(ErrorCodes::BAD_ARGUMENTS, - "Function {} takes exactly two arguments", name_); - } - if (!isInt64OrUInt64FieldType(argument_types[1]->getDefault().getType())) { throw Exception(ErrorCodes::BAD_ARGUMENTS, @@ -2147,7 +2115,16 @@ struct WindowFunctionNthValue final : public WindowFunction } } - DataTypePtr getReturnType() const override { return argument_types[0]; } + static DataTypePtr createResultType(const std::string & name_, const DataTypes & argument_types_) + { + if (argument_types_.size() != 2) + { + throw Exception(ErrorCodes::BAD_ARGUMENTS, + "Function {} takes exactly two arguments", name_); + } + + return argument_types_[0]; + } bool allocatesMemoryInArena() const override { return false; } @@ -2204,7 +2181,7 @@ struct WindowFunctionNonNegativeDerivative final : public StatefulWindowFunction WindowFunctionNonNegativeDerivative(const std::string & name_, const DataTypes & argument_types_, const Array & parameters_) - : StatefulWindowFunction(name_, argument_types_, parameters_) + : StatefulWindowFunction(name_, argument_types_, parameters_, std::make_shared()) { if (!parameters.empty()) { @@ -2263,9 +2240,6 @@ struct WindowFunctionNonNegativeDerivative final : public StatefulWindowFunction } } - - DataTypePtr getReturnType() const override { return std::make_shared(); } - bool allocatesMemoryInArena() const override { return false; } void windowInsertResultInto(const WindowTransform * transform, diff --git a/src/Storages/MergeTree/KeyCondition.cpp b/src/Storages/MergeTree/KeyCondition.cpp index fcb87a6d4d9..5d0c3fc3cad 100644 --- a/src/Storages/MergeTree/KeyCondition.cpp +++ b/src/Storages/MergeTree/KeyCondition.cpp @@ -29,6 +29,8 @@ #include #include +#include + #include #include #include @@ -599,9 +601,9 @@ static const ActionsDAG::Node & cloneASTWithInversionPushDown( if (name == "indexHint") { ActionsDAG::NodeRawConstPtrs children; - if (const auto * adaptor = typeid_cast(node.function_builder.get())) + if (const auto * adaptor = typeid_cast(node.function_base.get())) { - if (const auto * index_hint = typeid_cast(adaptor->getFunction())) + if (const auto * index_hint = typeid_cast(adaptor->getFunction().get())) { const auto & index_hint_dag = index_hint->getActions(); children = index_hint_dag->getOutputs(); @@ -611,7 +613,7 @@ static const ActionsDAG::Node & cloneASTWithInversionPushDown( } } - const auto & func = inverted_dag.addFunction(node.function_builder, children, ""); + const auto & func = inverted_dag.addFunction(FunctionFactory::instance().get(node.function_base->getName(), context), children, ""); to_inverted[&node] = &func; return func; } @@ -654,7 +656,8 @@ static const ActionsDAG::Node & cloneASTWithInversionPushDown( return func; } - res = &inverted_dag.addFunction(node.function_builder, children, ""); + res = &inverted_dag.addFunction(node.function_base, children, ""); + chassert(res->result_type == node.result_type); } } @@ -939,12 +942,13 @@ static FieldRef applyFunction(const FunctionBasePtr & func, const DataTypePtr & * which while not strictly monotonic, are monotonic everywhere on the input range. 
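Across WindowTransform.cpp the result type moves from a virtual getReturnType() into a constructor argument of WindowFunction (and ultimately the aggregate-function helper base). For lagInFrame/leadInFrame and nthValue the type depends on the arguments, so the argument checks move into a static createResultType() invoked in the member-initializer list: the result type has to exist before the base class is constructed, so the constructor body is too late to validate. A minimal sketch of that pattern with placeholder types (WindowFunctionBase, LagLikeFunction and the string DataType are all stand-ins):

    #include <iostream>
    #include <stdexcept>
    #include <string>
    #include <utility>
    #include <vector>

    using DataType = std::string;          // stand-in for DataTypePtr
    using DataTypes = std::vector<DataType>;

    // Before: the base exposed a virtual getReturnType() that each function overrode.
    // After: the result type is a plain constructor argument, fixed once at construction.
    struct WindowFunctionBase
    {
        DataTypes argument_types;
        DataType result_type;
        WindowFunctionBase(DataTypes argument_types_, DataType result_type_)
            : argument_types(std::move(argument_types_)), result_type(std::move(result_type_)) {}
    };

    struct LagLikeFunction : WindowFunctionBase
    {
        // Validation runs inside the static helper because the member-initializer list
        // executes before the constructor body: by the time the body could check the
        // arguments, the base already needed the result type.
        static DataType createResultType(const DataTypes & argument_types_, const std::string & name_)
        {
            if (argument_types_.empty())
                throw std::invalid_argument("Function " + name_ + " takes at least one argument");
            return argument_types_[0];      // result type follows the first argument
        }

        explicit LagLikeFunction(DataTypes argument_types_)
            : WindowFunctionBase(argument_types_, createResultType(argument_types_, "lagInFrame")) {}
    };

    int main()
    {
        LagLikeFunction ok({"UInt64", "UInt8"});
        std::cout << ok.result_type << '\n';   // UInt64

        try
        {
            LagLikeFunction bad(DataTypes{});  // throws before the base is ever constructed
            (void)bad;
        }
        catch (const std::exception & e)
        {
            std::cout << e.what() << '\n';
        }
    }

The same motivation explains why the "takes at least one argument" and "takes exactly two arguments" throws now live in the static helpers in the patch.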
*/ bool KeyCondition::transformConstantWithValidFunctions( + ContextPtr context, const String & expr_name, size_t & out_key_column_num, DataTypePtr & out_key_column_type, Field & out_value, DataTypePtr & out_type, - std::function always_monotonic) const + std::function always_monotonic) const { const auto & sample_block = key_expr->getSampleBlock(); @@ -1024,14 +1028,16 @@ bool KeyCondition::transformConstantWithValidFunctions( auto left_arg_type = left->result_type; auto left_arg_value = (*left->column)[0]; std::tie(const_value, const_type) = applyBinaryFunctionForFieldOfUnknownType( - func->function_builder, left_arg_type, left_arg_value, const_type, const_value); + FunctionFactory::instance().get(func->function_base->getName(), context), + left_arg_type, left_arg_value, const_type, const_value); } else { auto right_arg_type = right->result_type; auto right_arg_value = (*right->column)[0]; std::tie(const_value, const_type) = applyBinaryFunctionForFieldOfUnknownType( - func->function_builder, const_type, const_value, right_arg_type, right_arg_value); + FunctionFactory::instance().get(func->function_base->getName(), context), + const_type, const_value, right_arg_type, right_arg_value); } } } @@ -1067,7 +1073,13 @@ bool KeyCondition::canConstantBeWrappedByMonotonicFunctions( return false; return transformConstantWithValidFunctions( - expr_name, out_key_column_num, out_key_column_type, out_value, out_type, [](IFunctionBase & func, const IDataType & type) + node.getTreeContext().getQueryContext(), + expr_name, + out_key_column_num, + out_key_column_type, + out_value, + out_type, + [](const IFunctionBase & func, const IDataType & type) { if (!func.hasInformationAboutMonotonicity()) return false; @@ -1116,7 +1128,13 @@ bool KeyCondition::canConstantBeWrappedByFunctions( return false; return transformConstantWithValidFunctions( - expr_name, out_key_column_num, out_key_column_type, out_value, out_type, [](IFunctionBase & func, const IDataType &) + node.getTreeContext().getQueryContext(), + expr_name, + out_key_column_num, + out_key_column_type, + out_value, + out_type, + [](const IFunctionBase & func, const IDataType &) { return func.isDeterministic(); }); diff --git a/src/Storages/MergeTree/KeyCondition.h b/src/Storages/MergeTree/KeyCondition.h index 258f88ac6b9..0a4ac93b082 100644 --- a/src/Storages/MergeTree/KeyCondition.h +++ b/src/Storages/MergeTree/KeyCondition.h @@ -19,7 +19,7 @@ namespace DB class ASTFunction; class Context; class IFunction; -using FunctionBasePtr = std::shared_ptr; +using FunctionBasePtr = std::shared_ptr; class ExpressionActions; using ExpressionActionsPtr = std::shared_ptr; struct ActionDAGNodes; @@ -421,12 +421,13 @@ private: std::vector & out_functions_chain); bool transformConstantWithValidFunctions( + ContextPtr context, const String & expr_name, size_t & out_key_column_num, DataTypePtr & out_key_column_type, Field & out_value, DataTypePtr & out_type, - std::function always_monotonic) const; + std::function always_monotonic) const; bool canConstantBeWrappedByMonotonicFunctions( const RPNBuilderTreeNode & node, diff --git a/src/Storages/MergeTree/MergeFromLogEntryTask.cpp b/src/Storages/MergeTree/MergeFromLogEntryTask.cpp index 9a9b8a4a6bb..d5627774052 100644 --- a/src/Storages/MergeTree/MergeFromLogEntryTask.cpp +++ b/src/Storages/MergeTree/MergeFromLogEntryTask.cpp @@ -297,9 +297,14 @@ bool MergeFromLogEntryTask::finalize(ReplicatedMergeMutateTaskBase::PartLogWrite { part = merge_task->getFuture().get(); - /// Task is not needed - merge_task.reset(); 
storage.merger_mutator.renameMergedTemporaryPart(part, parts, NO_TRANSACTION_PTR, *transaction_ptr); + /// Why we reset task here? Because it holds shared pointer to part and tryRemovePartImmediately will + /// not able to remove the part and will throw an exception (because someone holds the pointer). + /// + /// Why we cannot reset task right after obtaining part from getFuture()? Because it holds RAII wrapper for + /// temp directories which guards temporary dir from background removal. So it's right place to reset the task + /// and it's really needed. + merge_task.reset(); try { diff --git a/src/Storages/MergeTree/MergeTreeBackgroundExecutor.cpp b/src/Storages/MergeTree/MergeTreeBackgroundExecutor.cpp index 234487763d7..f1c1a96d24f 100644 --- a/src/Storages/MergeTree/MergeTreeBackgroundExecutor.cpp +++ b/src/Storages/MergeTree/MergeTreeBackgroundExecutor.cpp @@ -41,7 +41,7 @@ void MergeTreeBackgroundExecutor::increaseThreadsAndMaxTasksCount(size_t return; } - if (new_max_tasks_count < max_tasks_count) + if (new_max_tasks_count < max_tasks_count.load(std::memory_order_relaxed)) { LOG_WARNING(log, "Loaded new max tasks count for {}Executor from top level config, but new value ({}) is not greater than current {}", name, new_max_tasks_count, max_tasks_count); return; @@ -59,15 +59,14 @@ void MergeTreeBackgroundExecutor::increaseThreadsAndMaxTasksCount(size_t for (size_t number = threads_count; number < new_threads_count; ++number) pool.scheduleOrThrowOnError([this] { threadFunction(); }); - max_tasks_count = new_max_tasks_count; + max_tasks_count.store(new_max_tasks_count, std::memory_order_relaxed); threads_count = new_threads_count; } template size_t MergeTreeBackgroundExecutor::getMaxTasksCount() const { - std::lock_guard lock(mutex); - return max_tasks_count; + return max_tasks_count.load(std::memory_order_relaxed); } template diff --git a/src/Storages/MergeTree/MergeTreeBackgroundExecutor.h b/src/Storages/MergeTree/MergeTreeBackgroundExecutor.h index 0fc888dd6ad..ad50cd44189 100644 --- a/src/Storages/MergeTree/MergeTreeBackgroundExecutor.h +++ b/src/Storages/MergeTree/MergeTreeBackgroundExecutor.h @@ -194,6 +194,10 @@ public: /// Supports only increasing the number of threads and tasks, because /// implementing tasks eviction will definitely be too error-prone and buggy. void increaseThreadsAndMaxTasksCount(size_t new_threads_count, size_t new_max_tasks_count); + + /// This method can return stale value of max_tasks_count (no mutex locking). + /// It's okay because amount of tasks can be only increased and getting stale value + /// can lead only to some postponing, not logical error. 
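The MergeFromLogEntryTask hunk above spells out why merge_task.reset() now sits between renameMergedTemporaryPart and the rest of finalize: the task owns an RAII guard that keeps the temporary directory alive (so it must survive until the rename), yet it also keeps a shared pointer to the new part (so it must be gone before any code that expects to be the part's sole owner, such as the immediate-removal path mentioned in the comment). A toy sketch of those two constraints; TempDirGuard, Part and MergeTask below are invented for illustration:

    #include <iostream>
    #include <memory>
    #include <string>
    #include <utility>

    // Toy stand-ins: the guard keeps a temporary directory alive, the part is
    // shared between the task and the storage.
    struct TempDirGuard
    {
        std::string dir;
        explicit TempDirGuard(std::string dir_) : dir(std::move(dir_)) { std::cout << "protect " << dir << '\n'; }
        ~TempDirGuard() { std::cout << "release " << dir << '\n'; }
    };

    struct Part { std::string name = "all_1_2_1"; };

    struct MergeTask
    {
        TempDirGuard guard{"tmp_merge_all_1_2_1"};
        std::shared_ptr<Part> part = std::make_shared<Part>();
    };

    int main()
    {
        auto task = std::make_unique<MergeTask>();
        std::shared_ptr<Part> part = task->part;      // the caller now also owns the part

        // 1. While the task is alive, its guard keeps the temporary directory from
        //    being cleaned up, so the rename must happen before the task is reset.
        std::cout << "rename " << task->guard.dir << " -> " << part->name << '\n';

        // 2. After the rename the task is only dead weight: as long as it holds the
        //    shared_ptr, the part has an extra owner, and a removal path that insists
        //    on unique ownership would refuse to drop it.
        task.reset();

        std::cout << "part use_count after reset: " << part.use_count() << '\n';  // 1
    }

After task.reset() the part's use count drops back to one, which is what lets an immediate-removal path drop it without throwing.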
size_t getMaxTasksCount() const; bool trySchedule(ExecutableTaskPtr task); @@ -203,7 +207,7 @@ public: private: String name; size_t threads_count TSA_GUARDED_BY(mutex) = 0; - size_t max_tasks_count TSA_GUARDED_BY(mutex) = 0; + std::atomic max_tasks_count = 0; CurrentMetrics::Metric metric; void routine(TaskRuntimeDataPtr item); diff --git a/src/Storages/MergeTree/MergeTreeData.cpp b/src/Storages/MergeTree/MergeTreeData.cpp index 4dfa8b9a801..78c9e43be43 100644 --- a/src/Storages/MergeTree/MergeTreeData.cpp +++ b/src/Storages/MergeTree/MergeTreeData.cpp @@ -1348,6 +1348,8 @@ void MergeTreeData::loadDataParts(bool skip_sanity_checks) loadDataPartsFromDisk( broken_parts_to_detach, duplicate_parts_to_remove, pool, num_parts, parts_queue, skip_sanity_checks, settings); + bool is_static_storage = isStaticStorage(); + if (settings->in_memory_parts_enable_wal) { std::map disk_wal_part_map; @@ -1376,13 +1378,13 @@ void MergeTreeData::loadDataParts(bool skip_sanity_checks) ErrorCodes::CORRUPTED_DATA); write_ahead_log = std::make_shared(*this, disk_ptr, it->name()); - for (auto && part : write_ahead_log->restore(metadata_snapshot, getContext(), part_lock)) + for (auto && part : write_ahead_log->restore(metadata_snapshot, getContext(), part_lock, is_static_storage)) disk_wal_parts.push_back(std::move(part)); } else { MergeTreeWriteAheadLog wal(*this, disk_ptr, it->name()); - for (auto && part : wal.restore(metadata_snapshot, getContext(), part_lock)) + for (auto && part : wal.restore(metadata_snapshot, getContext(), part_lock, is_static_storage)) disk_wal_parts.push_back(std::move(part)); } } @@ -1408,11 +1410,17 @@ void MergeTreeData::loadDataParts(bool skip_sanity_checks) return; } - for (auto & part : broken_parts_to_detach) - part->renameToDetached("broken-on-start"); /// detached parts must not have '_' in prefixes + if (!is_static_storage) + { + for (auto & part : broken_parts_to_detach) + { + /// detached parts must not have '_' in prefixes + part->renameToDetached("broken-on-start"); + } - for (auto & part : duplicate_parts_to_remove) - part->remove(); + for (auto & part : duplicate_parts_to_remove) + part->remove(); + } auto deactivate_part = [&] (DataPartIteratorByStateAndInfo it) { @@ -2167,6 +2175,8 @@ size_t MergeTreeData::clearEmptyParts() void MergeTreeData::rename(const String & new_table_path, const StorageID & new_table_id) { + LOG_INFO(log, "Renaming table to path {} with ID {}", new_table_path, new_table_id.getFullTableName()); + auto disks = getStoragePolicy()->getDisks(); for (const auto & disk : disks) @@ -5661,7 +5671,7 @@ Block MergeTreeData::getMinMaxCountProjectionBlock( agg_count->set(place, value.get()); else { - auto value_column = func->getReturnType()->createColumnConst(1, value)->convertToFullColumnIfConst(); + auto value_column = func->getResultType()->createColumnConst(1, value)->convertToFullColumnIfConst(); const auto * value_column_ptr = value_column.get(); func->add(place, &value_column_ptr, 0, &arena); } diff --git a/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp b/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp index 79670c0ab27..2cd2e5a454b 100644 --- a/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp +++ b/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp @@ -65,8 +65,8 @@ static const double DISK_USAGE_COEFFICIENT_TO_SELECT = 2; /// because between selecting parts to merge and doing merge, amount of free space could have decreased. 
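getMaxTasksCount() in MergeTreeBackgroundExecutor now reads max_tasks_count as an atomic with relaxed ordering instead of taking the mutex, on the argument quoted above: the value only ever grows, so a stale read can at most postpone scheduling. A small sketch of that contract (TaskLimit is an illustrative class; the real writer path still runs under the executor's mutex while resizing the thread pool):

    #include <atomic>
    #include <cstddef>
    #include <iostream>
    #include <thread>
    #include <vector>

    // A limit that is only ever increased. Readers use relaxed loads: the worst
    // case is seeing a slightly stale (smaller) value, which merely postpones
    // scheduling a bit more work, never produces a logically wrong decision.
    class TaskLimit
    {
    public:
        void increaseTo(size_t new_value)
        {
            // In the real executor this runs under a mutex together with resizing
            // the thread pool; the atomic store is what readers rely on lock-free.
            if (new_value > limit.load(std::memory_order_relaxed))
                limit.store(new_value, std::memory_order_relaxed);
        }

        size_t get() const { return limit.load(std::memory_order_relaxed); }

    private:
        std::atomic<size_t> limit{16};
    };

    int main()
    {
        TaskLimit limit;

        std::thread writer([&] { limit.increaseTo(64); });
        std::vector<size_t> seen;
        for (int i = 0; i < 4; ++i)
            seen.push_back(limit.get());   // may observe 16 or 64; both are safe answers
        writer.join();

        for (size_t value : seen)
            std::cout << value << ' ';
        std::cout << "\nfinal: " << limit.get() << '\n';   // final: 64
    }

MergeTreeDataMergerMutator follows suit further down: instead of caching the value at construction it asks the executor on every call, so an increase loaded from the top-level config is picked up immediately.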
static const double DISK_USAGE_COEFFICIENT_TO_RESERVE = 1.1; -MergeTreeDataMergerMutator::MergeTreeDataMergerMutator(MergeTreeData & data_, size_t max_tasks_count_) - : data(data_), max_tasks_count(max_tasks_count_), log(&Poco::Logger::get(data.getLogName() + " (MergerMutator)")) +MergeTreeDataMergerMutator::MergeTreeDataMergerMutator(MergeTreeData & data_) + : data(data_), log(&Poco::Logger::get(data.getLogName() + " (MergerMutator)")) { } @@ -75,6 +75,7 @@ UInt64 MergeTreeDataMergerMutator::getMaxSourcePartsSizeForMerge() const { size_t scheduled_tasks_count = CurrentMetrics::values[CurrentMetrics::BackgroundMergesAndMutationsPoolTask].load(std::memory_order_relaxed); + auto max_tasks_count = data.getContext()->getMergeMutateExecutor()->getMaxTasksCount(); return getMaxSourcePartsSizeForMerge(max_tasks_count, scheduled_tasks_count); } @@ -114,7 +115,7 @@ UInt64 MergeTreeDataMergerMutator::getMaxSourcePartSizeForMutation() const /// DataPart can be store only at one disk. Get maximum reservable free space at all disks. UInt64 disk_space = data.getStoragePolicy()->getMaxUnreservedFreeSpace(); - + auto max_tasks_count = data.getContext()->getMergeMutateExecutor()->getMaxTasksCount(); /// Allow mutations only if there are enough threads, leave free threads for merges else if (occupied <= 1 || max_tasks_count - occupied >= data_settings->number_of_free_entries_in_pool_to_execute_mutation) diff --git a/src/Storages/MergeTree/MergeTreeDataMergerMutator.h b/src/Storages/MergeTree/MergeTreeDataMergerMutator.h index 5d98f526325..f81846dadc3 100644 --- a/src/Storages/MergeTree/MergeTreeDataMergerMutator.h +++ b/src/Storages/MergeTree/MergeTreeDataMergerMutator.h @@ -45,7 +45,7 @@ public: const MergeTreeTransaction *, String *)>; - MergeTreeDataMergerMutator(MergeTreeData & data_, size_t max_tasks_count_); + explicit MergeTreeDataMergerMutator(MergeTreeData & data_); /** Get maximum total size of parts to do merge, at current moment of time. * It depends on number of free threads in background_pool and amount of free space in disk. @@ -155,7 +155,6 @@ public : private: MergeTreeData & data; - const size_t max_tasks_count; Poco::Logger * log; diff --git a/src/Storages/MergeTree/MergeTreeIndexSet.cpp b/src/Storages/MergeTree/MergeTreeIndexSet.cpp index a28394e943e..db99a2f37be 100644 --- a/src/Storages/MergeTree/MergeTreeIndexSet.cpp +++ b/src/Storages/MergeTree/MergeTreeIndexSet.cpp @@ -9,6 +9,10 @@ #include #include +#include +#include + +#include namespace DB { @@ -242,67 +246,78 @@ MergeTreeIndexGranulePtr MergeTreeIndexAggregatorSet::getGranuleAndReset() MergeTreeIndexConditionSet::MergeTreeIndexConditionSet( const String & index_name_, - const Block & index_sample_block_, + const Block & index_sample_block, size_t max_rows_, - const SelectQueryInfo & query, + const SelectQueryInfo & query_info, ContextPtr context) : index_name(index_name_) , max_rows(max_rows_) - , index_sample_block(index_sample_block_) { for (const auto & name : index_sample_block.getNames()) if (!key_columns.contains(name)) key_columns.insert(name); - const auto & select = query.query->as(); - - if (select.where() && select.prewhere()) - expression_ast = makeASTFunction( - "and", - select.where()->clone(), - select.prewhere()->clone()); - else if (select.where()) - expression_ast = select.where()->clone(); - else if (select.prewhere()) - expression_ast = select.prewhere()->clone(); - - useless = checkASTUseless(expression_ast); - /// Do not proceed if index is useless for this query. 
- if (useless) + ASTPtr ast_filter_node = buildFilterNode(query_info.query); + if (!ast_filter_node) return; - /// Replace logical functions with bit functions. - /// Working with UInt8: last bit = can be true, previous = can be false (Like src/Storages/MergeTree/BoolMask.h). - traverseAST(expression_ast); + if (context->getSettingsRef().allow_experimental_analyzer) + { + if (!query_info.filter_actions_dag) + return; - auto syntax_analyzer_result = TreeRewriter(context).analyze( - expression_ast, index_sample_block.getNamesAndTypesList()); - actions = ExpressionAnalyzer(expression_ast, syntax_analyzer_result, context).getActions(true); + if (checkDAGUseless(*query_info.filter_actions_dag->getOutputs().at(0), context)) + return; + + const auto * filter_node = query_info.filter_actions_dag->getOutputs().at(0); + auto filter_actions_dag = ActionsDAG::buildFilterActionsDAG({filter_node}, {}, context); + const auto * filter_actions_dag_node = filter_actions_dag->getOutputs().at(0); + + std::unordered_map node_to_result_node; + filter_actions_dag->getOutputs()[0] = &traverseDAG(*filter_actions_dag_node, filter_actions_dag, context, node_to_result_node); + + filter_actions_dag->removeUnusedActions(); + actions = std::make_shared(filter_actions_dag); + } + else + { + if (checkASTUseless(ast_filter_node)) + return; + + auto expression_ast = ast_filter_node->clone(); + + /// Replace logical functions with bit functions. + /// Working with UInt8: last bit = can be true, previous = can be false (Like src/Storages/MergeTree/BoolMask.h). + traverseAST(expression_ast); + + auto syntax_analyzer_result = TreeRewriter(context).analyze(expression_ast, index_sample_block.getNamesAndTypesList()); + actions = ExpressionAnalyzer(expression_ast, syntax_analyzer_result, context).getActions(true); + } } bool MergeTreeIndexConditionSet::alwaysUnknownOrTrue() const { - return useless; + return isUseless(); } bool MergeTreeIndexConditionSet::mayBeTrueOnGranule(MergeTreeIndexGranulePtr idx_granule) const { - if (useless) + if (isUseless()) return true; auto granule = std::dynamic_pointer_cast(idx_granule); if (!granule) - throw Exception( - "Set index condition got a granule with the wrong type.", ErrorCodes::LOGICAL_ERROR); + throw Exception(ErrorCodes::LOGICAL_ERROR, + "Set index condition got a granule with the wrong type"); - if (useless || granule->empty() || (max_rows != 0 && granule->size() > max_rows)) + if (isUseless() || granule->empty() || (max_rows != 0 && granule->size() > max_rows)) return true; Block result = granule->block; actions->execute(result); - auto column - = result.getByName(expression_ast->getColumnName()).column->convertToFullColumnIfConst()->convertToFullColumnIfLowCardinality(); + const auto & filter_node_name = actions->getActionsDAG().getOutputs().at(0)->result_name; + auto column = result.getByName(filter_node_name).column->convertToFullColumnIfConst()->convertToFullColumnIfLowCardinality(); if (column->onlyNull()) return false; @@ -318,17 +333,214 @@ bool MergeTreeIndexConditionSet::mayBeTrueOnGranule(MergeTreeIndexGranulePtr idx } if (!col_uint8) - throw Exception("ColumnUInt8 expected as Set index condition result.", ErrorCodes::LOGICAL_ERROR); + throw Exception(ErrorCodes::LOGICAL_ERROR, + "ColumnUInt8 expected as Set index condition result"); const auto & condition = col_uint8->getData(); + size_t column_size = column->size(); - for (size_t i = 0; i < column->size(); ++i) + for (size_t i = 0; i < column_size; ++i) if ((!null_map || (*null_map)[i] == 0) && condition[i] & 1) return 
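The index-set condition keeps working with the UInt8 convention restated above: the last bit means "can be true" on a granule, the bit before it means "can be false" (as in BoolMask.h), and the final granule filter only looks at condition[i] & 1. A self-contained sketch of that encoding and the combine rules it implies (wrap/maskAnd/maskOr/maskNot are local names, not the __bit* SQL functions themselves):

    #include <cstdint>
    #include <iostream>

    // Two-bit mask in a UInt8, following the comment above:
    // bit 0 (last bit)     = the condition can be true on this granule
    // bit 1 (previous bit) = the condition can be false on this granule
    constexpr uint8_t CAN_BE_TRUE  = 0b01;
    constexpr uint8_t CAN_BE_FALSE = 0b10;

    // Wrap an ordinary boolean into a mask: a known-true atom can only be true,
    // a known-false atom can only be false.
    uint8_t wrap(bool value) { return value ? CAN_BE_TRUE : CAN_BE_FALSE; }

    // Natural combine rules for this encoding (a sketch of what the __bitBoolMask*
    // helpers have to compute): AND can be true only if both sides can, and can be
    // false if either side can; OR is the mirror image; NOT swaps the two bits.
    uint8_t maskAnd(uint8_t a, uint8_t b) { return ((a & b) & CAN_BE_TRUE) | ((a | b) & CAN_BE_FALSE); }
    uint8_t maskOr (uint8_t a, uint8_t b) { return ((a | b) & CAN_BE_TRUE) | ((a & b) & CAN_BE_FALSE); }
    uint8_t maskNot(uint8_t a)            { return static_cast<uint8_t>(((a & CAN_BE_TRUE) << 1) | ((a & CAN_BE_FALSE) >> 1)); }

    int main()
    {
        uint8_t unknown = CAN_BE_TRUE | CAN_BE_FALSE;   // granule where the atom may or may not match

        // "true AND unknown" may still be true, so the granule cannot be dropped...
        std::cout << (maskAnd(wrap(true), unknown) & CAN_BE_TRUE ? "keep" : "drop") << '\n';   // keep
        // ...but "false AND anything" can never be true, so the granule can be skipped.
        std::cout << (maskAnd(wrap(false), unknown) & CAN_BE_TRUE ? "keep" : "drop") << '\n';  // drop
        // NOT just swaps the two possibilities.
        std::cout << int(maskNot(wrap(true))) << '\n';   // 2 (can only be false)
    }

Under this encoding an "unknown" granule keeps both bits set, so only ANDing in a known-false atom clears the "can be true" bit and lets mayBeTrueOnGranule skip the granule.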
true; return false; } + +const ActionsDAG::Node & MergeTreeIndexConditionSet::traverseDAG(const ActionsDAG::Node & node, + ActionsDAGPtr & result_dag, + const ContextPtr & context, + std::unordered_map & node_to_result_node) const +{ + auto result_node_it = node_to_result_node.find(&node); + if (result_node_it != node_to_result_node.end()) + return *result_node_it->second; + + const ActionsDAG::Node * result_node = nullptr; + + if (const auto * operator_node_ptr = operatorFromDAG(node, result_dag, context, node_to_result_node)) + { + result_node = operator_node_ptr; + } + else if (const auto * atom_node_ptr = atomFromDAG(node, result_dag, context)) + { + result_node = atom_node_ptr; + + if (atom_node_ptr->type == ActionsDAG::ActionType::INPUT || + atom_node_ptr->type == ActionsDAG::ActionType::FUNCTION) + { + auto bit_wrapper_function = FunctionFactory::instance().get("__bitWrapperFunc", context); + result_node = &result_dag->addFunction(bit_wrapper_function, {atom_node_ptr}, {}); + } + } + else + { + ColumnWithTypeAndName unknown_field_column_with_type; + + unknown_field_column_with_type.name = calculateConstantActionNodeName(UNKNOWN_FIELD); + unknown_field_column_with_type.type = std::make_shared(); + unknown_field_column_with_type.column = unknown_field_column_with_type.type->createColumnConst(1, UNKNOWN_FIELD); + + result_node = &result_dag->addColumn(unknown_field_column_with_type); + } + + node_to_result_node.emplace(&node, result_node); + return *result_node; +} + +const ActionsDAG::Node * MergeTreeIndexConditionSet::atomFromDAG(const ActionsDAG::Node & node, ActionsDAGPtr & result_dag, const ContextPtr & context) const +{ + /// Function, literal or column + + const auto * node_to_check = &node; + while (node_to_check->type == ActionsDAG::ActionType::ALIAS) + node_to_check = node_to_check->children[0]; + + if (node_to_check->column && isColumnConst(*node_to_check->column)) + return &node; + + RPNBuilderTreeContext tree_context(context); + RPNBuilderTreeNode tree_node(node_to_check, tree_context); + + auto column_name = tree_node.getColumnName(); + if (key_columns.contains(column_name)) + { + const auto * result_node = node_to_check; + + if (node.type != ActionsDAG::ActionType::INPUT) + result_node = &result_dag->addInput(column_name, node.result_type); + + return result_node; + } + + if (node.type != ActionsDAG::ActionType::FUNCTION) + return nullptr; + + const auto & arguments = node.children; + size_t arguments_size = arguments.size(); + + ActionsDAG::NodeRawConstPtrs children(arguments_size); + + for (size_t i = 0; i < arguments_size; ++i) + { + children[i] = atomFromDAG(*arguments[i], result_dag, context); + + if (!children[i]) + return nullptr; + } + + return &result_dag->addFunction(node.function_base, children, {}); +} + +const ActionsDAG::Node * MergeTreeIndexConditionSet::operatorFromDAG(const ActionsDAG::Node & node, + ActionsDAGPtr & result_dag, + const ContextPtr & context, + std::unordered_map & node_to_result_node) const +{ + /// Functions AND, OR, NOT. Replace with bit*. 
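traverseDAG above threads a node_to_result_node map through the recursion: in an ActionsDAG the same node can be reachable through many parents, so caching the rewritten node keeps the pass linear and guarantees one output node per input node. The same shape on a toy DAG (Node and traverse below are stand-ins, not the ActionsDAG types):

    #include <iostream>
    #include <string>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    // Toy DAG node: a name plus child pointers. Shared subtrees are the whole
    // point: the same node can be a child of many parents.
    struct Node
    {
        std::string name;
        std::vector<const Node *> children;
    };

    // Rewrites a node into a string, caching the result per source node so each
    // shared subexpression is processed exactly once (mirrors node_to_result_node
    // in the set-index rewrite).
    const std::string & traverse(const Node & node,
                                 std::unordered_map<const Node *, std::string> & cache)
    {
        if (auto it = cache.find(&node); it != cache.end())
            return it->second;

        std::string result = node.name;
        if (!node.children.empty())
        {
            result += "(";
            for (size_t i = 0; i < node.children.size(); ++i)
                result += (i ? "," : "") + traverse(*node.children[i], cache);
            result += ")";
        }
        return cache.emplace(&node, std::move(result)).first->second;
    }

    int main()
    {
        Node x{"x", {}};
        Node f{"plus", {&x, &x}};            // x is shared between both arguments
        Node root{"and", {&f, &x}};          // and x appears once more here

        std::unordered_map<const Node *, std::string> cache;
        std::cout << traverse(root, cache) << '\n';            // and(plus(x,x),x)
        std::cout << "nodes rewritten: " << cache.size() << '\n';  // 3, not the 5 visits a naive walk would make
    }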
+ + const auto * node_to_check = &node; + while (node_to_check->type == ActionsDAG::ActionType::ALIAS) + node_to_check = node_to_check->children[0]; + + if (node_to_check->column && isColumnConst(*node_to_check->column)) + return nullptr; + + if (node_to_check->type != ActionsDAG::ActionType::FUNCTION) + return nullptr; + + auto function_name = node_to_check->function->getName(); + const auto & arguments = node_to_check->children; + size_t arguments_size = arguments.size(); + + if (function_name == "not") + { + if (arguments_size != 1) + return nullptr; + + auto bit_swap_last_two_function = FunctionFactory::instance().get("__bitSwapLastTwo", context); + return &result_dag->addFunction(bit_swap_last_two_function, {arguments[0]}, {}); + } + else if (function_name == "and" || function_name == "indexHint" || function_name == "or") + { + if (arguments_size < 2) + return nullptr; + + ActionsDAG::NodeRawConstPtrs children; + children.resize(arguments_size); + + for (size_t i = 0; i < arguments_size; ++i) + children[i] = &traverseDAG(*arguments[i], result_dag, context, node_to_result_node); + + FunctionOverloadResolverPtr function; + + if (function_name == "and" || function_name == "indexHint") + function = FunctionFactory::instance().get("__bitBoolMaskAnd", context); + else + function = FunctionFactory::instance().get("__bitBoolMaskOr", context); + + const auto * last_argument = children.back(); + children.pop_back(); + + const auto * before_last_argument = children.back(); + children.pop_back(); + + while (true) + { + last_argument = &result_dag->addFunction(function, {before_last_argument, last_argument}, {}); + + if (children.empty()) + break; + + before_last_argument = children.back(); + children.pop_back(); + } + + return last_argument; + } + + return nullptr; +} + +bool MergeTreeIndexConditionSet::checkDAGUseless(const ActionsDAG::Node & node, const ContextPtr & context, bool atomic) const +{ + const auto * node_to_check = &node; + while (node_to_check->type == ActionsDAG::ActionType::ALIAS) + node_to_check = node_to_check->children[0]; + + RPNBuilderTreeContext tree_context(context); + RPNBuilderTreeNode tree_node(node_to_check, tree_context); + + if (node.column && isColumnConst(*node.column)) + { + Field literal; + node.column->get(0, literal); + return !atomic && literal.safeGet(); + } + else if (node.type == ActionsDAG::ActionType::FUNCTION) + { + auto column_name = tree_node.getColumnName(); + if (key_columns.contains(column_name)) + return false; + + auto function_name = node.function_base->getName(); + const auto & arguments = node.children; + + if (function_name == "and" || function_name == "indexHint") + return std::all_of(arguments.begin(), arguments.end(), [&, atomic](const auto & arg) { return checkDAGUseless(*arg, context, atomic); }); + else if (function_name == "or") + return std::any_of(arguments.begin(), arguments.end(), [&, atomic](const auto & arg) { return checkDAGUseless(*arg, context, atomic); }); + else if (function_name == "not") + return checkDAGUseless(*arguments.at(0), context, atomic); + else + return std::any_of(arguments.begin(), arguments.end(), + [&](const auto & arg) { return checkDAGUseless(*arg, context, true /*atomic*/); }); + } + + auto column_name = tree_node.getColumnName(); + return !key_columns.contains(column_name); +} + void MergeTreeIndexConditionSet::traverseAST(ASTPtr & node) const { if (operatorFromAST(node)) @@ -465,7 +677,7 @@ bool MergeTreeIndexConditionSet::checkASTUseless(const ASTPtr & node, bool atomi else if (const auto * literal = 
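Because only binary __bitBoolMaskAnd/__bitBoolMaskOr helpers exist, operatorFromDAG above turns an n-ary and/or/indexHint into a chain of binary calls with a pop_back loop. The same fold in isolation (foldBinary is an illustrative function; like the arguments_size < 2 guard in the patch, it assumes at least two arguments):

    #include <iostream>
    #include <string>
    #include <utility>
    #include <vector>

    // Combines the last two arguments first, then keeps folding each preceding
    // argument into the accumulated result, mirroring the pop_back loop above.
    std::string foldBinary(std::vector<std::string> args, const std::string & func)
    {
        std::string last = std::move(args.back());
        args.pop_back();
        std::string before_last = std::move(args.back());
        args.pop_back();

        while (true)
        {
            last = func + "(" + before_last + "," + last + ")";
            if (args.empty())
                break;
            before_last = std::move(args.back());
            args.pop_back();
        }
        return last;
    }

    int main()
    {
        std::cout << foldBinary({"a", "b", "c", "d"}, "__bitBoolMaskAnd") << '\n';
        // __bitBoolMaskAnd(a,__bitBoolMaskAnd(b,__bitBoolMaskAnd(c,d)))
    }

The result nests to the right, i.e. and(a, b, c, d) becomes f(a, f(b, f(c, d))).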
node->as()) return !atomic && literal->value.safeGet(); else if (const auto * identifier = node->as()) - return key_columns.find(identifier->getColumnName()) == std::end(key_columns); + return !key_columns.contains(identifier->getColumnName()); else return true; } diff --git a/src/Storages/MergeTree/MergeTreeIndexSet.h b/src/Storages/MergeTree/MergeTreeIndexSet.h index 23b336d274b..e23fddc0f28 100644 --- a/src/Storages/MergeTree/MergeTreeIndexSet.h +++ b/src/Storages/MergeTree/MergeTreeIndexSet.h @@ -84,9 +84,9 @@ class MergeTreeIndexConditionSet final : public IMergeTreeIndexCondition public: MergeTreeIndexConditionSet( const String & index_name_, - const Block & index_sample_block_, + const Block & index_sample_block, size_t max_rows_, - const SelectQueryInfo & query, + const SelectQueryInfo & query_info, ContextPtr context); bool alwaysUnknownOrTrue() const override; @@ -95,20 +95,39 @@ public: ~MergeTreeIndexConditionSet() override = default; private: + const ActionsDAG::Node & traverseDAG(const ActionsDAG::Node & node, + ActionsDAGPtr & result_dag, + const ContextPtr & context, + std::unordered_map & node_to_result_node) const; + + const ActionsDAG::Node * atomFromDAG(const ActionsDAG::Node & node, + ActionsDAGPtr & result_dag, + const ContextPtr & context) const; + + const ActionsDAG::Node * operatorFromDAG(const ActionsDAG::Node & node, + ActionsDAGPtr & result_dag, + const ContextPtr & context, + std::unordered_map & node_to_result_node) const; + + bool checkDAGUseless(const ActionsDAG::Node & node, const ContextPtr & context, bool atomic = false) const; + void traverseAST(ASTPtr & node) const; + bool atomFromAST(ASTPtr & node) const; + static bool operatorFromAST(ASTPtr & node); bool checkASTUseless(const ASTPtr & node, bool atomic = false) const; - String index_name; size_t max_rows; - Block index_sample_block; - bool useless; - std::set key_columns; - ASTPtr expression_ast; + bool isUseless() const + { + return actions == nullptr; + } + + std::unordered_set key_columns; ExpressionActionsPtr actions; }; diff --git a/src/Storages/MergeTree/MergeTreeWriteAheadLog.cpp b/src/Storages/MergeTree/MergeTreeWriteAheadLog.cpp index b3625ba8e93..5b916096e06 100644 --- a/src/Storages/MergeTree/MergeTreeWriteAheadLog.cpp +++ b/src/Storages/MergeTree/MergeTreeWriteAheadLog.cpp @@ -138,7 +138,8 @@ void MergeTreeWriteAheadLog::rotate(const std::unique_lock &) MergeTreeData::MutableDataPartsVector MergeTreeWriteAheadLog::restore( const StorageMetadataPtr & metadata_snapshot, ContextPtr context, - std::unique_lock & parts_lock) + std::unique_lock & parts_lock, + bool readonly) { std::unique_lock lock(write_mutex); @@ -207,7 +208,10 @@ MergeTreeData::MutableDataPartsVector MergeTreeWriteAheadLog::restore( /// If file is broken, do not write new parts to it. /// But if it contains any part rotate and save them. if (max_block_number == -1) - disk->removeFile(path); + { + if (!readonly) + disk->removeFile(path); + } else if (name == DEFAULT_WAL_FILE_NAME) rotate(lock); @@ -256,7 +260,7 @@ MergeTreeData::MutableDataPartsVector MergeTreeWriteAheadLog::restore( [&dropped_parts](const auto & part) { return dropped_parts.count(part->name) == 0; }); /// All parts in WAL had been already committed into the disk -> clear the WAL - if (result.empty()) + if (!readonly && result.empty()) { LOG_DEBUG(log, "WAL file '{}' had been completely processed. 
Removing.", path); disk->removeFile(path); diff --git a/src/Storages/MergeTree/MergeTreeWriteAheadLog.h b/src/Storages/MergeTree/MergeTreeWriteAheadLog.h index a03fe09e03d..eba7698b9f9 100644 --- a/src/Storages/MergeTree/MergeTreeWriteAheadLog.h +++ b/src/Storages/MergeTree/MergeTreeWriteAheadLog.h @@ -65,7 +65,8 @@ public: std::vector restore( const StorageMetadataPtr & metadata_snapshot, ContextPtr context, - std::unique_lock & parts_lock); + std::unique_lock & parts_lock, + bool readonly); using MinMaxBlockNumber = std::pair; static std::optional tryParseMinMaxBlockNumber(const String & filename); diff --git a/src/Storages/MergeTree/MutateTask.cpp b/src/Storages/MergeTree/MutateTask.cpp index 2b186795723..0c1cc6e4b84 100644 --- a/src/Storages/MergeTree/MutateTask.cpp +++ b/src/Storages/MergeTree/MutateTask.cpp @@ -1514,8 +1514,14 @@ bool MutateTask::prepare() ctx->num_mutations = std::make_unique(CurrentMetrics::PartMutation); auto context_for_reading = Context::createCopy(ctx->context); + + /// We must read with one thread because it guarantees that output stream will be sorted. + /// Disable all settings that can enable reading with several streams. context_for_reading->setSetting("max_streams_to_max_threads_ratio", 1); context_for_reading->setSetting("max_threads", 1); + context_for_reading->setSetting("allow_asynchronous_read_from_io_pool_for_merge_tree", false); + context_for_reading->setSetting("max_streams_for_merge_tree_reading", Field(0)); + /// Allow mutations to work when force_index_by_date or force_primary_key is on. context_for_reading->setSetting("force_index_by_date", false); context_for_reading->setSetting("force_primary_key", false); diff --git a/src/Storages/NamedCollectionsHelpers.cpp b/src/Storages/NamedCollectionsHelpers.cpp index bd992a75d94..af1783d47eb 100644 --- a/src/Storages/NamedCollectionsHelpers.cpp +++ b/src/Storages/NamedCollectionsHelpers.cpp @@ -25,7 +25,7 @@ namespace return nullptr; const auto & collection_name = identifier->name(); - return NamedCollectionFactory::instance().tryGet(collection_name); + return NamedCollectionFactory::instance().get(collection_name); } std::optional> getKeyValueFromAST(ASTPtr ast) diff --git a/src/Storages/StorageMergeTree.cpp b/src/Storages/StorageMergeTree.cpp index 5eb30f404c1..639fe2affe1 100644 --- a/src/Storages/StorageMergeTree.cpp +++ b/src/Storages/StorageMergeTree.cpp @@ -101,7 +101,7 @@ StorageMergeTree::StorageMergeTree( attach) , reader(*this) , writer(*this) - , merger_mutator(*this, getContext()->getMergeMutateExecutor()->getMaxTasksCount()) + , merger_mutator(*this) { loadDataParts(has_force_restore_data_flag); @@ -1105,7 +1105,7 @@ bool StorageMergeTree::scheduleDataProcessingJob(BackgroundJobsAssignee & assign auto metadata_snapshot = getInMemoryMetadataPtr(); MergeMutateSelectedEntryPtr merge_entry, mutate_entry; - auto share_lock = lockForShare(RWLockImpl::NO_QUERY, getSettings()->lock_acquire_timeout_for_background_operations); + auto shared_lock = lockForShare(RWLockImpl::NO_QUERY, getSettings()->lock_acquire_timeout_for_background_operations); MergeTreeTransactionHolder transaction_for_merge; MergeTreeTransactionPtr txn; @@ -1122,17 +1122,17 @@ bool StorageMergeTree::scheduleDataProcessingJob(BackgroundJobsAssignee & assign if (merger_mutator.merges_blocker.isCancelled()) return false; - merge_entry = selectPartsToMerge(metadata_snapshot, false, {}, false, nullptr, share_lock, lock, txn); + merge_entry = selectPartsToMerge(metadata_snapshot, false, {}, false, nullptr, shared_lock, lock, txn); if 
(!merge_entry && !current_mutations_by_version.empty()) - mutate_entry = selectPartsToMutate(metadata_snapshot, nullptr, share_lock, lock); + mutate_entry = selectPartsToMutate(metadata_snapshot, nullptr, shared_lock, lock); has_mutations = !current_mutations_by_version.empty(); } if (merge_entry) { - auto task = std::make_shared(*this, metadata_snapshot, false, Names{}, merge_entry, share_lock, common_assignee_trigger); + auto task = std::make_shared(*this, metadata_snapshot, false, Names{}, merge_entry, shared_lock, common_assignee_trigger); task->setCurrentTransaction(std::move(transaction_for_merge), std::move(txn)); bool scheduled = assignee.scheduleMergeMutateTask(task); /// The problem that we already booked a slot for TTL merge, but a merge list entry will be created only in a prepare method @@ -1143,7 +1143,7 @@ bool StorageMergeTree::scheduleDataProcessingJob(BackgroundJobsAssignee & assign } if (mutate_entry) { - auto task = std::make_shared(*this, metadata_snapshot, mutate_entry, share_lock, common_assignee_trigger); + auto task = std::make_shared(*this, metadata_snapshot, mutate_entry, shared_lock, common_assignee_trigger); assignee.scheduleMergeMutateTask(task); return true; } @@ -1160,7 +1160,7 @@ bool StorageMergeTree::scheduleDataProcessingJob(BackgroundJobsAssignee & assign getSettings()->merge_tree_clear_old_temporary_directories_interval_seconds)) { assignee.scheduleCommonTask(std::make_shared( - [this, share_lock] () + [this, shared_lock] () { return clearOldTemporaryDirectories(getSettings()->temporary_directories_lifetime.totalSeconds()); }, common_assignee_trigger, getStorageID()), /* need_trigger */ false); @@ -1171,7 +1171,7 @@ bool StorageMergeTree::scheduleDataProcessingJob(BackgroundJobsAssignee & assign getSettings()->merge_tree_clear_old_parts_interval_seconds)) { assignee.scheduleCommonTask(std::make_shared( - [this, share_lock] () + [this, shared_lock] () { /// All use relative_data_path which changes during rename /// so execute under share lock. 
diff --git a/src/Storages/StorageReplicatedMergeTree.cpp b/src/Storages/StorageReplicatedMergeTree.cpp index 99ceb1d90ae..b03358ba15b 100644 --- a/src/Storages/StorageReplicatedMergeTree.cpp +++ b/src/Storages/StorageReplicatedMergeTree.cpp @@ -273,7 +273,7 @@ StorageReplicatedMergeTree::StorageReplicatedMergeTree( , replica_path(fs::path(zookeeper_path) / "replicas" / replica_name_) , reader(*this) , writer(*this) - , merger_mutator(*this, getContext()->getMergeMutateExecutor()->getMaxTasksCount()) + , merger_mutator(*this) , merge_strategy_picker(*this) , queue(*this, merge_strategy_picker) , fetcher(*this) diff --git a/src/Storages/TTLDescription.cpp b/src/Storages/TTLDescription.cpp index 41c9c1996b1..2971d977099 100644 --- a/src/Storages/TTLDescription.cpp +++ b/src/Storages/TTLDescription.cpp @@ -61,7 +61,7 @@ void checkTTLExpression(const ExpressionActionsPtr & ttl_expression, const Strin { if (action.node->type == ActionsDAG::ActionType::FUNCTION) { - IFunctionBase & func = *action.node->function_base; + const IFunctionBase & func = *action.node->function_base; if (!func.isDeterministic()) throw Exception( "TTL expression cannot contain non-deterministic functions, " diff --git a/tests/clickhouse-test b/tests/clickhouse-test index 20e63412d91..6e912c7ca10 100755 --- a/tests/clickhouse-test +++ b/tests/clickhouse-test @@ -456,6 +456,7 @@ class SettingsRandomizer: "merge_tree_coarse_index_granularity": lambda: random.randint(2, 32), "optimize_distinct_in_order": lambda: random.randint(0, 1), "optimize_sorting_by_input_stream_properties": lambda: random.randint(0, 1), + "enable_memory_bound_merging_of_aggregation_results": lambda: random.randint(0, 1), } @staticmethod diff --git a/tests/config/config.d/storage_conf.xml b/tests/config/config.d/storage_conf.xml index 8226d801cef..bc9269e6ec1 100644 --- a/tests/config/config.d/storage_conf.xml +++ b/tests/config/config.d/storage_conf.xml @@ -100,7 +100,7 @@ 22548578304 0 1 - 100 + 100 cache @@ -109,6 +109,15 @@ 1000 1 + + cache + s3_disk_6 + s3_cache_small_segment_size/ + 22548578304 + 10Ki + 0 + 1 + local @@ -234,6 +243,13 @@ + + +
+                    <disk>s3_cache_small_segment_size</disk>
+                </main>
+            </volumes>
+        </s3_cache_small_segment_size>
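A minimal sketch of how the new s3_cache_small_segment_size storage policy defined above is exercised (it mirrors the stateless test 02503_cache_on_write_with_small_segment_size added further down; the table name here is illustrative):

CREATE TABLE cache_demo (id Int32, value String)
ENGINE = MergeTree() ORDER BY tuple()
SETTINGS storage_policy = 's3_cache_small_segment_size', min_bytes_for_wide_part = 0;

INSERT INTO cache_demo SELECT number, toString(number) FROM numbers(100000)
SETTINGS throw_on_error_from_cache_on_write_operations = 1;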
diff --git a/tests/integration/test_mutations_in_partitions_of_merge_tree/configs/max_threads.xml b/tests/integration/test_mutations_in_partitions_of_merge_tree/configs/max_threads.xml new file mode 100644 index 00000000000..824e867c5c7 --- /dev/null +++ b/tests/integration/test_mutations_in_partitions_of_merge_tree/configs/max_threads.xml @@ -0,0 +1,8 @@ + + + + 1 + 64 + + + diff --git a/tests/integration/test_mutations_in_partitions_of_merge_tree/test.py b/tests/integration/test_mutations_in_partitions_of_merge_tree/test.py index 2ab5816e5b1..17f2a27e49b 100644 --- a/tests/integration/test_mutations_in_partitions_of_merge_tree/test.py +++ b/tests/integration/test_mutations_in_partitions_of_merge_tree/test.py @@ -19,6 +19,14 @@ node2 = cluster.add_instance( stay_alive=True, ) +node3 = cluster.add_instance( + "node3", + main_configs=["configs/logs_config.xml", "configs/cluster.xml"], + user_configs=["configs/max_threads.xml"], + with_zookeeper=True, + stay_alive=True, +) + @pytest.fixture(scope="module") def started_cluster(): @@ -180,3 +188,20 @@ def test_trivial_alter_in_partition_replicated_merge_tree(started_cluster): finally: node1.query("DROP TABLE IF EXISTS {}".format(name)) node2.query("DROP TABLE IF EXISTS {}".format(name)) + + +def test_mutation_max_streams(started_cluster): + try: + node3.query("DROP TABLE IF EXISTS t_mutations") + + node3.query("CREATE TABLE t_mutations (a UInt32) ENGINE = MergeTree ORDER BY a") + node3.query("INSERT INTO t_mutations SELECT number FROM numbers(10000000)") + + node3.query( + "ALTER TABLE t_mutations DELETE WHERE a = 300000", + settings={"mutations_sync": "2"}, + ) + + assert node3.query("SELECT count() FROM t_mutations") == "9999999\n" + finally: + node3.query("DROP TABLE IF EXISTS t_mutations") diff --git a/tests/integration/test_server_reload/test.py b/tests/integration/test_server_reload/test.py index 1323285b17f..b06d424ee1c 100644 --- a/tests/integration/test_server_reload/test.py +++ b/tests/integration/test_server_reload/test.py @@ -33,6 +33,9 @@ instance = cluster.add_instance( ], user_configs=["configs/default_passwd.xml"], with_zookeeper=True, + # Bug in TSAN reproduces in this test https://github.com/grpc/grpc/issues/29550#issuecomment-1188085387 + # second_deadlock_stack -- just ordinary option we use everywhere, don't want to overwrite it + env_variables={"TSAN_OPTIONS": "report_atomic_races=0 second_deadlock_stack=1"}, ) diff --git a/tests/integration/test_storage_s3/test.py b/tests/integration/test_storage_s3/test.py index 937f14bb878..2fa499eb78e 100644 --- a/tests/integration/test_storage_s3/test.py +++ b/tests/integration/test_storage_s3/test.py @@ -1013,6 +1013,9 @@ def test_predefined_connection_configuration(started_cluster): ) assert result == instance.query("SELECT number FROM numbers(10)") + result = instance.query_and_get_error("SELECT * FROM s3(no_collection)") + assert "There is no named collection `no_collection`" in result + result = "" diff --git a/tests/queries/0_stateless/00718_format_datetime.reference b/tests/queries/0_stateless/00718_format_datetime.reference index bc98dd59d5f..17937514396 100644 --- a/tests/queries/0_stateless/00718_format_datetime.reference +++ b/tests/queries/0_stateless/00718_format_datetime.reference @@ -34,3 +34,11 @@ no formatting pattern no formatting pattern -1100 +0300 +0530 +1234560 +000340 +2022-12-08 18:11:29.123400000 +2022-12-08 18:11:29.1 +2022-12-08 18:11:29.0 +2022-12-08 18:11:29.0 +2022-12-08 00:00:00.0 +2022-12-08 00:00:00.0 diff --git 
a/tests/queries/0_stateless/00718_format_datetime.sql b/tests/queries/0_stateless/00718_format_datetime.sql index deb5fb96c6c..f6fb2ce15bc 100644 --- a/tests/queries/0_stateless/00718_format_datetime.sql +++ b/tests/queries/0_stateless/00718_format_datetime.sql @@ -54,3 +54,13 @@ SELECT formatDateTime(toDateTime('2020-01-01 01:00:00', 'UTC'), '%z'); SELECT formatDateTime(toDateTime('2020-01-01 01:00:00', 'US/Samoa'), '%z'); SELECT formatDateTime(toDateTime('2020-01-01 01:00:00', 'Europe/Moscow'), '%z'); SELECT formatDateTime(toDateTime('1970-01-01 00:00:00', 'Asia/Kolkata'), '%z'); + +select formatDateTime(toDateTime64('2010-01-04 12:34:56.123456', 7), '%f'); +select formatDateTime(toDateTime64('2022-12-08 18:11:29.00034', 6, 'UTC'), '%f'); + +select formatDateTime(toDateTime64('2022-12-08 18:11:29.1234', 9, 'UTC'), '%F %T.%f'); +select formatDateTime(toDateTime64('2022-12-08 18:11:29.1234', 1, 'UTC'), '%F %T.%f'); +select formatDateTime(toDateTime64('2022-12-08 18:11:29.1234', 0, 'UTC'), '%F %T.%f'); +select formatDateTime(toDateTime('2022-12-08 18:11:29', 'UTC'), '%F %T.%f'); +select formatDateTime(toDate32('2022-12-08 18:11:29', 'UTC'), '%F %T.%f'); +select formatDateTime(toDate('2022-12-08 18:11:29', 'UTC'), '%F %T.%f'); diff --git a/tests/queries/0_stateless/01072_window_view_multiple_columns_groupby.sh b/tests/queries/0_stateless/01072_window_view_multiple_columns_groupby.sh index ccc4ed3e08d..15d4da504f1 100755 --- a/tests/queries/0_stateless/01072_window_view_multiple_columns_groupby.sh +++ b/tests/queries/0_stateless/01072_window_view_multiple_columns_groupby.sh @@ -1,4 +1,5 @@ #!/usr/bin/env bash +# Tags: no-random-settings, no-parallel CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) # shellcheck source=../shell_config.sh @@ -18,7 +19,7 @@ INSERT INTO mt VALUES ('test1', 'test2'); EOF while true; do - $CLICKHOUSE_CLIENT --query="SELECT count(*) FROM dst" | grep -q "1" && break || sleep .5 ||: + $CLICKHOUSE_CLIENT --query="SELECT count(*) FROM dst" | grep -q "1" && break || sleep .1 ||: done $CLICKHOUSE_CLIENT --query="SELECT colA, colB FROM dst" diff --git a/tests/queries/0_stateless/01092_memory_profiler.sql b/tests/queries/0_stateless/01092_memory_profiler.sql index 3869bf941c0..b69d3faf94e 100644 --- a/tests/queries/0_stateless/01092_memory_profiler.sql +++ b/tests/queries/0_stateless/01092_memory_profiler.sql @@ -6,8 +6,9 @@ SET memory_profiler_step = 1000000; SET memory_profiler_sample_probability = 1; SET log_queries = 1; -SELECT ignore(groupArray(number), 'test memory profiler') FROM numbers(10000000); +SELECT ignore(groupArray(number), 'test memory profiler') FROM numbers(10000000) SETTINGS log_comment = '01092_memory_profiler'; + SYSTEM FLUSH LOGS; -WITH addressToSymbol(arrayJoin(trace)) AS symbol SELECT count() > 0 FROM system.trace_log t WHERE event_date >= yesterday() AND trace_type = 'Memory' AND query_id = (SELECT query_id FROM system.query_log WHERE current_database = currentDatabase() AND event_date >= yesterday() AND query LIKE '%test memory profiler%' ORDER BY event_time DESC LIMIT 1); -WITH addressToSymbol(arrayJoin(trace)) AS symbol SELECT count() > 0 FROM system.trace_log t WHERE event_date >= yesterday() AND trace_type = 'MemoryPeak' AND query_id = (SELECT query_id FROM system.query_log WHERE current_database = currentDatabase() AND event_date >= yesterday() AND query LIKE '%test memory profiler%' ORDER BY event_time DESC LIMIT 1); -WITH addressToSymbol(arrayJoin(trace)) AS symbol SELECT count() > 0 FROM system.trace_log t WHERE event_date >= 
yesterday() AND trace_type = 'MemorySample' AND query_id = (SELECT query_id FROM system.query_log WHERE current_database = currentDatabase() AND event_date >= yesterday() AND query LIKE '%test memory profiler%' ORDER BY event_time DESC LIMIT 1); +WITH addressToSymbol(arrayJoin(trace)) AS symbol SELECT count() > 0 FROM system.trace_log t WHERE event_date >= yesterday() AND trace_type = 'Memory' AND query_id = (SELECT query_id FROM system.query_log WHERE current_database = currentDatabase() AND event_date >= yesterday() AND query LIKE '%test memory profiler%' AND has(used_table_functions, 'numbers') AND log_comment = '01092_memory_profiler' ORDER BY event_time DESC LIMIT 1); +WITH addressToSymbol(arrayJoin(trace)) AS symbol SELECT count() > 0 FROM system.trace_log t WHERE event_date >= yesterday() AND trace_type = 'MemoryPeak' AND query_id = (SELECT query_id FROM system.query_log WHERE current_database = currentDatabase() AND event_date >= yesterday() AND query LIKE '%test memory profiler%' AND has(used_table_functions, 'numbers') AND log_comment = '01092_memory_profiler' ORDER BY event_time DESC LIMIT 1); +WITH addressToSymbol(arrayJoin(trace)) AS symbol SELECT count() > 0 FROM system.trace_log t WHERE event_date >= yesterday() AND trace_type = 'MemorySample' AND query_id = (SELECT query_id FROM system.query_log WHERE current_database = currentDatabase() AND event_date >= yesterday() AND query LIKE '%test memory profiler%' AND has(used_table_functions, 'numbers') AND log_comment = '01092_memory_profiler' ORDER BY event_time DESC LIMIT 1); diff --git a/tests/queries/0_stateless/01159_combinators_with_parameters.reference b/tests/queries/0_stateless/01159_combinators_with_parameters.reference index cc0cb604bf3..c1edc826fcb 100644 --- a/tests/queries/0_stateless/01159_combinators_with_parameters.reference +++ b/tests/queries/0_stateless/01159_combinators_with_parameters.reference @@ -3,7 +3,6 @@ AggregateFunction(topKDistinct(10), String) AggregateFunction(topKForEach(10), Array(String)) AggregateFunction(topKIf(10), String, UInt8) AggregateFunction(topK(10), String) -AggregateFunction(topKOrNull(10), String) AggregateFunction(topKOrDefault(10), String) AggregateFunction(topKResample(10, 1, 2, 42), String, UInt64) AggregateFunction(topK(10), String) diff --git a/tests/queries/0_stateless/01159_combinators_with_parameters.sql b/tests/queries/0_stateless/01159_combinators_with_parameters.sql index 69508d8e304..8b2dbde6480 100644 --- a/tests/queries/0_stateless/01159_combinators_with_parameters.sql +++ b/tests/queries/0_stateless/01159_combinators_with_parameters.sql @@ -3,7 +3,7 @@ SELECT toTypeName(topKDistinctState(10)(toString(number))) FROM numbers(100); SELECT toTypeName(topKForEachState(10)([toString(number)])) FROM numbers(100); SELECT toTypeName(topKIfState(10)(toString(number), number % 2)) FROM numbers(100); SELECT toTypeName(topKMergeState(10)(state)) FROM (SELECT topKState(10)(toString(number)) as state FROM numbers(100)); -SELECT toTypeName(topKOrNullState(10)(toString(number))) FROM numbers(100); +SELECT toTypeName(topKOrNullState(10)(toString(number))) FROM numbers(100); -- { serverError ILLEGAL_TYPE_OF_ARGUMENT } SELECT toTypeName(topKOrDefaultState(10)(toString(number))) FROM numbers(100); SELECT toTypeName(topKResampleState(10, 1, 2, 42)(toString(number), number)) FROM numbers(100); SELECT toTypeName(topKState(10)(toString(number))) FROM numbers(100); diff --git a/tests/queries/0_stateless/01906_lc_in_bug.reference b/tests/queries/0_stateless/01906_lc_in_bug.reference index 
9fe1650abf0..adce940e346 100644 --- a/tests/queries/0_stateless/01906_lc_in_bug.reference +++ b/tests/queries/0_stateless/01906_lc_in_bug.reference @@ -1,2 +1,3 @@ 1 0 3 1 +0 diff --git a/tests/queries/0_stateless/01906_lc_in_bug.sql b/tests/queries/0_stateless/01906_lc_in_bug.sql index f8f41da31ae..581053e14e1 100644 --- a/tests/queries/0_stateless/01906_lc_in_bug.sql +++ b/tests/queries/0_stateless/01906_lc_in_bug.sql @@ -6,3 +6,8 @@ insert into tab values ('a'), ('bb'), ('a'), ('cc'); select count() as c, x in ('a', 'bb') as g from tab group by g order by c; drop table if exists tab; + +-- https://github.com/ClickHouse/ClickHouse/issues/44503 +CREATE TABLE test(key Int32) ENGINE = MergeTree ORDER BY (key); +insert into test select intDiv(number,100) from numbers(10000000); +SELECT COUNT() FROM test WHERE key <= 100000 AND (NOT (toLowCardinality('') IN (SELECT ''))); diff --git a/tests/queries/0_stateless/01961_roaring_memory_tracking.sql b/tests/queries/0_stateless/01961_roaring_memory_tracking.sql index 9e14bb9e138..85db40f1104 100644 --- a/tests/queries/0_stateless/01961_roaring_memory_tracking.sql +++ b/tests/queries/0_stateless/01961_roaring_memory_tracking.sql @@ -1,4 +1,4 @@ -- Tags: no-replicated-database -SET max_memory_usage = '100M'; +SET max_memory_usage = '75M'; SELECT cityHash64(rand() % 1000) as n, groupBitmapState(number) FROM numbers_mt(2000000000) GROUP BY n FORMAT Null; -- { serverError 241 } diff --git a/tests/queries/0_stateless/02207_subseconds_intervals.reference b/tests/queries/0_stateless/02207_subseconds_intervals.reference index f2e40137851..91f0ecb8606 100644 --- a/tests/queries/0_stateless/02207_subseconds_intervals.reference +++ b/tests/queries/0_stateless/02207_subseconds_intervals.reference @@ -60,3 +60,19 @@ test add[...]seconds() 2220-12-12 12:12:12.124 2220-12-12 12:12:12.121 2220-12-12 12:12:12.124456 +test subtract[...]seconds() +- test nanoseconds +2022-12-31 23:59:59.999999999 +2022-12-31 23:59:59.999999900 +2023-01-01 00:00:00.000000001 +2023-01-01 00:00:00.000000100 +- test microseconds +2022-12-31 23:59:59.999999 +2022-12-31 23:59:59.999900 +2023-01-01 00:00:00.000001 +2023-01-01 00:00:00.000100 +- test milliseconds +2022-12-31 23:59:59.999 +2022-12-31 23:59:59.900 +2023-01-01 00:00:00.001 +2023-01-01 00:00:00.100 diff --git a/tests/queries/0_stateless/02207_subseconds_intervals.sql b/tests/queries/0_stateless/02207_subseconds_intervals.sql index a7ce03d9330..c30b3c460dc 100644 --- a/tests/queries/0_stateless/02207_subseconds_intervals.sql +++ b/tests/queries/0_stateless/02207_subseconds_intervals.sql @@ -92,3 +92,22 @@ select addMilliseconds(toDateTime64('1930-12-12 12:12:12.123456', 6), 1); -- Bel select addMilliseconds(toDateTime64('2220-12-12 12:12:12.123', 3), 1); -- Above normal range, source scale matches result select addMilliseconds(toDateTime64('2220-12-12 12:12:12.12', 2), 1); -- Above normal range, source scale less than result select addMilliseconds(toDateTime64('2220-12-12 12:12:12.123456', 6), 1); -- Above normal range, source scale greater than result + +select 'test subtract[...]seconds()'; +select '- test nanoseconds'; +select subtractNanoseconds(toDateTime64('2023-01-01 00:00:00.0000000', 7, 'UTC'), 1); +select subtractNanoseconds(toDateTime64('2023-01-01 00:00:00.0000000', 7, 'UTC'), 100); +select subtractNanoseconds(toDateTime64('2023-01-01 00:00:00.0000000', 7, 'UTC'), -1); +select subtractNanoseconds(toDateTime64('2023-01-01 00:00:00.0000000', 7, 'UTC'), -100); + +select '- test microseconds'; +select 
subtractMicroseconds(toDateTime64('2023-01-01 00:00:00.0000', 4, 'UTC'), 1); +select subtractMicroseconds(toDateTime64('2023-01-01 00:00:00.0000', 4, 'UTC'), 100); +select subtractMicroseconds(toDateTime64('2023-01-01 00:00:00.0000', 4, 'UTC'), -1); +select subtractMicroseconds(toDateTime64('2023-01-01 00:00:00.0000', 4, 'UTC'), -100); + +select '- test milliseconds'; +select subtractMilliseconds(toDateTime64('2023-01-01 00:00:00.0', 1, 'UTC'), 1); +select subtractMilliseconds(toDateTime64('2023-01-01 00:00:00.0', 1, 'UTC'), 100); +select subtractMilliseconds(toDateTime64('2023-01-01 00:00:00.0', 1, 'UTC'), -1); +select subtractMilliseconds(toDateTime64('2023-01-01 00:00:00.0', 1, 'UTC'), -100); diff --git a/tests/queries/0_stateless/02343_aggregation_pipeline.sql b/tests/queries/0_stateless/02343_aggregation_pipeline.sql index 85e9fd1be1e..b018cf21f91 100644 --- a/tests/queries/0_stateless/02343_aggregation_pipeline.sql +++ b/tests/queries/0_stateless/02343_aggregation_pipeline.sql @@ -1,3 +1,6 @@ +-- produces different pipeline if enabled +set enable_memory_bound_merging_of_aggregation_results = 0; + set max_threads = 16; set prefer_localhost_replica = 1; set optimize_aggregation_in_order = 0; diff --git a/tests/queries/0_stateless/02344_show_caches.reference b/tests/queries/0_stateless/02344_show_caches.reference index 68882f63e1f..2ee4f902ba1 100644 --- a/tests/queries/0_stateless/02344_show_caches.reference +++ b/tests/queries/0_stateless/02344_show_caches.reference @@ -5,6 +5,7 @@ s3_cache_3 s3_cache_multi s3_cache_4 s3_cache_5 +s3_cache_small_segment_size local_cache s3_cache_6 s3_cache_small diff --git a/tests/queries/0_stateless/02481_array_join_with_map.reference b/tests/queries/0_stateless/02481_array_join_with_map.reference new file mode 100644 index 00000000000..81fa77358db --- /dev/null +++ b/tests/queries/0_stateless/02481_array_join_with_map.reference @@ -0,0 +1,33 @@ +Hello 1 (1,'1') +Hello 2 (2,'2') +World 3 (3,'3') +World 4 (4,'4') +World 5 (5,'5') +Hello 1 (1,'1') +Hello 2 (2,'2') +World 3 (3,'3') +World 4 (4,'4') +World 5 (5,'5') +Goodbye 0 (0,'') +Hello (1,'1') +Hello (2,'2') +World (3,'3') +World (4,'4') +World (5,'5') +Hello (1,'1') +Hello (2,'2') +World (3,'3') +World (4,'4') +World (5,'5') +Goodbye (0,'') +Hello (1,'1') (1,'1') +Hello (2,'2') (0,'') +World (3,'3') (3,'3') +World (4,'4') (4,'4') +World (5,'5') (0,'') +Hello (1,'1') (1,'1') +Hello (2,'2') (0,'') +World (3,'3') (3,'3') +World (4,'4') (4,'4') +World (5,'5') (0,'') +Goodbye (0,'') (0,'') diff --git a/tests/queries/0_stateless/02481_array_join_with_map.sql b/tests/queries/0_stateless/02481_array_join_with_map.sql new file mode 100644 index 00000000000..564b99e6e47 --- /dev/null +++ b/tests/queries/0_stateless/02481_array_join_with_map.sql @@ -0,0 +1,25 @@ +DROP TABLE IF EXISTS arrays_test; + +CREATE TABLE arrays_test +( + s String, + arr1 Array(UInt8), + map1 Map(UInt8, String), + map2 Map(UInt8, String) +) ENGINE = Memory; + +INSERT INTO arrays_test +VALUES ('Hello', [1,2], map(1, '1', 2, '2'), map(1, '1')), ('World', [3,4,5], map(3, '3', 4, '4', 5, '5'), map(3, '3', 4, '4')), ('Goodbye', [], map(), map()); + + +select s, arr1, map1 from arrays_test array join arr1, map1 settings enable_unaligned_array_join = 1; + +select s, arr1, map1 from arrays_test left array join arr1, map1 settings enable_unaligned_array_join = 1; + +select s, map1 from arrays_test array join map1; + +select s, map1 from arrays_test left array join map1; + +select s, map1, map2 from arrays_test array join map1, map2 settings 
enable_unaligned_array_join = 1; + +select s, map1, map2 from arrays_test left array join map1, map2 settings enable_unaligned_array_join = 1; diff --git a/tests/queries/0_stateless/02482_capnp_list_of_structs.reference b/tests/queries/0_stateless/02482_capnp_list_of_structs.reference new file mode 100644 index 00000000000..002eae70f97 --- /dev/null +++ b/tests/queries/0_stateless/02482_capnp_list_of_structs.reference @@ -0,0 +1,4 @@ +[(1,3),(2,4)] +[1,2] [3,4] +[1,2] [3,4] +[1,2] diff --git a/tests/queries/0_stateless/02482_capnp_list_of_structs.sh b/tests/queries/0_stateless/02482_capnp_list_of_structs.sh new file mode 100755 index 00000000000..091bd4dba2a --- /dev/null +++ b/tests/queries/0_stateless/02482_capnp_list_of_structs.sh @@ -0,0 +1,25 @@ +#!/usr/bin/env bash +# Tags: no-fasttest, no-parallel + +CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +# shellcheck source=../shell_config.sh +. "$CURDIR"/../shell_config.sh + +USER_FILES_PATH=$(clickhouse-client --query "select _path,_file from file('nonexist.txt', 'CSV', 'val1 char')" 2>&1 | grep Exception | awk '{gsub("/nonexist.txt","",$9); print $9}') +touch $USER_FILES_PATH/data.capnp + +SCHEMADIR=$(clickhouse-client --query "select * from file('data.capnp', 'CapnProto', 'val1 char') settings format_schema='nonexist:Message'" 2>&1 | grep Exception | grep -oP "file \K.*(?=/nonexist.capnp)") +CLIENT_SCHEMADIR=$CURDIR/format_schemas +SERVER_SCHEMADIR=test_02482 +mkdir -p $SCHEMADIR/$SERVER_SCHEMADIR +cp -r $CLIENT_SCHEMADIR/02482_* $SCHEMADIR/$SERVER_SCHEMADIR/ + + +$CLICKHOUSE_CLIENT -q "insert into function file(02482_data.capnp, auto, 'nested Nested(x Int64, y Int64)') select [1,2], [3,4] settings format_schema='$SERVER_SCHEMADIR/02482_list_of_structs.capnp:Nested', engine_file_truncate_on_insert=1" +$CLICKHOUSE_CLIENT -q "select * from file(02482_data.capnp) settings format_schema='$SERVER_SCHEMADIR/02482_list_of_structs.capnp:Nested'" +$CLICKHOUSE_CLIENT -q "select * from file(02482_data.capnp, auto, 'nested Nested(x Int64, y Int64)') settings format_schema='$SERVER_SCHEMADIR/02482_list_of_structs.capnp:Nested'" +$CLICKHOUSE_CLIENT -q "select * from file(02482_data.capnp, auto, '\`nested.x\` Array(Int64), \`nested.y\` Array(Int64)') settings format_schema='$SERVER_SCHEMADIR/02482_list_of_structs.capnp:Nested'" +$CLICKHOUSE_CLIENT -q "select * from file(02482_data.capnp, auto, '\`nested.x\` Array(Int64)') settings format_schema='$SERVER_SCHEMADIR/02482_list_of_structs.capnp:Nested'" + +rm $USER_FILES_PATH/data.capnp +rm $USER_FILES_PATH/02482_data.capnp diff --git a/tests/queries/0_stateless/02483_capnp_decimals.reference b/tests/queries/0_stateless/02483_capnp_decimals.reference new file mode 100644 index 00000000000..9885da95ce2 --- /dev/null +++ b/tests/queries/0_stateless/02483_capnp_decimals.reference @@ -0,0 +1,2 @@ +4242424242 42420 +4242.424242 42.42 diff --git a/tests/queries/0_stateless/02483_capnp_decimals.sh b/tests/queries/0_stateless/02483_capnp_decimals.sh new file mode 100755 index 00000000000..bdfa9dac3d5 --- /dev/null +++ b/tests/queries/0_stateless/02483_capnp_decimals.sh @@ -0,0 +1,24 @@ +#!/usr/bin/env bash +# Tags: no-fasttest, no-parallel + +CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +# shellcheck source=../shell_config.sh +. 
"$CURDIR"/../shell_config.sh + +USER_FILES_PATH=$(clickhouse-client --query "select _path,_file from file('nonexist.txt', 'CSV', 'val1 char')" 2>&1 | grep Exception | awk '{gsub("/nonexist.txt","",$9); print $9}') +touch $USER_FILES_PATH/data.capnp + +SCHEMADIR=$(clickhouse-client --query "select * from file('data.capnp', 'CapnProto', 'val1 char') settings format_schema='nonexist:Message'" 2>&1 | grep Exception | grep -oP "file \K.*(?=/nonexist.capnp)") +CLIENT_SCHEMADIR=$CURDIR/format_schemas +SERVER_SCHEMADIR=test_02483 +mkdir -p $SCHEMADIR/$SERVER_SCHEMADIR +cp -r $CLIENT_SCHEMADIR/02483_* $SCHEMADIR/$SERVER_SCHEMADIR/ + + +$CLICKHOUSE_CLIENT -q "insert into function file(02483_data.capnp, auto, 'decimal32 Decimal32(3), decimal64 Decimal64(6)') select 42.42, 4242.424242 settings format_schema='$SERVER_SCHEMADIR/02483_decimals.capnp:Message', engine_file_truncate_on_insert=1" +$CLICKHOUSE_CLIENT -q "select * from file(02483_data.capnp) settings format_schema='$SERVER_SCHEMADIR/02483_decimals.capnp:Message'" +$CLICKHOUSE_CLIENT -q "select * from file(02483_data.capnp, auto, 'decimal64 Decimal64(6), decimal32 Decimal32(3)') settings format_schema='$SERVER_SCHEMADIR/02483_decimals.capnp:Message'" + +rm $USER_FILES_PATH/data.capnp +rm $USER_FILES_PATH/02483_data.capnp + diff --git a/tests/queries/0_stateless/02494_parser_string_binary_literal.reference b/tests/queries/0_stateless/02494_parser_string_binary_literal.reference new file mode 100644 index 00000000000..4fbadddcd21 --- /dev/null +++ b/tests/queries/0_stateless/02494_parser_string_binary_literal.reference @@ -0,0 +1,24 @@ + +1 +0 +10 +1 + +1 +0 +10 +1 + +1 +0 +10 +1 + +1 +0 +10 +1 +1 +1 +1 +1 diff --git a/tests/queries/0_stateless/02494_parser_string_binary_literal.sql b/tests/queries/0_stateless/02494_parser_string_binary_literal.sql new file mode 100644 index 00000000000..ebfe2a198b5 --- /dev/null +++ b/tests/queries/0_stateless/02494_parser_string_binary_literal.sql @@ -0,0 +1,29 @@ +select b''; +select b'0' == '\0'; +select b'00110000'; -- 0 +select b'0011000100110000'; -- 10 +select b'111001101011010110001011111010001010111110010101' == '测试'; + +select B''; +select B'0' == '\0'; +select B'00110000'; -- 0 +select B'0011000100110000'; -- 10 +select B'111001101011010110001011111010001010111110010101' == '测试'; + +select x''; +select x'0' == '\0'; +select x'30'; -- 0 +select x'3130'; -- 10 +select x'e6b58be8af95' == '测试'; + +select X''; +select X'0' == '\0'; +select X'30'; -- 0 +select X'3130'; -- 10 +select X'e6b58be8af95' == '测试'; + + +select x'' == b''; +select x'0' == b'0'; +select X'' == X''; +select X'0' == X'0'; diff --git a/tests/queries/0_stateless/02495_parser_string_binary_literal.reference b/tests/queries/0_stateless/02495_parser_string_binary_literal.reference new file mode 100644 index 00000000000..0f91f17602d --- /dev/null +++ b/tests/queries/0_stateless/02495_parser_string_binary_literal.reference @@ -0,0 +1,6 @@ +Syntax error +Syntax error +Syntax error +Syntax error +Syntax error +Syntax error diff --git a/tests/queries/0_stateless/02495_parser_string_binary_literal.sh b/tests/queries/0_stateless/02495_parser_string_binary_literal.sh new file mode 100755 index 00000000000..88998b06a01 --- /dev/null +++ b/tests/queries/0_stateless/02495_parser_string_binary_literal.sh @@ -0,0 +1,15 @@ +#!/usr/bin/env bash + +CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +# shellcheck source=../shell_config.sh +. 
"$CURDIR"/../shell_config.sh + + +$CLICKHOUSE_CLIENT --query="SELECT b '0';" 2>&1 | grep -o 'Syntax error' +$CLICKHOUSE_CLIENT --query="SELECT x 'a'" 2>&1 | grep -o 'Syntax error' + +$CLICKHOUSE_CLIENT --query="SELECT b'3';" 2>&1 | grep -o 'Syntax error' +$CLICKHOUSE_CLIENT --query="SELECT x'k'" 2>&1 | grep -o 'Syntax error' + +$CLICKHOUSE_CLIENT --query="SELECT b'1" 2>&1 | grep -o 'Syntax error' +$CLICKHOUSE_CLIENT --query="SELECT x'a" 2>&1 | grep -o 'Syntax error' diff --git a/tests/queries/0_stateless/02497_if_transform_strings_to_enum.reference b/tests/queries/0_stateless/02497_if_transform_strings_to_enum.reference index 06863f1858b..c6265e195c4 100644 --- a/tests/queries/0_stateless/02497_if_transform_strings_to_enum.reference +++ b/tests/queries/0_stateless/02497_if_transform_strings_to_enum.reference @@ -19,7 +19,7 @@ QUERY id: 0 FUNCTION id: 2, function_name: toString, function_type: ordinary, result_type: String ARGUMENTS LIST id: 3, nodes: 1 - FUNCTION id: 4, function_name: transform, function_type: ordinary, result_type: String + FUNCTION id: 4, function_name: transform, function_type: ordinary, result_type: Enum8(\'censor.net\' = 1, \'google\' = 2, \'other\' = 3, \'yahoo\' = 4) ARGUMENTS LIST id: 5, nodes: 4 COLUMN id: 6, column_name: number, result_type: UInt64, source_id: 7 @@ -59,7 +59,7 @@ QUERY id: 0 FUNCTION id: 2, function_name: toString, function_type: ordinary, result_type: String ARGUMENTS LIST id: 3, nodes: 1 - FUNCTION id: 4, function_name: if, function_type: ordinary, result_type: String + FUNCTION id: 4, function_name: if, function_type: ordinary, result_type: Enum8(\'censor.net\' = 1, \'google\' = 2) ARGUMENTS LIST id: 5, nodes: 3 FUNCTION id: 6, function_name: greater, function_type: ordinary, result_type: UInt8 @@ -105,7 +105,7 @@ QUERY id: 0 FUNCTION id: 4, function_name: toString, function_type: ordinary, result_type: String ARGUMENTS LIST id: 5, nodes: 1 - FUNCTION id: 6, function_name: transform, function_type: ordinary, result_type: String + FUNCTION id: 6, function_name: transform, function_type: ordinary, result_type: Enum8(\'censor.net\' = 1, \'google\' = 2, \'other\' = 3, \'yahoo\' = 4) ARGUMENTS LIST id: 7, nodes: 4 COLUMN id: 8, column_name: number, result_type: UInt64, source_id: 9 @@ -149,7 +149,7 @@ QUERY id: 0 FUNCTION id: 4, function_name: toString, function_type: ordinary, result_type: String ARGUMENTS LIST id: 5, nodes: 1 - FUNCTION id: 6, function_name: if, function_type: ordinary, result_type: String + FUNCTION id: 6, function_name: if, function_type: ordinary, result_type: Enum8(\'censor.net\' = 1, \'google\' = 2) ARGUMENTS LIST id: 7, nodes: 3 FUNCTION id: 8, function_name: greater, function_type: ordinary, result_type: UInt8 @@ -204,7 +204,7 @@ QUERY id: 0 FUNCTION id: 5, function_name: toString, function_type: ordinary, result_type: String ARGUMENTS LIST id: 6, nodes: 1 - FUNCTION id: 7, function_name: if, function_type: ordinary, result_type: String + FUNCTION id: 7, function_name: if, function_type: ordinary, result_type: Enum8(\'censor.net\' = 1, \'google\' = 2) ARGUMENTS LIST id: 8, nodes: 3 FUNCTION id: 9, function_name: greater, function_type: ordinary, result_type: UInt8 @@ -258,7 +258,7 @@ QUERY id: 0 FUNCTION id: 5, function_name: toString, function_type: ordinary, result_type: String ARGUMENTS LIST id: 6, nodes: 1 - FUNCTION id: 7, function_name: transform, function_type: ordinary, result_type: String + FUNCTION id: 7, function_name: transform, function_type: ordinary, result_type: Enum8(\'censor.net\' = 1, \'google\' = 2, 
\'other\' = 3, \'yahoo\' = 4) ARGUMENTS LIST id: 8, nodes: 4 COLUMN id: 9, column_name: number, result_type: UInt64, source_id: 10 @@ -301,7 +301,7 @@ QUERY id: 0 FUNCTION id: 2, function_name: toString, function_type: ordinary, result_type: String ARGUMENTS LIST id: 3, nodes: 1 - FUNCTION id: 4, function_name: if, function_type: ordinary, result_type: String + FUNCTION id: 4, function_name: if, function_type: ordinary, result_type: Enum8(\'censor.net\' = 1, \'google\' = 2) ARGUMENTS LIST id: 5, nodes: 3 FUNCTION id: 6, function_name: greater, function_type: ordinary, result_type: UInt8 @@ -322,7 +322,7 @@ QUERY id: 0 FUNCTION id: 2, function_name: toString, function_type: ordinary, result_type: String ARGUMENTS LIST id: 3, nodes: 1 - FUNCTION id: 4, function_name: if, function_type: ordinary, result_type: String + FUNCTION id: 4, function_name: if, function_type: ordinary, result_type: Enum8(\'censor.net\' = 1, \'google\' = 2) ARGUMENTS LIST id: 5, nodes: 3 FUNCTION id: 6, function_name: greater, function_type: ordinary, result_type: UInt8 @@ -368,7 +368,7 @@ QUERY id: 0 FUNCTION id: 2, function_name: toString, function_type: ordinary, result_type: String ARGUMENTS LIST id: 3, nodes: 1 - FUNCTION id: 4, function_name: transform, function_type: ordinary, result_type: String + FUNCTION id: 4, function_name: transform, function_type: ordinary, result_type: Enum8(\'censor.net\' = 1, \'google\' = 2, \'other\' = 3, \'yahoo\' = 4) ARGUMENTS LIST id: 5, nodes: 4 COLUMN id: 6, column_name: number, result_type: UInt64, source_id: 7 @@ -386,7 +386,7 @@ QUERY id: 0 FUNCTION id: 2, function_name: toString, function_type: ordinary, result_type: String ARGUMENTS LIST id: 3, nodes: 1 - FUNCTION id: 4, function_name: transform, function_type: ordinary, result_type: String + FUNCTION id: 4, function_name: transform, function_type: ordinary, result_type: Enum8(\'censor.net\' = 1, \'google\' = 2, \'other\' = 3, \'yahoo\' = 4) ARGUMENTS LIST id: 5, nodes: 4 COLUMN id: 6, column_name: number, result_type: UInt64, source_id: 7 diff --git a/tests/queries/0_stateless/02499_analyzer_set_index.reference b/tests/queries/0_stateless/02499_analyzer_set_index.reference new file mode 100644 index 00000000000..6ed281c757a --- /dev/null +++ b/tests/queries/0_stateless/02499_analyzer_set_index.reference @@ -0,0 +1,2 @@ +1 +1 diff --git a/tests/queries/0_stateless/02499_analyzer_set_index.sql b/tests/queries/0_stateless/02499_analyzer_set_index.sql new file mode 100644 index 00000000000..f90ae61541f --- /dev/null +++ b/tests/queries/0_stateless/02499_analyzer_set_index.sql @@ -0,0 +1,18 @@ +SET allow_experimental_analyzer = 1; + +DROP TABLE IF EXISTS test_table; +CREATE TABLE test_table +( + id UInt64, + value String, + INDEX value_idx (value) TYPE set(1000) GRANULARITY 1 +) ENGINE=MergeTree ORDER BY id; + +INSERT INTO test_table SELECT number, toString(number) FROM numbers(10); + +SELECT count() FROM test_table WHERE value = '1' SETTINGS force_data_skipping_indices = 'value_idx'; + +SELECT count() FROM test_table AS t1 INNER JOIN (SELECT number AS id FROM numbers(10)) AS t2 ON t1.id = t2.id +WHERE t1.value = '1' SETTINGS force_data_skipping_indices = 'value_idx'; + +DROP TABLE test_table; diff --git a/tests/queries/0_stateless/02500_bson_read_object_id.reference b/tests/queries/0_stateless/02500_bson_read_object_id.reference new file mode 100644 index 00000000000..860d79a30da --- /dev/null +++ b/tests/queries/0_stateless/02500_bson_read_object_id.reference @@ -0,0 +1,6 @@ +_id Nullable(FixedString(12)) +name 
Nullable(String) +email Nullable(String) +movie_id Nullable(FixedString(12)) +text Nullable(String) +date Nullable(DateTime64(6, \'UTC\')) diff --git a/tests/queries/0_stateless/02500_bson_read_object_id.sh b/tests/queries/0_stateless/02500_bson_read_object_id.sh new file mode 100755 index 00000000000..015b5402fa4 --- /dev/null +++ b/tests/queries/0_stateless/02500_bson_read_object_id.sh @@ -0,0 +1,10 @@ +#!/usr/bin/env bash +# Tags: no-fasttest + +CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +# shellcheck source=../shell_config.sh +. "$CURDIR"/../shell_config.sh + +$CLICKHOUSE_LOCAL -q "desc file('$CURDIR/data_bson/comments.bson')" +$CLICKHOUSE_LOCAL -q "select _id from file('$CURDIR/data_bson/comments.bson') format Null" + diff --git a/tests/queries/0_stateless/02503_cache_on_write_with_small_segment_size.reference b/tests/queries/0_stateless/02503_cache_on_write_with_small_segment_size.reference new file mode 100644 index 00000000000..1823b83ae28 --- /dev/null +++ b/tests/queries/0_stateless/02503_cache_on_write_with_small_segment_size.reference @@ -0,0 +1,3 @@ +0 +83 +100000 diff --git a/tests/queries/0_stateless/02503_cache_on_write_with_small_segment_size.sh b/tests/queries/0_stateless/02503_cache_on_write_with_small_segment_size.sh new file mode 100755 index 00000000000..918adc12de6 --- /dev/null +++ b/tests/queries/0_stateless/02503_cache_on_write_with_small_segment_size.sh @@ -0,0 +1,37 @@ +#!/usr/bin/env bash +# Tags: no-parallel, no-fasttest, no-s3-storage, no-random-settings + +CLICKHOUSE_CLIENT_SERVER_LOGS_LEVEL=none + +CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +# shellcheck source=../shell_config.sh +. "$CURDIR"/../shell_config.sh + +function random { + cat /dev/urandom | LC_ALL=C tr -dc 'a-zA-Z' | fold -w ${1:-8} | head -n 1 +} + +${CLICKHOUSE_CLIENT} --multiline --multiquery -q " +drop table if exists ttt; +create table ttt (id Int32, value String) engine=MergeTree() order by tuple() settings storage_policy='s3_cache_small_segment_size', min_bytes_for_wide_part=0; +insert into ttt select number, toString(number) from numbers(100000) settings throw_on_error_from_cache_on_write_operations = 1; +" + +query_id=$(random 8) + +${CLICKHOUSE_CLIENT} --query_id "$query_id" -q " +select * from ttt format Null settings enable_filesystem_cache_log=1; +" +${CLICKHOUSE_CLIENT} --query_id "$query_id" -q " system flush logs" + +${CLICKHOUSE_CLIENT} -q " +select count() from system.filesystem_cache_log where query_id = '$query_id' AND read_type != 'READ_FROM_CACHE'; +" +${CLICKHOUSE_CLIENT} -q " +select count() from system.filesystem_cache_log where query_id = '$query_id' AND read_type == 'READ_FROM_CACHE'; +" + +${CLICKHOUSE_CLIENT} --multiline --multiquery -q " +select count() from ttt; +drop table ttt no delay; +" diff --git a/tests/queries/0_stateless/02513_analyzer_sort_msan.reference b/tests/queries/0_stateless/02513_analyzer_sort_msan.reference new file mode 100644 index 00000000000..d00491fd7e5 --- /dev/null +++ b/tests/queries/0_stateless/02513_analyzer_sort_msan.reference @@ -0,0 +1 @@ +1 diff --git a/tests/queries/0_stateless/02513_analyzer_sort_msan.sql b/tests/queries/0_stateless/02513_analyzer_sort_msan.sql new file mode 100644 index 00000000000..e5beccaff2a --- /dev/null +++ b/tests/queries/0_stateless/02513_analyzer_sort_msan.sql @@ -0,0 +1,8 @@ +DROP TABLE IF EXISTS products; + +SET allow_experimental_analyzer = 1; + +CREATE TABLE products (`price` UInt32) ENGINE = Memory; +INSERT INTO products VALUES (1); + +SELECT rank() OVER (ORDER BY price) AS rank 
FROM products ORDER BY rank; diff --git a/tests/queries/0_stateless/02513_date_string_comparison.reference b/tests/queries/0_stateless/02513_date_string_comparison.reference new file mode 100644 index 00000000000..931f2f594f3 --- /dev/null +++ b/tests/queries/0_stateless/02513_date_string_comparison.reference @@ -0,0 +1,27 @@ +Date +2 +2 +2 +2 +DateTime +3 +3 +3 +3 +3 +Date String +2 +2 +2 +DateTime String +3 +3 +3 +Date LC +2 +2 +2 +DateTime LC +3 +3 +3 diff --git a/tests/queries/0_stateless/02513_date_string_comparison.sql b/tests/queries/0_stateless/02513_date_string_comparison.sql new file mode 100644 index 00000000000..40bc8070987 --- /dev/null +++ b/tests/queries/0_stateless/02513_date_string_comparison.sql @@ -0,0 +1,65 @@ +CREATE TABLE datetime_date_table ( + col_date Date, + col_datetime DateTime, + col_datetime64 DateTime64(3), + col_date_string String, + col_datetime_string String, + col_datetime64_string DateTime64, + col_date_lc LowCardinality(String), + col_datetime_lc LowCardinality(String), + col_datetime64_lc LowCardinality(String), + PRIMARY KEY col_date +) ENGINE = MergeTree; + +INSERT INTO datetime_date_table VALUES ('2020-03-04', '2020-03-04 10:23:45', '2020-03-04 10:23:45.123', '2020-03-04', '2020-03-04 10:23:45', '2020-03-04 10:23:45.123', '2020-03-04', '2020-03-04 10:23:45', '2020-03-04 10:23:45.123'); +INSERT INTO datetime_date_table VALUES ('2020-03-05', '2020-03-05 12:23:45', '2020-03-05 12:23:45.123', '2020-03-05', '2020-03-05 12:23:45', '2020-03-05 12:23:45.123', '2020-03-05', '2020-03-05 12:23:45', '2020-03-05 12:23:45.123'); +INSERT INTO datetime_date_table VALUES ('2020-04-05', '2020-04-05 00:10:45', '2020-04-05 00:10:45.123', '2020-04-05', '2020-04-05 00:10:45', '2020-04-05 00:10:45.123', '2020-04-05', '2020-04-05 00:10:45', '2020-04-05 00:10:45.123'); + +SELECT 'Date'; +SELECT count() FROM datetime_date_table WHERE col_date > '2020-03-04'; +SELECT count() FROM datetime_date_table WHERE col_date > '2020-03-04'::Date; +SELECT count() FROM datetime_date_table WHERE col_date > '2020-03-04 10:20:45'; -- { serverError TYPE_MISMATCH } +SELECT count() FROM datetime_date_table WHERE col_date > '2020-03-04 10:20:45'::DateTime; +SELECT count() FROM datetime_date_table WHERE col_date > '2020-03-04 10:20:45.100'; -- { serverError TYPE_MISMATCH } +SELECT count() FROM datetime_date_table WHERE col_date > '2020-03-04 10:20:45.100'::DateTime64(3); + +SELECT 'DateTime'; +SELECT count() FROM datetime_date_table WHERE col_datetime > '2020-03-04'; +SELECT count() FROM datetime_date_table WHERE col_datetime > '2020-03-04'::Date; +SELECT count() FROM datetime_date_table WHERE col_datetime > '2020-03-04 10:20:45'; +SELECT count() FROM datetime_date_table WHERE col_datetime > '2020-03-04 10:20:45'::DateTime; +SELECT count() FROM datetime_date_table WHERE col_datetime > '2020-03-04 10:20:45.100'; -- { serverError TYPE_MISMATCH } +SELECT count() FROM datetime_date_table WHERE col_datetime > '2020-03-04 10:20:45.100'::DateTime64(3); + +SELECT 'Date String'; +SELECT count() FROM datetime_date_table WHERE col_date_string > '2020-03-04'; +SELECT count() FROM datetime_date_table WHERE col_date_string > '2020-03-04'::Date; -- { serverError NO_COMMON_TYPE } +SELECT count() FROM datetime_date_table WHERE col_date_string > '2020-03-04 10:20:45'; +SELECT count() FROM datetime_date_table WHERE col_date_string > '2020-03-04 10:20:45'::DateTime; -- { serverError NO_COMMON_TYPE } +SELECT count() FROM datetime_date_table WHERE col_date_string > '2020-03-04 10:20:45.100'; +SELECT count() FROM 
datetime_date_table WHERE col_date_string > '2020-03-04 10:20:45.100'::DateTime64(3); -- { serverError ILLEGAL_TYPE_OF_ARGUMENT } + +SELECT 'DateTime String'; +SELECT count() FROM datetime_date_table WHERE col_datetime_string > '2020-03-04'; +SELECT count() FROM datetime_date_table WHERE col_datetime_string > '2020-03-04'::Date; -- { serverError NO_COMMON_TYPE } +SELECT count() FROM datetime_date_table WHERE col_datetime_string > '2020-03-04 10:20:45'; +SELECT count() FROM datetime_date_table WHERE col_datetime_string > '2020-03-04 10:20:45'::DateTime; -- { serverError NO_COMMON_TYPE } +SELECT count() FROM datetime_date_table WHERE col_datetime_string > '2020-03-04 10:20:45.100'; +SELECT count() FROM datetime_date_table WHERE col_datetime_string > '2020-03-04 10:20:45.100'::DateTime64(3); -- { serverError ILLEGAL_TYPE_OF_ARGUMENT } + +SELECT 'Date LC'; +SELECT count() FROM datetime_date_table WHERE col_date_lc > '2020-03-04'; +SELECT count() FROM datetime_date_table WHERE col_date_lc > '2020-03-04'::Date; -- { serverError NO_COMMON_TYPE } +SELECT count() FROM datetime_date_table WHERE col_date_lc > '2020-03-04 10:20:45'; +SELECT count() FROM datetime_date_table WHERE col_date_lc > '2020-03-04 10:20:45'::DateTime; -- { serverError NO_COMMON_TYPE } +SELECT count() FROM datetime_date_table WHERE col_date_lc > '2020-03-04 10:20:45.100'; +SELECT count() FROM datetime_date_table WHERE col_date_lc > '2020-03-04 10:20:45.100'::DateTime64(3); -- { serverError ILLEGAL_TYPE_OF_ARGUMENT } + +SELECT 'DateTime LC'; +SELECT count() FROM datetime_date_table WHERE col_datetime_lc > '2020-03-04'; +SELECT count() FROM datetime_date_table WHERE col_datetime_lc > '2020-03-04'::Date; -- { serverError NO_COMMON_TYPE } +SELECT count() FROM datetime_date_table WHERE col_datetime_lc > '2020-03-04 10:20:45'; +SELECT count() FROM datetime_date_table WHERE col_datetime_lc > '2020-03-04 10:20:45'::DateTime; -- { serverError NO_COMMON_TYPE } +SELECT count() FROM datetime_date_table WHERE col_datetime_lc > '2020-03-04 10:20:45.100'; +SELECT count() FROM datetime_date_table WHERE col_datetime_lc > '2020-03-04 10:20:45.100'::DateTime64(3); -- { serverError ILLEGAL_TYPE_OF_ARGUMENT } + diff --git a/tests/queries/0_stateless/data_bson/comments.bson b/tests/queries/0_stateless/data_bson/comments.bson new file mode 100644 index 00000000000..9aa4b6e6562 Binary files /dev/null and b/tests/queries/0_stateless/data_bson/comments.bson differ diff --git a/tests/queries/0_stateless/format_schemas/02482_list_of_structs.capnp b/tests/queries/0_stateless/format_schemas/02482_list_of_structs.capnp new file mode 100644 index 00000000000..b203b5b1bdf --- /dev/null +++ b/tests/queries/0_stateless/format_schemas/02482_list_of_structs.capnp @@ -0,0 +1,11 @@ +@0xb6ecde1cd54a101d; + +struct Nested { + nested @0 :List(MyField); +} + +struct MyField { + x @0 :Int64; + y @1 :Int64; +} + diff --git a/tests/queries/0_stateless/format_schemas/02483_decimals.capnp b/tests/queries/0_stateless/format_schemas/02483_decimals.capnp new file mode 100644 index 00000000000..eff4d488420 --- /dev/null +++ b/tests/queries/0_stateless/format_schemas/02483_decimals.capnp @@ -0,0 +1,7 @@ +@0xb6acde1cd54a101d; + +struct Message { + decimal64 @0 :Int64; + decimal32 @1 :Int32; +} + diff --git a/tests/queries/1_stateful/00172_parallel_join.reference.j2 b/tests/queries/1_stateful/00172_hits_joins.reference.j2 similarity index 99% rename from tests/queries/1_stateful/00172_parallel_join.reference.j2 rename to tests/queries/1_stateful/00172_hits_joins.reference.j2 index 
1a43f1fb6ef..c357ede4c2c 100644 --- a/tests/queries/1_stateful/00172_parallel_join.reference.j2 +++ b/tests/queries/1_stateful/00172_hits_joins.reference.j2 @@ -1,4 +1,4 @@ -{% for join_algorithm in ['hash', 'parallel_hash', 'full_sorting_merge', 'grace_hash'] -%} +{% for join_algorithm in ['hash', 'parallel_hash', 'full_sorting_merge'] -%} --- {{ join_algorithm }} --- 2014-03-17 1406958 265108 2014-03-19 1405797 261624 diff --git a/tests/queries/1_stateful/00172_parallel_join.sql.j2 b/tests/queries/1_stateful/00172_hits_joins.sql.j2 similarity index 99% rename from tests/queries/1_stateful/00172_parallel_join.sql.j2 rename to tests/queries/1_stateful/00172_hits_joins.sql.j2 index ff077f43874..07ea899f536 100644 --- a/tests/queries/1_stateful/00172_parallel_join.sql.j2 +++ b/tests/queries/1_stateful/00172_hits_joins.sql.j2 @@ -1,4 +1,4 @@ -{% for join_algorithm in ['hash', 'parallel_hash', 'full_sorting_merge', 'grace_hash'] -%} +{% for join_algorithm in ['hash', 'parallel_hash', 'full_sorting_merge'] -%} SET max_bytes_in_join = '{% if join_algorithm == 'grace_hash' %}20K{% else %}0{% endif %}';
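The 00172 stateful test is renamed from 00172_parallel_join to 00172_hits_joins and now iterates only over the hash, parallel_hash and full_sorting_merge algorithms. A hedged sketch of the per-iteration setup the template presumably expands to (the SET join_algorithm line is an assumption, since that part of the template body is not shown in this hunk):

SET join_algorithm = 'full_sorting_merge';
SET max_bytes_in_join = '0'; -- only the removed grace_hash branch used '20K'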