#pragma once #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include namespace DB { namespace ErrorCodes { extern const int ATTEMPT_TO_READ_AFTER_EOF; extern const int CANNOT_PARSE_NUMBER; extern const int CANNOT_READ_ARRAY_FROM_TEXT; extern const int CANNOT_PARSE_INPUT_ASSERTION_FAILED; extern const int CANNOT_PARSE_QUOTED_STRING; extern const int CANNOT_PARSE_ESCAPE_SEQUENCE; extern const int CANNOT_PARSE_DATE; extern const int CANNOT_PARSE_DATETIME; extern const int CANNOT_PARSE_TEXT; extern const int CANNOT_PARSE_UUID; extern const int TOO_LARGE_STRING_SIZE; extern const int TOO_FEW_ARGUMENTS_FOR_FUNCTION; extern const int LOGICAL_ERROR; extern const int TYPE_MISMATCH; extern const int CANNOT_CONVERT_TYPE; extern const int ILLEGAL_COLUMN; extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH; extern const int ILLEGAL_TYPE_OF_ARGUMENT; extern const int NOT_IMPLEMENTED; extern const int CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN; } /** Type conversion functions. * toType - conversion in "natural way"; */ inline UInt32 extractToDecimalScale(const ColumnWithTypeAndName & named_column) { const auto * arg_type = named_column.type.get(); bool ok = checkAndGetDataType(arg_type) || checkAndGetDataType(arg_type) || checkAndGetDataType(arg_type) || checkAndGetDataType(arg_type); if (!ok) throw Exception("Illegal type of toDecimal() scale " + named_column.type->getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); Field field; named_column.column->get(0, field); return field.get(); } /** Conversion of number types to each other, enums to numbers, dates and datetimes to numbers and back: done by straight assignment. 
* (Date is represented internally as the number of days since 1970-01-01; DateTime as a unix timestamp) */ template struct ConvertImpl { using FromFieldType = typename FromDataType::FieldType; using ToFieldType = typename ToDataType::FieldType; template static void NO_SANITIZE_UNDEFINED execute(Block & block, const ColumnNumbers & arguments, size_t result, size_t /*input_rows_count*/, Additions additions [[maybe_unused]] = Additions()) { const ColumnWithTypeAndName & named_from = block.getByPosition(arguments[0]); using ColVecFrom = std::conditional_t, ColumnDecimal, ColumnVector>; using ColVecTo = std::conditional_t, ColumnDecimal, ColumnVector>; if constexpr (IsDataTypeDecimal || IsDataTypeDecimal) { if constexpr (!IsDataTypeDecimalOrNumber || !IsDataTypeDecimalOrNumber) throw Exception("Illegal column " + named_from.column->getName() + " of first argument of function " + Name::name, ErrorCodes::ILLEGAL_COLUMN); } if (const ColVecFrom * col_from = checkAndGetColumn(named_from.column.get())) { typename ColVecTo::MutablePtr col_to = nullptr; if constexpr (IsDataTypeDecimal) { UInt32 scale = additions; col_to = ColVecTo::create(0, scale); } else col_to = ColVecTo::create(); const auto & vec_from = col_from->getData(); auto & vec_to = col_to->getData(); size_t size = vec_from.size(); vec_to.resize(size); for (size_t i = 0; i < size; ++i) { if constexpr (IsDataTypeDecimal || IsDataTypeDecimal) { if constexpr (IsDataTypeDecimal && IsDataTypeDecimal) vec_to[i] = convertDecimals(vec_from[i], vec_from.getScale(), vec_to.getScale()); else if constexpr (IsDataTypeDecimal && IsDataTypeNumber) vec_to[i] = convertFromDecimal(vec_from[i], vec_from.getScale()); else if constexpr (IsDataTypeNumber && IsDataTypeDecimal) vec_to[i] = convertToDecimal(vec_from[i], vec_to.getScale()); } else vec_to[i] = static_cast(vec_from[i]); } block.getByPosition(result).column = std::move(col_to); } else throw Exception("Illegal column " + named_from.column->getName() + " of first argument of function "
+ Name::name, ErrorCodes::ILLEGAL_COLUMN); } }; /** Conversion of Date to DateTime: adding 00:00:00 time component. */ struct ToDateTimeImpl { static constexpr auto name = "toDateTime"; static inline UInt32 execute(UInt16 d, const DateLUTImpl & time_zone) { return time_zone.fromDayNum(DayNum(d)); } }; template struct ConvertImpl : DateTimeTransformImpl {}; /// Implementation of toDate function. template struct ToDateTransform32Or64 { static constexpr auto name = "toDate"; static inline NO_SANITIZE_UNDEFINED ToType execute(const FromType & from, const DateLUTImpl & time_zone) { return (from < 0xFFFF) ? from : time_zone.toDayNum(from); } }; /** Conversion of DateTime to Date: drop the time component. */ template struct ConvertImpl : DateTimeTransformImpl {}; /** Special case of converting (U)Int32 or (U)Int64 (and also, for convenience, Float32, Float64) to Date. * If the number is less than 65535 (0xFFFF), it is treated as DayNum; otherwise, as a unix timestamp. * It's a bit illogical, as we actually have two functions in one. * But it supports a frequent case, * when the user writes toDate(UInt32) expecting conversion of a unix timestamp to Date * (otherwise such usage would be a frequent mistake). */ template struct ConvertImpl : DateTimeTransformImpl> {}; template struct ConvertImpl : DateTimeTransformImpl> {}; template struct ConvertImpl : DateTimeTransformImpl> {}; template struct ConvertImpl : DateTimeTransformImpl> {}; template struct ConvertImpl : DateTimeTransformImpl> {}; template struct ConvertImpl : DateTimeTransformImpl> {}; /** Transformation of numbers, dates, datetimes to strings: through formatting.
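 * Sketch (illustration only, not part of this header), assuming the String column layout used by the
 * conversions below: every formatted value is followed by a zero byte, and offsets[i] is the position
 * just past the terminator of row i, so row i occupies [offsets[i - 1], offsets[i] - 1), with offsets[-1]
 * taken as 0. FlatStrings, formatColumn and rowAt are hypothetical names; std::to_string stands in for writeText.
```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a String column: values stored back to back,
// each followed by a zero byte; offsets[i] points just past row i's terminator.
struct FlatStrings
{
    std::vector<char> chars;
    std::vector<uint64_t> offsets;
};

// Append each value as decimal text plus a terminator, then record the offset,
// in the same order as the format / writeChar(0) / count() loop below.
FlatStrings formatColumn(const std::vector<int64_t> & values)
{
    FlatStrings res;
    res.chars.reserve(values.size() * 3); // initial guess only; the vector grows as needed
    res.offsets.reserve(values.size());
    for (int64_t v : values)
    {
        const std::string text = std::to_string(v);
        res.chars.insert(res.chars.end(), text.begin(), text.end());
        res.chars.push_back('\0');
        res.offsets.push_back(res.chars.size());
    }
    return res;
}

// Read row i back: it starts at the previous offset, and its length excludes the terminator.
std::string rowAt(const FlatStrings & col, size_t i)
{
    const uint64_t begin = i ? col.offsets[i - 1] : 0;
    return std::string(col.chars.data() + begin, col.offsets[i] - begin - 1);
}
```
 * For example, formatColumn({7, -12, 345}) yields offsets 2, 6 and 10, and rowAt(col, 1) returns "-12".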
*/ template struct FormatImpl { static void execute(const typename DataType::FieldType x, WriteBuffer & wb, const DataType *, const DateLUTImpl *) { writeText(x, wb); } }; template <> struct FormatImpl { static void execute(const DataTypeDate::FieldType x, WriteBuffer & wb, const DataTypeDate *, const DateLUTImpl *) { writeDateText(DayNum(x), wb); } }; template <> struct FormatImpl { static void execute(const DataTypeDateTime::FieldType x, WriteBuffer & wb, const DataTypeDateTime *, const DateLUTImpl * time_zone) { writeDateTimeText(x, wb, *time_zone); } }; template struct FormatImpl> { static void execute(const FieldType x, WriteBuffer & wb, const DataTypeEnum * type, const DateLUTImpl *) { writeString(type->getNameForValue(x), wb); } }; template struct FormatImpl> { static void execute(const FieldType x, WriteBuffer & wb, const DataTypeDecimal * type, const DateLUTImpl *) { writeText(x, type->getScale(), wb); } }; /// DataTypeEnum to DataType free conversion template struct ConvertImpl, DataTypeNumber, Name> { static void execute(Block & block, const ColumnNumbers & arguments, size_t result, size_t /*input_rows_count*/) { block.getByPosition(result).column = block.getByPosition(arguments[0]).column; } }; template struct ConvertImpl, DataTypeString>, Name> { using FromFieldType = typename FromDataType::FieldType; using ColVecType = std::conditional_t, ColumnDecimal, ColumnVector>; static void execute(Block & block, const ColumnNumbers & arguments, size_t result, size_t /*input_rows_count*/) { const auto & col_with_type_and_name = block.getByPosition(arguments[0]); const auto & type = static_cast(*col_with_type_and_name.type); const DateLUTImpl * time_zone = nullptr; /// For argument of DateTime type, second argument with time zone could be specified. 
if constexpr (std::is_same_v) time_zone = &extractTimeZoneFromFunctionArguments(block, arguments, 1, 0); if (const auto col_from = checkAndGetColumn(col_with_type_and_name.column.get())) { auto col_to = ColumnString::create(); const typename ColVecType::Container & vec_from = col_from->getData(); ColumnString::Chars & data_to = col_to->getChars(); ColumnString::Offsets & offsets_to = col_to->getOffsets(); size_t size = vec_from.size(); if constexpr (std::is_same_v) data_to.resize(size * (strlen("YYYY-MM-DD") + 1)); else if constexpr (std::is_same_v) data_to.resize(size * (strlen("YYYY-MM-DD hh:mm:ss") + 1)); else data_to.resize(size * 3); /// Arbitrary offsets_to.resize(size); WriteBufferFromVector write_buffer(data_to); for (size_t i = 0; i < size; ++i) { FormatImpl::execute(vec_from[i], write_buffer, &type, time_zone); writeChar(0, write_buffer); offsets_to[i] = write_buffer.count(); } write_buffer.finish(); block.getByPosition(result).column = std::move(col_to); } else throw Exception("Illegal column " + block.getByPosition(arguments[0]).column->getName() + " of first argument of function " + Name::name, ErrorCodes::ILLEGAL_COLUMN); } }; /// Generic conversion of any type to String. struct ConvertImplGenericToString { static void execute(Block & block, const ColumnNumbers & arguments, size_t result) { const auto & col_with_type_and_name = block.getByPosition(arguments[0]); const IDataType & type = *col_with_type_and_name.type; const IColumn & col_from = *col_with_type_and_name.column; size_t size = col_from.size(); auto col_to = ColumnString::create(); ColumnString::Chars & data_to = col_to->getChars(); ColumnString::Offsets & offsets_to = col_to->getOffsets(); data_to.resize(size * 2); /// Using coefficient 2 for initial size is arbitrary.
offsets_to.resize(size); WriteBufferFromVector write_buffer(data_to); FormatSettings format_settings; for (size_t i = 0; i < size; ++i) { type.serializeAsText(col_from, i, write_buffer, format_settings); writeChar(0, write_buffer); offsets_to[i] = write_buffer.count(); } write_buffer.finish(); block.getByPosition(result).column = std::move(col_to); } }; /** Conversion of strings to numbers, dates, datetimes: through parsing. */ template void parseImpl(typename DataType::FieldType & x, ReadBuffer & rb, const DateLUTImpl *) { readText(x, rb); } template <> inline void parseImpl(DataTypeDate::FieldType & x, ReadBuffer & rb, const DateLUTImpl *) { DayNum tmp(0); readDateText(tmp, rb); x = tmp; } template <> inline void parseImpl(DataTypeDateTime::FieldType & x, ReadBuffer & rb, const DateLUTImpl * time_zone) { time_t tmp = 0; readDateTimeText(tmp, rb, *time_zone); x = tmp; } template <> inline void parseImpl(DataTypeUUID::FieldType & x, ReadBuffer & rb, const DateLUTImpl *) { UUID tmp; readText(tmp, rb); x = tmp; } template bool tryParseImpl(typename DataType::FieldType & x, ReadBuffer & rb, const DateLUTImpl *) { if constexpr (std::is_floating_point_v) return tryReadFloatText(x, rb); else /*if constexpr (std::is_integral_v)*/ return tryReadIntText(x, rb); } template <> inline bool tryParseImpl(DataTypeDate::FieldType & x, ReadBuffer & rb, const DateLUTImpl *) { DayNum tmp(0); if (!tryReadDateText(tmp, rb)) return false; x = tmp; return true; } template <> inline bool tryParseImpl(DataTypeDateTime::FieldType & x, ReadBuffer & rb, const DateLUTImpl * time_zone) { time_t tmp = 0; if (!tryReadDateTimeText(tmp, rb, *time_zone)) return false; x = tmp; return true; } /** Throw exception with verbose message when string value is not parsed completely. */ [[noreturn]] void throwExceptionForIncompletelyParsedValue(ReadBuffer & read_buffer, Block & block, size_t result); enum class ConvertFromStringExceptionMode { Throw, /// Throw exception if value cannot be parsed. 
Zero, /// Fill with zero or default if value cannot be parsed. Null /// Return ColumnNullable with NULLs when value cannot be parsed. }; enum class ConvertFromStringParsingMode { Normal, BestEffort /// Only applicable for DateTime. Uses a more sophisticated method that is slower. }; template struct ConvertThroughParsing { static_assert(std::is_same_v || std::is_same_v, "ConvertThroughParsing is only applicable for String or FixedString data types"); using ToFieldType = typename ToDataType::FieldType; static bool isAllRead(ReadBuffer & in) { /// In case of FixedString, skip zero bytes at the end. if constexpr (std::is_same_v) while (!in.eof() && *in.position() == 0) ++in.position(); if (in.eof()) return true; /// Special case that allows parsing a string with DateTime as Date. if (std::is_same_v && (in.buffer().size()) == strlen("YYYY-MM-DD hh:mm:ss")) return true; return false; } template static void execute(Block & block, const ColumnNumbers & arguments, size_t result, size_t input_rows_count, Additions additions [[maybe_unused]] = Additions()) { using ColVecTo = std::conditional_t, ColumnDecimal, ColumnVector>; const DateLUTImpl * local_time_zone [[maybe_unused]] = nullptr; const DateLUTImpl * utc_time_zone [[maybe_unused]] = nullptr; /// For conversion to DateTime type, second argument with time zone could be specified.
if constexpr (std::is_same_v) { local_time_zone = &extractTimeZoneFromFunctionArguments(block, arguments, 1, 0); if constexpr (parsing_mode == ConvertFromStringParsingMode::BestEffort) utc_time_zone = &DateLUT::instance("UTC"); } const IColumn * col_from = block.getByPosition(arguments[0]).column.get(); const ColumnString * col_from_string = checkAndGetColumn(col_from); const ColumnFixedString * col_from_fixed_string = checkAndGetColumn(col_from); if (std::is_same_v && !col_from_string) throw Exception("Illegal column " + col_from->getName() + " of first argument of function " + Name::name, ErrorCodes::ILLEGAL_COLUMN); if (std::is_same_v && !col_from_fixed_string) throw Exception("Illegal column " + col_from->getName() + " of first argument of function " + Name::name, ErrorCodes::ILLEGAL_COLUMN); size_t size = input_rows_count; typename ColVecTo::MutablePtr col_to = nullptr; if constexpr (IsDataTypeDecimal) { UInt32 scale = additions; ToDataType check_bounds_in_ctor(ToDataType::maxPrecision(), scale); col_to = ColVecTo::create(size, scale); } else col_to = ColVecTo::create(size); typename ColVecTo::Container & vec_to = col_to->getData(); ColumnUInt8::MutablePtr col_null_map_to; ColumnUInt8::Container * vec_null_map_to [[maybe_unused]] = nullptr; if constexpr (exception_mode == ConvertFromStringExceptionMode::Null) { col_null_map_to = ColumnUInt8::create(size); vec_null_map_to = &col_null_map_to->getData(); } const ColumnString::Chars * chars = nullptr; const IColumn::Offsets * offsets = nullptr; size_t fixed_string_size = 0; if constexpr (std::is_same_v) { chars = &col_from_string->getChars(); offsets = &col_from_string->getOffsets(); } else { chars = &col_from_fixed_string->getChars(); fixed_string_size = col_from_fixed_string->getN(); } size_t current_offset = 0; for (size_t i = 0; i < size; ++i) { size_t next_offset = std::is_same_v ? (*offsets)[i] : (current_offset + fixed_string_size); size_t string_size = std::is_same_v ? 
next_offset - current_offset - 1 : fixed_string_size; ReadBufferFromMemory read_buffer(&(*chars)[current_offset], string_size); if constexpr (exception_mode == ConvertFromStringExceptionMode::Throw) { if constexpr (parsing_mode == ConvertFromStringParsingMode::BestEffort) { time_t res; parseDateTimeBestEffort(res, read_buffer, *local_time_zone, *utc_time_zone); vec_to[i] = res; } else { if constexpr (IsDataTypeDecimal) ToDataType::readText(vec_to[i], read_buffer, ToDataType::maxPrecision(), vec_to.getScale()); else parseImpl(vec_to[i], read_buffer, local_time_zone); } if (!isAllRead(read_buffer)) throwExceptionForIncompletelyParsedValue(read_buffer, block, result); } else { bool parsed; if constexpr (parsing_mode == ConvertFromStringParsingMode::BestEffort) { time_t res; parsed = tryParseDateTimeBestEffort(res, read_buffer, *local_time_zone, *utc_time_zone); vec_to[i] = res; } else { if constexpr (IsDataTypeDecimal) parsed = ToDataType::tryReadText(vec_to[i], read_buffer, ToDataType::maxPrecision(), vec_to.getScale()); else parsed = tryParseImpl(vec_to[i], read_buffer, local_time_zone); parsed = parsed && isAllRead(read_buffer); } if (!parsed) vec_to[i] = 0; if constexpr (exception_mode == ConvertFromStringExceptionMode::Null) (*vec_null_map_to)[i] = !parsed; } current_offset = next_offset; } if constexpr (exception_mode == ConvertFromStringExceptionMode::Null) block.getByPosition(result).column = ColumnNullable::create(std::move(col_to), std::move(col_null_map_to)); else block.getByPosition(result).column = std::move(col_to); } }; template struct ConvertImpl, DataTypeString>, ToDataType, Name> : ConvertThroughParsing {}; template struct ConvertImpl, DataTypeFixedString>, ToDataType, Name> : ConvertThroughParsing {}; /// Generic conversion of any type from String. Used for complex types: Array and Tuple. 
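/** Sketch (illustration only, not part of this header) of the three ConvertFromStringExceptionMode
 * behaviours of ConvertThroughParsing above, for a single Int64 value. As with isAllRead() above and the
 * read_buffer.eof() check below, a value counts as parsed only if the whole string is consumed.
 * OnError and parseInt64 are hypothetical names; std::from_chars (C++17) stands in for tryReadIntText.
```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <system_error>

// Hypothetical mirror of ConvertFromStringExceptionMode.
enum class OnError { Throw, Zero, Null };

// Parse the whole string as int64_t; trailing characters count as a failure,
// mirroring the "must be read completely" rule.
std::optional<int64_t> parseInt64(const std::string & s, OnError mode)
{
    int64_t value = 0;
    const char * end = s.data() + s.size();
    const auto [ptr, ec] = std::from_chars(s.data(), end, value);
    if (ec == std::errc() && ptr == end)
        return value;

    switch (mode)
    {
        case OnError::Throw:
            throw std::invalid_argument("Cannot parse Int64 from '" + s + "'");
        case OnError::Zero:
            return int64_t{0}; // like toInt64OrZero: default value
        case OnError::Null:
            return std::nullopt; // like toInt64OrNull: NULL
    }
    return std::nullopt;
}
```
 * In the column version, Null mode also writes 0 into the nested column and sets the null map entry to 1,
 * and Throw mode reports trailing characters through throwExceptionForIncompletelyParsedValue(). */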
struct ConvertImplGenericFromString { static void execute(Block & block, const ColumnNumbers & arguments, size_t result) { const IColumn & col_from = *block.getByPosition(arguments[0]).column; size_t size = col_from.size(); const IDataType & data_type_to = *block.getByPosition(result).type; if (const ColumnString * col_from_string = checkAndGetColumn(&col_from)) { auto res = data_type_to.createColumn(); IColumn & column_to = *res; column_to.reserve(size); const ColumnString::Chars & chars = col_from_string->getChars(); const IColumn::Offsets & offsets = col_from_string->getOffsets(); size_t current_offset = 0; FormatSettings format_settings; for (size_t i = 0; i < size; ++i) { ReadBufferFromMemory read_buffer(&chars[current_offset], offsets[i] - current_offset - 1); data_type_to.deserializeAsTextEscaped(column_to, read_buffer, format_settings); if (!read_buffer.eof()) throwExceptionForIncompletelyParsedValue(read_buffer, block, result); current_offset = offsets[i]; } block.getByPosition(result).column = std::move(res); } else throw Exception("Illegal column " + block.getByPosition(arguments[0]).column->getName() + " of first argument of conversion function from string", ErrorCodes::ILLEGAL_COLUMN); } }; /// Function toUnixTimestamp has exactly the same implementation as toDateTime of String type. struct NameToUnixTimestamp { static constexpr auto name = "toUnixTimestamp"; }; template <> struct ConvertImpl : ConvertImpl {}; /** If types are identical, just take reference to column. */ template struct ConvertImpl, T, Name> { static void execute(Block & block, const ColumnNumbers & arguments, size_t result, size_t /*input_rows_count*/) { block.getByPosition(result).column = block.getByPosition(arguments[0]).column; } }; /** Conversion from FixedString to String. * Cutting sequences of zero bytes from end of strings. 
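 * Sketch (illustration only, not part of this header): only trailing zero bytes are cut; zero bytes
 * inside a value are kept. fixedToStrings is a hypothetical name, and a std::vector<char> holding rows of
 * exactly n bytes stands in for ColumnFixedString::Chars.
```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split a buffer of back-to-back rows, each exactly n bytes,
// into strings with trailing zero bytes removed.
std::vector<std::string> fixedToStrings(const std::vector<char> & data, size_t n)
{
    std::vector<std::string> res;
    if (n == 0)
        return res; // guard: a zero-width row would never advance the loop
    for (size_t offset = 0; offset + n <= data.size(); offset += n)
    {
        size_t len = n;
        // Walk back over trailing zero bytes only; interior zeros stay.
        while (len > 0 && data[offset + len - 1] == '\0')
            --len;
        res.emplace_back(data.data() + offset, len);
    }
    return res;
}
```
 * For rows "ab\0", "x\0y" and "\0\0\0" with n = 3, this yields "ab", the 3-byte "x\0y" and an empty string.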
*/ template struct ConvertImpl { static void execute(Block & block, const ColumnNumbers & arguments, size_t result, size_t /*input_rows_count*/) { if (const ColumnFixedString * col_from = checkAndGetColumn(block.getByPosition(arguments[0]).column.get())) { auto col_to = ColumnString::create(); const ColumnFixedString::Chars & data_from = col_from->getChars(); ColumnString::Chars & data_to = col_to->getChars(); ColumnString::Offsets & offsets_to = col_to->getOffsets(); size_t size = col_from->size(); size_t n = col_from->getN(); data_to.resize(size * (n + 1)); /// + 1 - zero terminator offsets_to.resize(size); size_t offset_from = 0; size_t offset_to = 0; for (size_t i = 0; i < size; ++i) { size_t bytes_to_copy = n; while (bytes_to_copy > 0 && data_from[offset_from + bytes_to_copy - 1] == 0) --bytes_to_copy; memcpy(&data_to[offset_to], &data_from[offset_from], bytes_to_copy); offset_from += n; offset_to += bytes_to_copy; data_to[offset_to] = 0; ++offset_to; offsets_to[i] = offset_to; } data_to.resize(offset_to); block.getByPosition(result).column = std::move(col_to); } else throw Exception("Illegal column " + block.getByPosition(arguments[0]).column->getName() + " of first argument of function " + Name::name, ErrorCodes::ILLEGAL_COLUMN); } }; /// Declared early because used below. 
struct NameToDate { static constexpr auto name = "toDate"; }; struct NameToDateTime { static constexpr auto name = "toDateTime"; }; struct NameToString { static constexpr auto name = "toString"; }; struct NameToDecimal32 { static constexpr auto name = "toDecimal32"; }; struct NameToDecimal64 { static constexpr auto name = "toDecimal64"; }; struct NameToDecimal128 { static constexpr auto name = "toDecimal128"; }; #define DEFINE_NAME_TO_INTERVAL(INTERVAL_KIND) \ struct NameToInterval ## INTERVAL_KIND \ { \ static constexpr auto name = "toInterval" #INTERVAL_KIND; \ static constexpr int kind = DataTypeInterval::INTERVAL_KIND; \ }; DEFINE_NAME_TO_INTERVAL(Second) DEFINE_NAME_TO_INTERVAL(Minute) DEFINE_NAME_TO_INTERVAL(Hour) DEFINE_NAME_TO_INTERVAL(Day) DEFINE_NAME_TO_INTERVAL(Week) DEFINE_NAME_TO_INTERVAL(Month) DEFINE_NAME_TO_INTERVAL(Quarter) DEFINE_NAME_TO_INTERVAL(Year) #undef DEFINE_NAME_TO_INTERVAL template class FunctionConvert : public IFunction { public: using Monotonic = MonotonicityImpl; static constexpr auto name = Name::name; static constexpr bool to_decimal = std::is_same_v || std::is_same_v || std::is_same_v; static FunctionPtr create(const Context &) { return std::make_shared(); } String getName() const override { return name; } bool isVariadic() const override { return true; } size_t getNumberOfArguments() const override { return 0; } bool isInjective(const Block &) override { return std::is_same_v; } DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override { if (to_decimal && arguments.size() != 2) { throw Exception("Number of arguments for function " + getName() + " doesn't match: passed " + toString(arguments.size()) + ", should be 2.", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); } else if (arguments.size() != 1 && arguments.size() != 2) throw Exception("Number of arguments for function " + getName() + " doesn't match: passed " + toString(arguments.size()) + ", should be 1 or 2. 
Second argument (time zone) is optional and only makes sense for DateTime.", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); if constexpr (std::is_same_v) { return std::make_shared(DataTypeInterval::Kind(Name::kind)); } else if constexpr (to_decimal) { if (!arguments[1].column) throw Exception("Second argument for function " + getName() + " must be constant", ErrorCodes::ILLEGAL_COLUMN); UInt64 scale = extractToDecimalScale(arguments[1]); if constexpr (std::is_same_v) return createDecimal(9, scale); else if constexpr (std::is_same_v) return createDecimal(18, scale); else if constexpr (std::is_same_v) return createDecimal(38, scale); throw Exception("Something wrong with toDecimalNN()", ErrorCodes::LOGICAL_ERROR); } else { /** Optional second argument with time zone is supported: * - for functions toDateTime, toUnixTimestamp, toDate; * - for function toString of DateTime argument. */ if (arguments.size() == 2) { if (!checkAndGetDataType(arguments[1].type.get())) throw Exception("Illegal type " + arguments[1].type->getName() + " of 2nd argument of function " + getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); static constexpr bool to_date_or_time = std::is_same_v || std::is_same_v || std::is_same_v; if (!(to_date_or_time || (std::is_same_v && WhichDataType(arguments[0].type).isDateTime()))) { throw Exception("Number of arguments for function " + getName() + " doesn't match: passed " + toString(arguments.size()) + ", should be 1.", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); } } if (std::is_same_v) return std::make_shared(extractTimeZoneNameFromFunctionArguments(arguments, 1, 0)); else return std::make_shared(); } } bool useDefaultImplementationForConstants() const override { return true; } ColumnNumbers getArgumentsThatAreAlwaysConstant() const override { return {1}; } bool canBeExecutedOnDefaultArguments() const override { return false; } void executeImpl(Block & block, const ColumnNumbers & arguments, size_t result, size_t input_rows_count) override { try {
executeInternal(block, arguments, result, input_rows_count); } catch (Exception & e) { /// More convenient error message. if (e.code() == ErrorCodes::ATTEMPT_TO_READ_AFTER_EOF) { e.addMessage("Cannot parse " + block.getByPosition(result).type->getName() + " from " + block.getByPosition(arguments[0]).type->getName() + ", because value is too short"); } else if (e.code() == ErrorCodes::CANNOT_PARSE_NUMBER || e.code() == ErrorCodes::CANNOT_READ_ARRAY_FROM_TEXT || e.code() == ErrorCodes::CANNOT_PARSE_INPUT_ASSERTION_FAILED || e.code() == ErrorCodes::CANNOT_PARSE_QUOTED_STRING || e.code() == ErrorCodes::CANNOT_PARSE_ESCAPE_SEQUENCE || e.code() == ErrorCodes::CANNOT_PARSE_DATE || e.code() == ErrorCodes::CANNOT_PARSE_DATETIME || e.code() == ErrorCodes::CANNOT_PARSE_UUID) { e.addMessage("Cannot parse " + block.getByPosition(result).type->getName() + " from " + block.getByPosition(arguments[0]).type->getName()); } throw; } } bool hasInformationAboutMonotonicity() const override { return Monotonic::has(); } Monotonicity getMonotonicityForRange(const IDataType & type, const Field & left, const Field & right) const override { return Monotonic::get(type, left, right); } private: void executeInternal(Block & block, const ColumnNumbers & arguments, size_t result, size_t input_rows_count) { if (!arguments.size()) throw Exception{"Function " + getName() + " expects at least 1 argument", ErrorCodes::TOO_FEW_ARGUMENTS_FOR_FUNCTION}; const IDataType * from_type = block.getByPosition(arguments[0]).type.get(); auto call = [&](const auto & types) -> bool { using Types = std::decay_t; using LeftDataType = typename Types::LeftType; using RightDataType = typename Types::RightType; if constexpr (IsDataTypeDecimal) { if (arguments.size() != 2) throw Exception{"Function " + getName() + " expects 2 arguments for Decimal.", ErrorCodes::TOO_FEW_ARGUMENTS_FOR_FUNCTION}; const ColumnWithTypeAndName & scale_column = block.getByPosition(arguments[1]); UInt32 scale =
extractToDecimalScale(scale_column); ConvertImpl::execute(block, arguments, result, input_rows_count, scale); } else ConvertImpl::execute(block, arguments, result, input_rows_count); return true; }; bool done = callOnIndexAndDataType(from_type->getTypeId(), call); if (!done) { /// Generic conversion of any type to String. if (std::is_same_v) { ConvertImplGenericToString::execute(block, arguments, result); } else throw Exception("Illegal type " + block.getByPosition(arguments[0]).type->getName() + " of argument of function " + getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); } } }; /** Function toTOrZero (where T is a number, date or datetime type): * tries to convert from String to type T through parsing; * if the value cannot be parsed, it returns the default value instead of throwing an exception. * Function toTOrNull returns a Nullable type with NULL when the value cannot be parsed. * NOTE Also need to implement tryToUnixTimestamp with timezone. */ template class FunctionConvertFromString : public IFunction { public: static constexpr auto name = Name::name; static constexpr bool to_decimal = std::is_same_v> || std::is_same_v> || std::is_same_v>; static FunctionPtr create(const Context &) { return std::make_shared(); } String getName() const override { return name; } bool isVariadic() const override { return true; } size_t getNumberOfArguments() const override { return 0; } bool useDefaultImplementationForConstants() const override { return true; } ColumnNumbers getArgumentsThatAreAlwaysConstant() const override { return {1}; } DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override { if ((arguments.size() != 1 && arguments.size() != 2) || (to_decimal && arguments.size() != 2)) throw Exception("Number of arguments for function " + getName() + " doesn't match: passed " + toString(arguments.size()) + ", should be 1 or 2.
Second argument only makes sense for DateTime (time zone, optional) and Decimal (scale).", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); if (!isStringOrFixedString(arguments[0].type)) throw Exception("Illegal type " + arguments[0].type->getName() + " of first argument of function " + getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); if (arguments.size() == 2) { if constexpr (std::is_same_v) { if (!isString(arguments[1].type)) throw Exception("Illegal type " + arguments[1].type->getName() + " of 2nd argument of function " + getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); } else if constexpr (to_decimal) { if (!isInteger(arguments[1].type)) throw Exception("Illegal type " + arguments[1].type->getName() + " of 2nd argument of function " + getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); if (!arguments[1].column) throw Exception("Second argument for function " + getName() + " must be constant", ErrorCodes::ILLEGAL_COLUMN); } else { throw Exception("Number of arguments for function " + getName() + " doesn't match: passed " + toString(arguments.size()) + ", should be 1.
Second argument makes sense only for DateTime and Decimal.", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); } } DataTypePtr res; if constexpr (std::is_same_v) res = std::make_shared(extractTimeZoneNameFromFunctionArguments(arguments, 1, 0)); else if constexpr (to_decimal) { UInt64 scale = extractToDecimalScale(arguments[1]); if constexpr (std::is_same_v>) res = createDecimal(9, scale); else if constexpr (std::is_same_v>) res = createDecimal(18, scale); else if constexpr (std::is_same_v>) res = createDecimal(38, scale); if (!res) throw Exception("Something wrong with toDecimalNNOrZero() or toDecimalNNOrNull()", ErrorCodes::LOGICAL_ERROR); } else res = std::make_shared(); if constexpr (exception_mode == ConvertFromStringExceptionMode::Null) res = std::make_shared(res); return res; } void executeImpl(Block & block, const ColumnNumbers & arguments, size_t result, size_t input_rows_count) override { const IDataType * from_type = block.getByPosition(arguments[0]).type.get(); bool ok = true; if constexpr (to_decimal) { if (arguments.size() != 2) throw Exception{"Function " + getName() + " expects 2 arguments for Decimal.", ErrorCodes::TOO_FEW_ARGUMENTS_FOR_FUNCTION}; UInt32 scale = extractToDecimalScale(block.getByPosition(arguments[1])); if (checkAndGetDataType(from_type)) { ConvertThroughParsing::execute( block, arguments, result, input_rows_count, scale); } else if (checkAndGetDataType(from_type)) { ConvertThroughParsing::execute( block, arguments, result, input_rows_count, scale); } else ok = false; } else { if (checkAndGetDataType(from_type)) { ConvertThroughParsing::execute( block, arguments, result, input_rows_count); } else if (checkAndGetDataType(from_type)) { ConvertThroughParsing::execute( block, arguments, result, input_rows_count); } else ok = false; } if (!ok) throw Exception("Illegal type " + block.getByPosition(arguments[0]).type->getName() + " of argument of function " + getName() + ".
Only String or FixedString argument is accepted for try-conversion function." + " For other arguments, use function without 'orZero' or 'orNull'.", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); } }; /** Conversion to fixed string is implemented only for strings. */ class FunctionToFixedString : public IFunction { public: static constexpr auto name = "toFixedString"; static FunctionPtr create(const Context &) { return std::make_shared(); } String getName() const override { return name; } size_t getNumberOfArguments() const override { return 2; } bool isInjective(const Block &) override { return true; } DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override { if (!isUnsignedInteger(arguments[1].type)) throw Exception("Second argument for function " + getName() + " must be unsigned integer", ErrorCodes::ILLEGAL_COLUMN); if (!arguments[1].column) throw Exception("Second argument for function " + getName() + " must be constant", ErrorCodes::ILLEGAL_COLUMN); if (!isStringOrFixedString(arguments[0].type)) throw Exception(getName() + " is only implemented for types String and FixedString", ErrorCodes::NOT_IMPLEMENTED); const size_t n = arguments[1].column->getUInt(0); return std::make_shared(n); } bool useDefaultImplementationForConstants() const override { return true; } ColumnNumbers getArgumentsThatAreAlwaysConstant() const override { return {1}; } void executeImpl(Block & block, const ColumnNumbers & arguments, size_t result, size_t /*input_rows_count*/) override { const auto n = block.getByPosition(arguments[1]).column->getUInt(0); return executeForN(block, arguments, result, n); } static void executeForN(Block & block, const ColumnNumbers & arguments, const size_t result, const size_t n) { const auto & column = block.getByPosition(arguments[0]).column; if (const auto column_string = checkAndGetColumn(column.get())) { auto column_fixed = ColumnFixedString::create(n); auto & out_chars = column_fixed->getChars(); const auto & in_chars = 
column_string->getChars(); const auto & in_offsets = column_string->getOffsets(); out_chars.resize_fill(in_offsets.size() * n); for (size_t i = 0; i < in_offsets.size(); ++i) { const size_t off = i ? in_offsets[i - 1] : 0; const size_t len = in_offsets[i] - off - 1; if (len > n) throw Exception("String too long for type FixedString(" + toString(n) + ")", ErrorCodes::TOO_LARGE_STRING_SIZE); memcpy(&out_chars[i * n], &in_chars[off], len); } block.getByPosition(result).column = std::move(column_fixed); } else if (const auto column_fixed_string = checkAndGetColumn(column.get())) { const auto src_n = column_fixed_string->getN(); if (src_n > n) throw Exception{"String too long for type FixedString(" + toString(n) + ")", ErrorCodes::TOO_LARGE_STRING_SIZE}; auto column_fixed = ColumnFixedString::create(n); auto & out_chars = column_fixed->getChars(); const auto & in_chars = column_fixed_string->getChars(); const auto size = column_fixed_string->size(); out_chars.resize_fill(size * n); for (const auto i : ext::range(0, size)) memcpy(&out_chars[i * n], &in_chars[i * src_n], src_n); block.getByPosition(result).column = std::move(column_fixed); } else throw Exception("Unexpected column: " + column->getName(), ErrorCodes::ILLEGAL_COLUMN); } }; /// Monotonicity. 
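/// Worked examples of the integer rules in ToIntMonotonicity below (illustrative only; they restate
/// the checks performed in get() and are not used by any code):
///  - toUInt8 on the UInt16 range [256, 511]: 256 >> 8 == 511 >> 8 == 1, so the discarded high bits
///    agree and the conversion is monotonic (256..511 map to 0..255 in order).
///  - toUInt8 on the UInt16 range [200, 300]: 200 >> 8 == 0 but 300 >> 8 == 1, so it is not monotonic
///    (255 maps to 255, but 256 wraps around to 0).
///  - toUInt32 on the Int32 range [-5, 5]: the sizes match but the bounds lie in different halves of
///    the signed range, so it is not monotonic (-1 maps to 4294967295, while 0 maps to 0).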
struct PositiveMonotonicity { static bool has() { return true; } static IFunction::Monotonicity get(const IDataType &, const Field &, const Field &) { return { true }; } }; struct UnknownMonotonicity { static bool has() { return false; } static IFunction::Monotonicity get(const IDataType &, const Field &, const Field &) { return { false }; } }; template struct ToIntMonotonicity { static bool has() { return true; } static UInt64 divideByRangeOfType(UInt64 x) { if constexpr (sizeof(T) < sizeof(UInt64)) return x >> (sizeof(T) * 8); else return 0; } static IFunction::Monotonicity get(const IDataType & type, const Field & left, const Field & right) { if (!type.isValueRepresentedByNumber()) return {}; /// If type is same, the conversion is always monotonic. /// (Enum has separate case, because it is different data type) if (checkAndGetDataType>(&type) || checkAndGetDataType>(&type)) return { true, true, true }; /// Float cases. /// When converting to Float, the conversion is always monotonic. if (std::is_floating_point_v) return {true, true, true}; /// If converting from Float, for monotonicity, arguments must fit in range of result type. if (WhichDataType(type).isFloat()) { if (left.isNull() || right.isNull()) return {}; Float64 left_float = left.get(); Float64 right_float = right.get(); if (left_float >= std::numeric_limits::min() && left_float <= std::numeric_limits::max() && right_float >= std::numeric_limits::min() && right_float <= std::numeric_limits::max()) return { true }; return {}; } /// Integer cases. const bool from_is_unsigned = type.isValueRepresentedByUnsignedInteger(); const bool to_is_unsigned = std::is_unsigned_v; const size_t size_of_from = type.getSizeOfValueInMemory(); const size_t size_of_to = sizeof(T); const bool left_in_first_half = left.isNull() ? from_is_unsigned : (left.get() >= 0); const bool right_in_first_half = right.isNull() ? !from_is_unsigned : (right.get() >= 0); /// Size of type is the same. 
if (size_of_from == size_of_to) { if (from_is_unsigned == to_is_unsigned) return {true, true, true}; if (left_in_first_half == right_in_first_half) return {true}; return {}; } /// Size of type is expanded. if (size_of_from < size_of_to) { if (from_is_unsigned == to_is_unsigned) return {true, true, true}; if (!to_is_unsigned) return {true, true, true}; /// signed -> unsigned. If the arguments are from the same half, the function is monotonic. if (left_in_first_half == right_in_first_half) return {true}; return {}; } /// Size of type is shrunk. if (size_of_from > size_of_to) { /// Function cannot be monotonic on unbounded ranges. if (left.isNull() || right.isNull()) return {}; if (from_is_unsigned == to_is_unsigned) { /// All bits other than those that fit in the result type must be the same. if (divideByRangeOfType(left.get()) == divideByRangeOfType(right.get())) return {true}; return {}; } else { /// When signedness is changed, it's also required for arguments to be from the same half. /// And they must be in the same half after converting to the result type. if (left_in_first_half == right_in_first_half && (T(left.get()) >= 0) == (T(right.get()) >= 0) && divideByRangeOfType(left.get()) == divideByRangeOfType(right.get())) return {true}; return {}; } } __builtin_unreachable(); } }; /** The monotonicity for the `toString` function is defined mainly for testing purposes. * It is doubtful that anyone is looking to optimize queries with conditions `toString(CounterID) = 34`. */ struct ToStringMonotonicity { static bool has() { return true; } static IFunction::Monotonicity get(const IDataType & type, const Field & left, const Field & right) { IFunction::Monotonicity positive(true, true); IFunction::Monotonicity not_monotonic; /// `toString` function is monotonic if the argument is Date or DateTime, or non-negative numbers with the same number of digits.
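/// For example (illustrative, restating the log10 check below): on the UInt64 range [10, 99] every value
/// has two digits, so the lexicographic order of the strings matches the numeric order; on [9, 10] it
/// does not, because "9" sorts after "10" even though 9 < 10.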
if (checkAndGetDataType(&type) || typeid_cast(&type)) return positive; if (left.isNull() || right.isNull()) return {}; if (left.getType() == Field::Types::UInt64 && right.getType() == Field::Types::UInt64) { return (left.get() == 0 && right.get() == 0) || (floor(log10(left.get())) == floor(log10(right.get()))) ? positive : not_monotonic; } if (left.getType() == Field::Types::Int64 && right.getType() == Field::Types::Int64) { return (left.get() == 0 && right.get() == 0) || (left.get() > 0 && right.get() > 0 && floor(log10(left.get())) == floor(log10(right.get()))) ? positive : not_monotonic; } return not_monotonic; } }; struct NameToUInt8 { static constexpr auto name = "toUInt8"; }; struct NameToUInt16 { static constexpr auto name = "toUInt16"; }; struct NameToUInt32 { static constexpr auto name = "toUInt32"; }; struct NameToUInt64 { static constexpr auto name = "toUInt64"; }; struct NameToInt8 { static constexpr auto name = "toInt8"; }; struct NameToInt16 { static constexpr auto name = "toInt16"; }; struct NameToInt32 { static constexpr auto name = "toInt32"; }; struct NameToInt64 { static constexpr auto name = "toInt64"; }; struct NameToFloat32 { static constexpr auto name = "toFloat32"; }; struct NameToFloat64 { static constexpr auto name = "toFloat64"; }; struct NameToUUID { static constexpr auto name = "toUUID"; }; using FunctionToUInt8 = FunctionConvert>; using FunctionToUInt16 = FunctionConvert>; using FunctionToUInt32 = FunctionConvert>; using FunctionToUInt64 = FunctionConvert>; using FunctionToInt8 = FunctionConvert>; using FunctionToInt16 = FunctionConvert>; using FunctionToInt32 = FunctionConvert>; using FunctionToInt64 = FunctionConvert>; using FunctionToFloat32 = FunctionConvert; using FunctionToFloat64 = FunctionConvert; using FunctionToDate = FunctionConvert>; using FunctionToDateTime = FunctionConvert>; using FunctionToUUID = FunctionConvert>; using FunctionToString = FunctionConvert; using FunctionToUnixTimestamp = FunctionConvert>; using 
FunctionToDecimal32 = FunctionConvert, NameToDecimal32, UnknownMonotonicity>; using FunctionToDecimal64 = FunctionConvert, NameToDecimal64, UnknownMonotonicity>; using FunctionToDecimal128 = FunctionConvert, NameToDecimal128, UnknownMonotonicity>; template struct FunctionTo; template <> struct FunctionTo { using Type = FunctionToUInt8; }; template <> struct FunctionTo { using Type = FunctionToUInt16; }; template <> struct FunctionTo { using Type = FunctionToUInt32; }; template <> struct FunctionTo { using Type = FunctionToUInt64; }; template <> struct FunctionTo { using Type = FunctionToInt8; }; template <> struct FunctionTo { using Type = FunctionToInt16; }; template <> struct FunctionTo { using Type = FunctionToInt32; }; template <> struct FunctionTo { using Type = FunctionToInt64; }; template <> struct FunctionTo { using Type = FunctionToFloat32; }; template <> struct FunctionTo { using Type = FunctionToFloat64; }; template <> struct FunctionTo { using Type = FunctionToDate; }; template <> struct FunctionTo { using Type = FunctionToDateTime; }; template <> struct FunctionTo { using Type = FunctionToUUID; }; template <> struct FunctionTo { using Type = FunctionToString; }; template <> struct FunctionTo { using Type = FunctionToFixedString; }; template <> struct FunctionTo> { using Type = FunctionToDecimal32; }; template <> struct FunctionTo> { using Type = FunctionToDecimal64; }; template <> struct FunctionTo> { using Type = FunctionToDecimal128; }; template struct FunctionTo> : FunctionTo> { }; struct NameToUInt8OrZero { static constexpr auto name = "toUInt8OrZero"; }; struct NameToUInt16OrZero { static constexpr auto name = "toUInt16OrZero"; }; struct NameToUInt32OrZero { static constexpr auto name = "toUInt32OrZero"; }; struct NameToUInt64OrZero { static constexpr auto name = "toUInt64OrZero"; }; struct NameToInt8OrZero { static constexpr auto name = "toInt8OrZero"; }; struct NameToInt16OrZero { static constexpr auto name = "toInt16OrZero"; }; struct 
NameToInt32OrZero { static constexpr auto name = "toInt32OrZero"; }; struct NameToInt64OrZero { static constexpr auto name = "toInt64OrZero"; }; struct NameToFloat32OrZero { static constexpr auto name = "toFloat32OrZero"; }; struct NameToFloat64OrZero { static constexpr auto name = "toFloat64OrZero"; }; struct NameToDateOrZero { static constexpr auto name = "toDateOrZero"; }; struct NameToDateTimeOrZero { static constexpr auto name = "toDateTimeOrZero"; }; struct NameToDecimal32OrZero { static constexpr auto name = "toDecimal32OrZero"; }; struct NameToDecimal64OrZero { static constexpr auto name = "toDecimal64OrZero"; }; struct NameToDecimal128OrZero { static constexpr auto name = "toDecimal128OrZero"; }; using FunctionToUInt8OrZero = FunctionConvertFromString; using FunctionToUInt16OrZero = FunctionConvertFromString; using FunctionToUInt32OrZero = FunctionConvertFromString; using FunctionToUInt64OrZero = FunctionConvertFromString; using FunctionToInt8OrZero = FunctionConvertFromString; using FunctionToInt16OrZero = FunctionConvertFromString; using FunctionToInt32OrZero = FunctionConvertFromString; using FunctionToInt64OrZero = FunctionConvertFromString; using FunctionToFloat32OrZero = FunctionConvertFromString; using FunctionToFloat64OrZero = FunctionConvertFromString; using FunctionToDateOrZero = FunctionConvertFromString; using FunctionToDateTimeOrZero = FunctionConvertFromString; using FunctionToDecimal32OrZero = FunctionConvertFromString, NameToDecimal32OrZero, ConvertFromStringExceptionMode::Zero>; using FunctionToDecimal64OrZero = FunctionConvertFromString, NameToDecimal64OrZero, ConvertFromStringExceptionMode::Zero>; using FunctionToDecimal128OrZero = FunctionConvertFromString, NameToDecimal128OrZero, ConvertFromStringExceptionMode::Zero>; struct NameToUInt8OrNull { static constexpr auto name = "toUInt8OrNull"; }; struct NameToUInt16OrNull { static constexpr auto name = "toUInt16OrNull"; }; struct NameToUInt32OrNull { static constexpr auto name = 
"toUInt32OrNull"; }; struct NameToUInt64OrNull { static constexpr auto name = "toUInt64OrNull"; }; struct NameToInt8OrNull { static constexpr auto name = "toInt8OrNull"; }; struct NameToInt16OrNull { static constexpr auto name = "toInt16OrNull"; }; struct NameToInt32OrNull { static constexpr auto name = "toInt32OrNull"; }; struct NameToInt64OrNull { static constexpr auto name = "toInt64OrNull"; }; struct NameToFloat32OrNull { static constexpr auto name = "toFloat32OrNull"; }; struct NameToFloat64OrNull { static constexpr auto name = "toFloat64OrNull"; }; struct NameToDateOrNull { static constexpr auto name = "toDateOrNull"; }; struct NameToDateTimeOrNull { static constexpr auto name = "toDateTimeOrNull"; }; struct NameToDecimal32OrNull { static constexpr auto name = "toDecimal32OrNull"; }; struct NameToDecimal64OrNull { static constexpr auto name = "toDecimal64OrNull"; }; struct NameToDecimal128OrNull { static constexpr auto name = "toDecimal128OrNull"; }; using FunctionToUInt8OrNull = FunctionConvertFromString; using FunctionToUInt16OrNull = FunctionConvertFromString; using FunctionToUInt32OrNull = FunctionConvertFromString; using FunctionToUInt64OrNull = FunctionConvertFromString; using FunctionToInt8OrNull = FunctionConvertFromString; using FunctionToInt16OrNull = FunctionConvertFromString; using FunctionToInt32OrNull = FunctionConvertFromString; using FunctionToInt64OrNull = FunctionConvertFromString; using FunctionToFloat32OrNull = FunctionConvertFromString; using FunctionToFloat64OrNull = FunctionConvertFromString; using FunctionToDateOrNull = FunctionConvertFromString; using FunctionToDateTimeOrNull = FunctionConvertFromString; using FunctionToDecimal32OrNull = FunctionConvertFromString, NameToDecimal32OrNull, ConvertFromStringExceptionMode::Null>; using FunctionToDecimal64OrNull = FunctionConvertFromString, NameToDecimal64OrNull, ConvertFromStringExceptionMode::Null>; using FunctionToDecimal128OrNull = FunctionConvertFromString, NameToDecimal128OrNull, 
ConvertFromStringExceptionMode::Null>; struct NameParseDateTimeBestEffort { static constexpr auto name = "parseDateTimeBestEffort"; }; struct NameParseDateTimeBestEffortOrZero { static constexpr auto name = "parseDateTimeBestEffortOrZero"; }; struct NameParseDateTimeBestEffortOrNull { static constexpr auto name = "parseDateTimeBestEffortOrNull"; }; using FunctionParseDateTimeBestEffort = FunctionConvertFromString< DataTypeDateTime, NameParseDateTimeBestEffort, ConvertFromStringExceptionMode::Throw, ConvertFromStringParsingMode::BestEffort>; using FunctionParseDateTimeBestEffortOrZero = FunctionConvertFromString< DataTypeDateTime, NameParseDateTimeBestEffortOrZero, ConvertFromStringExceptionMode::Zero, ConvertFromStringParsingMode::BestEffort>; using FunctionParseDateTimeBestEffortOrNull = FunctionConvertFromString< DataTypeDateTime, NameParseDateTimeBestEffortOrNull, ConvertFromStringExceptionMode::Null, ConvertFromStringParsingMode::BestEffort>; class PreparedFunctionCast : public PreparedFunctionImpl { public: using WrapperType = std::function; explicit PreparedFunctionCast(WrapperType && wrapper_function, const char * name) : wrapper_function(std::move(wrapper_function)), name(name) {} String getName() const override { return name; } protected: void executeImpl(Block & block, const ColumnNumbers & arguments, size_t result, size_t input_rows_count) override { /// drop second argument, pass others ColumnNumbers new_arguments{arguments.front()}; if (arguments.size() > 2) new_arguments.insert(std::end(new_arguments), std::next(std::begin(arguments), 2), std::end(arguments)); wrapper_function(block, new_arguments, result, input_rows_count); } bool useDefaultImplementationForNulls() const override { return false; } bool useDefaultImplementationForConstants() const override { return true; } bool useDefaultImplementationForLowCardinalityColumns() const override { return false; } ColumnNumbers getArgumentsThatAreAlwaysConstant() const override { return {1}; } private: 
WrapperType wrapper_function; const char * name; }; struct NameCast { static constexpr auto name = "CAST"; }; class FunctionCast final : public IFunctionBase { public: using WrapperType = std::function; using MonotonicityForRange = std::function; FunctionCast(const Context & context, const char * name, MonotonicityForRange && monotonicity_for_range , const DataTypes & argument_types, const DataTypePtr & return_type) : context(context), name(name), monotonicity_for_range(monotonicity_for_range) , argument_types(argument_types), return_type(return_type) { } const DataTypes & getArgumentTypes() const override { return argument_types; } const DataTypePtr & getReturnType() const override { return return_type; } PreparedFunctionPtr prepare(const Block & /*sample_block*/, const ColumnNumbers & /*arguments*/, size_t /*result*/) const override { return std::make_shared( prepareUnpackDictionaries(getArgumentTypes()[0], getReturnType()), name); } String getName() const override { return name; } bool hasInformationAboutMonotonicity() const override { return static_cast(monotonicity_for_range); } Monotonicity getMonotonicityForRange(const IDataType & type, const Field & left, const Field & right) const override { return monotonicity_for_range(type, left, right); } private: const Context & context; const char * name; MonotonicityForRange monotonicity_for_range; DataTypes argument_types; DataTypePtr return_type; template WrapperType createWrapper(const DataTypePtr & from_type, const DataType * const, bool requested_result_is_nullable) const { FunctionPtr function; if (requested_result_is_nullable && checkAndGetDataType(from_type.get())) { /// When converting to a Nullable type, we apply a different parsing rule /// that returns NULL on malformed input instead of throwing an exception.
function = FunctionConvertFromString::create(context); } else function = FunctionTo::Type::create(context); /// Check conversion using underlying function { function->getReturnType(ColumnsWithTypeAndName(1, { nullptr, from_type, "" })); } return [function] (Block & block, const ColumnNumbers & arguments, const size_t result, size_t input_rows_count) { function->execute(block, arguments, result, input_rows_count); }; } WrapperType createStringWrapper(const DataTypePtr & from_type) const { FunctionPtr function = FunctionToString::create(context); /// Check conversion using underlying function { function->getReturnType(ColumnsWithTypeAndName(1, { nullptr, from_type, "" })); } return [function] (Block & block, const ColumnNumbers & arguments, const size_t result, size_t input_rows_count) { function->execute(block, arguments, result, input_rows_count); }; } static WrapperType createFixedStringWrapper(const DataTypePtr & from_type, const size_t N) { if (!isStringOrFixedString(from_type)) throw Exception{"CAST AS FixedString is only implemented for types String and FixedString", ErrorCodes::NOT_IMPLEMENTED}; return [N] (Block & block, const ColumnNumbers & arguments, const size_t result, size_t /*input_rows_count*/) { FunctionToFixedString::executeForN(block, arguments, result, N); }; } WrapperType createUUIDWrapper(const DataTypePtr & from_type, const DataTypeUUID * const, bool requested_result_is_nullable) const { if (requested_result_is_nullable) throw Exception{"CAST AS Nullable(UUID) is not implemented", ErrorCodes::NOT_IMPLEMENTED}; FunctionPtr function = FunctionTo::Type::create(context); /// Check conversion using underlying function { function->getReturnType(ColumnsWithTypeAndName(1, { nullptr, from_type, "" })); } return [function] (Block & block, const ColumnNumbers & arguments, const size_t result, size_t input_rows_count) { function->execute(block, arguments, result, input_rows_count); }; } template WrapperType createDecimalWrapper(const DataTypePtr & 
from_type, const DataTypeDecimal * to_type) const { using ToDataType = DataTypeDecimal; TypeIndex type_index = from_type->getTypeId(); UInt32 scale = to_type->getScale(); return [type_index, scale] (Block & block, const ColumnNumbers & arguments, const size_t result, size_t input_rows_count) { callOnIndexAndDataType(type_index, [&](const auto & types) -> bool { using Types = std::decay_t; using LeftDataType = typename Types::LeftType; using RightDataType = typename Types::RightType; ConvertImpl::execute(block, arguments, result, input_rows_count, scale); return true; }); }; } WrapperType createArrayWrapper(const DataTypePtr & from_type_untyped, const DataTypeArray * to_type) const { /// Conversion from String through parsing. if (checkAndGetDataType(from_type_untyped.get())) { return [] (Block & block, const ColumnNumbers & arguments, const size_t result, size_t /*input_rows_count*/) { ConvertImplGenericFromString::execute(block, arguments, result); }; } DataTypePtr from_nested_type; DataTypePtr to_nested_type; auto from_type = checkAndGetDataType(from_type_untyped.get()); /// Get the element types (one level of nesting; deeper levels are handled when the element conversion is prepared). if (from_type && to_type) { from_nested_type = from_type->getNestedType(); to_nested_type = to_type->getNestedType(); from_type = checkAndGetDataType(from_nested_type.get()); to_type = checkAndGetDataType(to_nested_type.get()); } /// from_type and to_type must now be either both arrays or both nullptr; otherwise the array types have different dimensions. if ((from_type == nullptr) != (to_type == nullptr)) throw Exception{"CAST AS Array can only be performed between same-dimensional array types or from String", ErrorCodes::TYPE_MISMATCH}; /// Prepare nested type conversion const auto nested_function = prepareUnpackDictionaries(from_nested_type, to_nested_type); return [nested_function, from_nested_type, to_nested_type]( Block & block, const ColumnNumbers & arguments, const size_t result, size_t /*input_rows_count*/) { const auto & array_arg = block.getByPosition(arguments.front()); if (const ColumnArray * 
col_array = checkAndGetColumn(array_arg.column.get())) { /// create block for converting nested column containing original and result columns Block nested_block { { col_array->getDataPtr(), from_nested_type, "" }, { nullptr, to_nested_type, "" } }; /// convert nested column nested_function(nested_block, {0}, 1, nested_block.rows()); /// set converted nested column to result block.getByPosition(result).column = ColumnArray::create(nested_block.getByPosition(1).column, col_array->getOffsetsPtr()); } else throw Exception{"Illegal column " + array_arg.column->getName() + " for function CAST AS Array", ErrorCodes::LOGICAL_ERROR}; }; } WrapperType createTupleWrapper(const DataTypePtr & from_type_untyped, const DataTypeTuple * to_type) const { /// Conversion from String through parsing. if (checkAndGetDataType(from_type_untyped.get())) { return [] (Block & block, const ColumnNumbers & arguments, const size_t result, size_t /*input_rows_count*/) { ConvertImplGenericFromString::execute(block, arguments, result); }; } const auto from_type = checkAndGetDataType(from_type_untyped.get()); if (!from_type) throw Exception{"CAST AS Tuple can only be performed between tuple types or from String.\nLeft type: " + from_type_untyped->getName() + ", right type: " + to_type->getName(), ErrorCodes::TYPE_MISMATCH}; if (from_type->getElements().size() != to_type->getElements().size()) throw Exception{"CAST AS Tuple can only be performed between tuple types with the same number of elements or from String.\n" "Left type: " + from_type->getName() + ", right type: " + to_type->getName(), ErrorCodes::TYPE_MISMATCH}; const auto & from_element_types = from_type->getElements(); const auto & to_element_types = to_type->getElements(); std::vector element_wrappers; element_wrappers.reserve(from_element_types.size()); /// Create conversion wrapper for each element in tuple for (const auto idx_type : ext::enumerate(from_type->getElements())) 
element_wrappers.push_back(prepareUnpackDictionaries(idx_type.second, to_element_types[idx_type.first])); return [element_wrappers, from_element_types, to_element_types] (Block & block, const ColumnNumbers & arguments, const size_t result, size_t input_rows_count) { const auto col = block.getByPosition(arguments.front()).column.get(); /// copy tuple elements to a separate block Block element_block; size_t tuple_size = from_element_types.size(); const ColumnTuple & column_tuple = typeid_cast(*col); /// create columns for source elements for (size_t i = 0; i < tuple_size; ++i) element_block.insert({ column_tuple.getColumns()[i], from_element_types[i], "" }); /// create columns for converted elements for (const auto & to_element_type : to_element_types) element_block.insert({ nullptr, to_element_type, "" }); /// insert column for converted tuple element_block.insert({ nullptr, std::make_shared(to_element_types), "" }); /// invoke conversion for each element for (const auto idx_element_wrapper : ext::enumerate(element_wrappers)) idx_element_wrapper.second(element_block, { idx_element_wrapper.first }, tuple_size + idx_element_wrapper.first, input_rows_count); Columns converted_columns(tuple_size); for (size_t i = 0; i < tuple_size; ++i) converted_columns[i] = element_block.getByPosition(tuple_size + i).column; block.getByPosition(result).column = ColumnTuple::create(converted_columns); }; } template WrapperType createEnumWrapper(const DataTypePtr & from_type, const DataTypeEnum * to_type) const { using EnumType = DataTypeEnum; using Function = typename FunctionTo::Type; if (const auto from_enum8 = checkAndGetDataType(from_type.get())) checkEnumToEnumConversion(from_enum8, to_type); else if (const auto from_enum16 = checkAndGetDataType(from_type.get())) checkEnumToEnumConversion(from_enum16, to_type); if (checkAndGetDataType(from_type.get())) return createStringToEnumWrapper(); else if (checkAndGetDataType(from_type.get())) return createStringToEnumWrapper(); else if 
(isNativeNumber(from_type) || isEnum(from_type)) { auto function = Function::create(context); /// Check conversion using underlying function { function->getReturnType(ColumnsWithTypeAndName(1, { nullptr, from_type, "" })); } return [function] (Block & block, const ColumnNumbers & arguments, const size_t result, size_t input_rows_count) { function->execute(block, arguments, result, input_rows_count); }; } else throw Exception{"Conversion from " + from_type->getName() + " to " + to_type->getName() + " is not supported", ErrorCodes::CANNOT_CONVERT_TYPE}; } template void checkEnumToEnumConversion(const EnumTypeFrom * from_type, const EnumTypeTo * to_type) const { const auto & from_values = from_type->getValues(); const auto & to_values = to_type->getValues(); using ValueType = std::common_type_t; using NameValuePair = std::pair; using EnumValues = std::vector; EnumValues name_intersection; std::set_intersection(std::begin(from_values), std::end(from_values), std::begin(to_values), std::end(to_values), std::back_inserter(name_intersection), [] (auto && from, auto && to) { return from.first < to.first; }); for (const auto & name_value : name_intersection) { const auto & old_value = name_value.second; const auto & new_value = to_type->getValue(name_value.first); if (old_value != new_value) throw Exception{"Enum conversion changes value for element '" + name_value.first + "' from " + toString(old_value) + " to " + toString(new_value), ErrorCodes::CANNOT_CONVERT_TYPE}; } } template WrapperType createStringToEnumWrapper() const { const char * function_name = name; return [function_name] (Block & block, const ColumnNumbers & arguments, const size_t result, size_t /*input_rows_count*/) { const auto first_col = block.getByPosition(arguments.front()).column.get(); auto & col_with_type_and_name = block.getByPosition(result); const auto & result_type = typeid_cast(*col_with_type_and_name.type); if (const auto col = typeid_cast(first_col)) { const auto size = col->size(); auto res 
= result_type.createColumn(); auto & out_data = static_cast(*res).getData(); out_data.resize(size); for (const auto i : ext::range(0, size)) out_data[i] = result_type.getValue(col->getDataAt(i)); col_with_type_and_name.column = std::move(res); } else throw Exception{"Unexpected column " + first_col->getName() + " as first argument of function " + function_name, ErrorCodes::LOGICAL_ERROR}; }; } WrapperType createIdentityWrapper(const DataTypePtr &) const { return [] (Block & block, const ColumnNumbers & arguments, const size_t result, size_t /*input_rows_count*/) { block.getByPosition(result).column = block.getByPosition(arguments.front()).column; }; } WrapperType createNothingWrapper(const IDataType * to_type) const { ColumnPtr res = to_type->createColumnConstWithDefaultValue(1); return [res] (Block & block, const ColumnNumbers &, const size_t result, size_t input_rows_count) { /// Column of Nothing type is trivially convertible to any other column block.getByPosition(result).column = res->cloneResized(input_rows_count)->convertToFullColumnIfConst(); }; } WrapperType prepareUnpackDictionaries(const DataTypePtr & from_type, const DataTypePtr & to_type) const { const auto * from_low_cardinality = typeid_cast(from_type.get()); const auto * to_low_cardinality = typeid_cast(to_type.get()); const auto & from_nested = from_low_cardinality ? from_low_cardinality->getDictionaryType() : from_type; const auto & to_nested = to_low_cardinality ? 
to_low_cardinality->getDictionaryType() : to_type; if (from_type->onlyNull()) { if (!to_nested->isNullable()) throw Exception{"Cannot convert NULL to a non-nullable type", ErrorCodes::CANNOT_CONVERT_TYPE}; return [](Block & block, const ColumnNumbers &, const size_t result, size_t input_rows_count) { auto & res = block.getByPosition(result); res.column = res.type->createColumnConstWithDefaultValue(input_rows_count)->convertToFullColumnIfConst(); }; } auto wrapper = prepareRemoveNullable(from_nested, to_nested); if (!from_low_cardinality && !to_low_cardinality) return wrapper; return [wrapper, from_low_cardinality, to_low_cardinality] (Block & block, const ColumnNumbers & arguments, const size_t result, size_t input_rows_count) { auto & arg = block.getByPosition(arguments[0]); auto & res = block.getByPosition(result); ColumnPtr res_indexes; /// For some types the default value can't be cast (for example, String to Int). In that case, convert the column to a full column. bool src_converted_to_full_column = false; { /// Replace the argument and result columns (and types) with dictionary key columns (and types). /// Call the nested wrapper to cast the dictionary keys, then restore the block. auto prev_arg_col = arg.column; auto prev_arg_type = arg.type; auto prev_res_type = res.type; auto tmp_rows_count = input_rows_count; if (to_low_cardinality) res.type = to_low_cardinality->getDictionaryType(); if (from_low_cardinality) { auto * col_low_cardinality = typeid_cast(prev_arg_col.get()); arg.column = col_low_cardinality->getDictionary().getNestedColumn(); arg.type = from_low_cardinality->getDictionaryType(); /// TODO: Make map with defaults conversion. src_converted_to_full_column = !removeNullable(arg.type)->equals(*removeNullable(res.type)); if (src_converted_to_full_column) arg.column = arg.column->index(col_low_cardinality->getIndexes(), 0); else res_indexes = col_low_cardinality->getIndexesPtr(); tmp_rows_count = arg.column->size(); } /// Perform the requested conversion.
wrapper(block, arguments, result, tmp_rows_count); arg.column = prev_arg_col; arg.type = prev_arg_type; res.type = prev_res_type; } if (to_low_cardinality) { auto res_column = to_low_cardinality->createColumn(); auto * col_low_cardinality = typeid_cast(res_column.get()); if (from_low_cardinality && !src_converted_to_full_column) { auto res_keys = std::move(res.column); col_low_cardinality->insertRangeFromDictionaryEncodedColumn(*res_keys, *res_indexes); } else col_low_cardinality->insertRangeFromFullColumn(*res.column, 0, res.column->size()); res.column = std::move(res_column); } else if (!src_converted_to_full_column) res.column = res.column->index(*res_indexes, 0); }; } WrapperType prepareRemoveNullable(const DataTypePtr & from_type, const DataTypePtr & to_type) const { /// Determine whether pre-processing and/or post-processing must take place during conversion. bool source_is_nullable = from_type->isNullable(); bool result_is_nullable = to_type->isNullable(); auto wrapper = prepareImpl(removeNullable(from_type), removeNullable(to_type), result_is_nullable); if (result_is_nullable) { return [wrapper, source_is_nullable] (Block & block, const ColumnNumbers & arguments, const size_t result, size_t input_rows_count) { /// Create a temporary block on which to perform the operation. auto & res = block.getByPosition(result); const auto & ret_type = res.type; const auto & nullable_type = static_cast(*ret_type); const auto & nested_type = nullable_type.getNestedType(); Block tmp_block; if (source_is_nullable) tmp_block = createBlockWithNestedColumns(block, arguments); else tmp_block = block; size_t tmp_res_index = block.columns(); tmp_block.insert({nullptr, nested_type, ""}); /// Perform the requested conversion. 
wrapper(tmp_block, arguments, tmp_res_index, input_rows_count); const auto & tmp_res = tmp_block.getByPosition(tmp_res_index); res.column = wrapInNullable(tmp_res.column, Block({block.getByPosition(arguments[0]), tmp_res}), {0}, 1, input_rows_count); }; } else if (source_is_nullable) { /// Conversion from Nullable to non-Nullable. return [wrapper] (Block & block, const ColumnNumbers & arguments, const size_t result, size_t input_rows_count) { Block tmp_block = createBlockWithNestedColumns(block, arguments, result); /// Check that all values are not-NULL. const auto & col = block.getByPosition(arguments[0]).column; const auto & nullable_col = static_cast(*col); const auto & null_map = nullable_col.getNullMapData(); if (!memoryIsZero(null_map.data(), null_map.size())) throw Exception{"Cannot convert NULL value to non-Nullable type", ErrorCodes::CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN}; wrapper(tmp_block, arguments, result, input_rows_count); block.getByPosition(result).column = tmp_block.getByPosition(result).column; }; } else return wrapper; } /// 'from_type' and 'to_type' are nested types in case of Nullable. /// 'requested_result_is_nullable' is true if CAST to Nullable type is requested. 
    WrapperType prepareImpl(const DataTypePtr & from_type, const DataTypePtr & to_type, bool requested_result_is_nullable) const
    {
        if (from_type->equals(*to_type))
            return createIdentityWrapper(from_type);
        else if (WhichDataType(from_type).isNothing())
            return createNothingWrapper(to_type.get());

        WrapperType ret;

        auto make_default_wrapper = [&](const auto & types) -> bool
        {
            using Types = std::decay_t;
            using ToDataType = typename Types::LeftType;

            if constexpr (
                std::is_same_v ||
                std::is_same_v ||
                std::is_same_v ||
                std::is_same_v ||
                std::is_same_v ||
                std::is_same_v ||
                std::is_same_v ||
                std::is_same_v ||
                std::is_same_v ||
                std::is_same_v ||
                std::is_same_v ||
                std::is_same_v)
            {
                ret = createWrapper(from_type, checkAndGetDataType(to_type.get()), requested_result_is_nullable);
                return true;
            }
            if constexpr (
                std::is_same_v ||
                std::is_same_v)
            {
                ret = createEnumWrapper(from_type, checkAndGetDataType(to_type.get()));
                return true;
            }
            if constexpr (
                std::is_same_v> ||
                std::is_same_v> ||
                std::is_same_v>)
            {
                ret = createDecimalWrapper(from_type, checkAndGetDataType(to_type.get()));
                return true;
            }
            if constexpr (std::is_same_v)
            {
                if (isStringOrFixedString(from_type))
                {
                    ret = createUUIDWrapper(from_type, checkAndGetDataType(to_type.get()), requested_result_is_nullable);
                    return true;
                }
            }

            return false;
        };

        if (callOnIndexAndDataType(to_type->getTypeId(), make_default_wrapper))
            return ret;

        switch (to_type->getTypeId())
        {
            case TypeIndex::String:
                return createStringWrapper(from_type);
            case TypeIndex::FixedString:
                return createFixedStringWrapper(from_type, checkAndGetDataType(to_type.get())->getN());

            case TypeIndex::Array:
                return createArrayWrapper(from_type, checkAndGetDataType(to_type.get()));
            case TypeIndex::Tuple:
                return createTupleWrapper(from_type, checkAndGetDataType(to_type.get()));

            default:
                break;
        }

        /// It's possible to use ConvertImplGenericFromString to convert from String to AggregateFunction,
        /// but it is disabled because deserializing aggregate functions state might be unsafe.
        throw Exception{"Conversion from " + from_type->getName() + " to " + to_type->getName() + " is not supported",
            ErrorCodes::CANNOT_CONVERT_TYPE};
    }
};


class FunctionBuilderCast : public FunctionBuilderImpl
{
public:
    using MonotonicityForRange = FunctionCast::MonotonicityForRange;

    static constexpr auto name = "CAST";
    static FunctionBuilderPtr create(const Context & context) { return std::make_shared(context); }

    FunctionBuilderCast(const Context & context) : context(context) {}

    String getName() const override { return name; }

    size_t getNumberOfArguments() const override { return 2; }

protected:

    FunctionBasePtr buildImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & return_type) const override
    {
        DataTypes data_types(arguments.size());

        for (size_t i = 0; i < arguments.size(); ++i)
            data_types[i] = arguments[i].type;

        auto monotonicity = getMonotonicityInformation(arguments.front().type, return_type.get());
        return std::make_shared(context, name, std::move(monotonicity), data_types, return_type);
    }

    DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override
    {
        const auto type_col = checkAndGetColumnConst(arguments.back().column.get());
        if (!type_col)
            throw Exception("Second argument to " + getName() + " must be a constant string describing type",
                ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);

        return DataTypeFactory::instance().get(type_col->getValue());
    }

    bool useDefaultImplementationForNulls() const override { return false; }
    bool useDefaultImplementationForLowCardinalityColumns() const override { return false; }

private:
    template
    static auto monotonicityForType(const DataType * const)
    {
        return FunctionTo::Type::Monotonic::get;
    }

    MonotonicityForRange getMonotonicityInformation(const DataTypePtr & from_type, const IDataType * to_type) const
    {
        if (const auto type = checkAndGetDataType(to_type))
            return monotonicityForType(type);
        if (const auto type = checkAndGetDataType(to_type))
            return monotonicityForType(type);
        if (const auto type = checkAndGetDataType(to_type))
            return monotonicityForType(type);
        if (const auto type = checkAndGetDataType(to_type))
            return monotonicityForType(type);
        if (const auto type = checkAndGetDataType(to_type))
            return monotonicityForType(type);
        if (const auto type = checkAndGetDataType(to_type))
            return monotonicityForType(type);
        if (const auto type = checkAndGetDataType(to_type))
            return monotonicityForType(type);
        if (const auto type = checkAndGetDataType(to_type))
            return monotonicityForType(type);
        if (const auto type = checkAndGetDataType(to_type))
            return monotonicityForType(type);
        if (const auto type = checkAndGetDataType(to_type))
            return monotonicityForType(type);
        if (const auto type = checkAndGetDataType(to_type))
            return monotonicityForType(type);
        if (const auto type = checkAndGetDataType(to_type))
            return monotonicityForType(type);
        if (const auto type = checkAndGetDataType(to_type))
            return monotonicityForType(type);
        if (const auto type = checkAndGetDataType(to_type))
            return monotonicityForType(type);
        if (const auto type = checkAndGetDataType(to_type))
            return monotonicityForType(type);
        if (isEnum(from_type))
        {
            if (const auto type = checkAndGetDataType(to_type))
                return monotonicityForType(type);
            if (const auto type = checkAndGetDataType(to_type))
                return monotonicityForType(type);
        }

        /// other types like Null, FixedString, Array and Tuple have no monotonicity defined
        return {};
    }

    const Context & context;
};

}
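The template machinery in the wrappers above makes two patterns easy to miss.

- **Nullable handling.** `prepareRemoveNullable` runs the nested conversion on the non-Nullable values and then reattaches the source null map. When the target is not Nullable, it rejects the whole column up front if any row is NULL.
- **LowCardinality handling.** The branch that calls `insertRangeFromDictionaryEncodedColumn` converts only the dictionary keys and reuses the row indexes.

What follows is a minimal, self-contained C++17 sketch of those two ideas over a deliberately simplified column model. `NullableColumn`, `DictionaryColumn`, `convertToNullable`, `convertFromNullable` and `convertDictionary` are hypothetical names invented for illustration. They are not ClickHouse APIs and do not model `IColumn`, `Block` or `wrapInNullable`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical simplified columns; not the ClickHouse IColumn hierarchy.
template <typename T>
struct NullableColumn
{
    std::vector<T> values;          // nested values, one per row
    std::vector<uint8_t> null_map;  // 1 marks a NULL row, 0 a present value
};

template <typename T>
struct DictionaryColumn
{
    std::vector<T> dictionary;      // keys referenced by the rows
    std::vector<size_t> indexes;    // per-row position in `dictionary`
};

// Nullable(From) -> Nullable(To): convert present values, keep the source null map.
// Simplification: NULL rows get To{} instead of being converted.
template <typename From, typename F, typename To = std::decay_t<std::invoke_result_t<F, const From &>>>
NullableColumn<To> convertToNullable(const NullableColumn<From> & src, F convert)
{
    NullableColumn<To> res;
    res.values.reserve(src.values.size());
    for (size_t i = 0; i < src.values.size(); ++i)
        res.values.push_back(src.null_map[i] ? To{} : convert(src.values[i]));
    res.null_map = src.null_map;
    return res;
}

// Nullable(From) -> To: reject the whole column if any row is NULL,
// mirroring the memoryIsZero check on the null map.
template <typename From, typename F, typename To = std::decay_t<std::invoke_result_t<F, const From &>>>
std::vector<To> convertFromNullable(const NullableColumn<From> & src, F convert)
{
    for (uint8_t is_null : src.null_map)
        if (is_null)
            throw std::runtime_error("Cannot convert NULL value to non-Nullable type");

    std::vector<To> res;
    res.reserve(src.values.size());
    for (const From & value : src.values)
        res.push_back(convert(value));
    return res;
}

// Dictionary-encoded column: convert each dictionary key once, reuse the row indexes.
template <typename From, typename F, typename To = std::decay_t<std::invoke_result_t<F, const From &>>>
DictionaryColumn<To> convertDictionary(const DictionaryColumn<From> & src, F convert)
{
    DictionaryColumn<To> res;
    res.dictionary.reserve(src.dictionary.size());
    for (const From & key : src.dictionary)
        res.dictionary.push_back(convert(key));
    res.indexes = src.indexes;  // the row -> key mapping is unchanged
    return res;
}
```

Two simplifications are deliberate:

- **NULL rows.** The sketch leaves NULL rows at a default value instead of converting them. The original runs the nested wrapper over the whole nested column.
- **Dictionary keys.** `convertDictionary` keeps the converted keys as they are, even though a lossy conversion can make two keys collide. The original rebuilds the result column through `insertRangeFromDictionaryEncodedColumn`, or through `insertRangeFromFullColumn` when `src_converted_to_full_column` is set.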