#pragma once
#include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include
namespace DB {
/** Functions for working with arrays:
  *
  * array(c1, c2, ...) - create an array from constants.
  * arrayElement(arr, i) - get the array element at the given index.
  *  The index starts at 1. The index may also be negative, in which case it is counted from the end of the array.
  * has(arr, x) - whether the array contains the element x.
  * indexOf(arr, x) - returns the index of the element x (starting at 1) if it is present in the array, or 0 if it is not.
  * arrayEnumerate(arr) - returns the array [1, 2, 3, ..., length(arr)]
  *
  * arrayUniq(arr) - counts the number of distinct elements in the array,
  * arrayUniq(arr1, arr2, ...) - counts the number of distinct tuples of elements at corresponding positions in several arrays.
  *
  * arrayEnumerateUniq(arr)
  *  - returns an array parallel to the given one, where each element is replaced by
  *    its ordinal number among the elements with the same value.
  *  For example: arrayEnumerateUniq([10, 20, 10, 30]) = [1, 1, 2, 1]
  * arrayEnumerateUniq(arr1, arr2...)
  *  - the same for tuples of elements at corresponding positions in several arrays.
  *
  * emptyArrayToSingle(arr) - replace empty arrays with arrays of one element holding the default value.
  */
class FunctionArray : public IFunction { public: static constexpr auto name = "array"; static IFunction * create(const Context & context) { return new FunctionArray; } private:
/// Get the function name.
String getName() const override { return name; } template bool checkRightType(DataTypePtr left, DataTypePtr right, DataTypePtr & type_res) const { if (typeid_cast(&*right)) { typedef typename NumberTraits::ResultOfIf::Type ResultType; type_res = DataTypeFromFieldTypeOrError::getDataType(); if (!type_res) throw Exception("Arguments of function " + getName() + " are not upscalable to a common type without loss of precision.", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); return true; } return false; } template bool checkLeftType(DataTypePtr left, DataTypePtr right, DataTypePtr & type_res) const { if (typeid_cast(&*left)) { if ( checkRightType(left, right, type_res) || checkRightType(left, right, type_res) || checkRightType(left, right, type_res) || checkRightType(left, right, type_res) || checkRightType(left, right, type_res) || checkRightType(left, right, type_res) || checkRightType(left, right, type_res) || checkRightType(left, right, type_res) || checkRightType(left, right, type_res) || checkRightType(left, right, type_res)) return true; else throw Exception("Illegal type " + right->getName() + " as argument of function " + getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); } return false; } template bool tryAddField(DataTypePtr type_res, const Field & f, Array & arr) const { if (typeid_cast(&*type_res)) { arr.push_back(apply_visitor(FieldVisitorConvertToNumber(), f)); return true; } return false; } bool addField(DataTypePtr type_res, const Field & f, Array & arr) const {
/// The value must be converted to the result type.
if ( tryAddField(type_res, f, arr) || tryAddField(type_res, f, arr) || tryAddField(type_res, f, arr) || tryAddField(type_res, f, arr) || tryAddField(type_res, f, arr) || tryAddField(type_res, f, arr) || tryAddField(type_res, f, arr) || tryAddField(type_res, f, arr) || tryAddField(type_res, f, arr) || tryAddField(type_res, f, arr) ) return true; else throw Exception("Illegal result type " + type_res->getName() + " of function " + getName(), ErrorCodes::LOGICAL_ERROR); } DataTypePtr
getLeastCommonType(DataTypePtr left, DataTypePtr right) const { DataTypePtr type_res; if (!( checkLeftType(left, right, type_res) || checkLeftType(left, right, type_res) || checkLeftType(left, right, type_res) || checkLeftType(left, right, type_res) || checkLeftType(left, right, type_res) || checkLeftType(left, right, type_res) || checkLeftType(left, right, type_res) || checkLeftType(left, right, type_res) || checkLeftType(left, right, type_res) || checkLeftType(left, right, type_res))) throw Exception("Internal error: unexpected type " + left->getName() + " as argument of function " + getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); return type_res; } static const DataTypePtr & getScalarType(const DataTypePtr & type) { const auto array = typeid_cast(type.get()); if (!array) return type; return getScalarType(array->getNestedType()); } public:
/// Get the result type from the argument types. If the function is not applicable to the given arguments, throw an exception.
DataTypePtr getReturnType(const DataTypes & arguments) const override { if (arguments.empty()) throw Exception("Function array requires at least one argument.", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); DataTypePtr result_type = arguments[0]; if (result_type->behavesAsNumber()) {
/// If the type is numeric, try to find the least common type.
for (size_t i = 1, size = arguments.size(); i < size; ++i) result_type = getLeastCommonType(result_type, arguments[i]); } else {
/// Otherwise all arguments must have the same type.
for (size_t i = 1, size = arguments.size(); i < size; ++i) if (arguments[i]->getName() != arguments[0]->getName()) throw Exception("Arguments for function array must have same type or behave as number.", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); } return new DataTypeArray(result_type); }
/// Execute the function on the block.
void execute(Block & block, const ColumnNumbers & arguments, size_t result) override { const auto is_const = [&] { for (const auto arg_num : arguments) if (!block.getByPosition(arg_num).column->isConst()) return false; return true; }(); const auto first_arg = block.getByPosition(arguments[0]); DataTypePtr result_type = first_arg.type; if (result_type->behavesAsNumber()) {
/// If the type is numeric, compute the least common type.
for (size_t i = 1, size = arguments.size(); i < size; ++i) result_type = getLeastCommonType(result_type, block.getByPosition(arguments[i]).type); } if (is_const) { Array arr; for (const auto arg_num : arguments) if (block.getByPosition(arg_num).type->getName() == result_type->getName())
/// If the element has the same type as the result, just add it to the result.
arr.push_back((*block.getByPosition(arg_num).column)[0]); else
/// Otherwise it must be converted to the result type.
addField(result_type, (*block.getByPosition(arg_num).column)[0], arr); block.getByPosition(result).column = new ColumnConstArray{ first_arg.column->size(), arr, new DataTypeArray{result_type} }; } else { auto out = new ColumnArray{result_type->createColumn()}; ColumnPtr out_ptr{out}; for (const auto row_num : ext::range(0, first_arg.column->size())) { Array arr; for (const auto arg_num : arguments) if (block.getByPosition(arg_num).type->getName() == result_type->getName())
/// If the element has the same type as the result, just add it to the result.
arr.push_back((*block.getByPosition(arg_num).column)[row_num]); else
/// Otherwise it must be converted to the result type.
addField(result_type, (*block.getByPosition(arg_num).column)[row_num], arr); out->insert(arr); } block.getByPosition(result).column = out_ptr; } } }; template struct ArrayElementNumImpl {
/** Procedure for a constant index.
  * If negative = false, the index is counted from the beginning of the array, starting from zero.
  * If negative = true, the index is counted from the end of the array, starting from zero.
  */
template static void vectorConst( const PODArray & data, const ColumnArray::Offsets_t & offsets, const ColumnArray::Offset_t index, PODArray & result) { size_t size = offsets.size(); result.resize(size); ColumnArray::Offset_t current_offset = 0; for (size_t i = 0; i < size; ++i) { size_t array_size = offsets[i] - current_offset; if (index < array_size) result[i] = !negative ? data[current_offset + index] : data[offsets[i] - index - 1]; else result[i] = T(); current_offset = offsets[i]; } }
/** Procedure for a non-constant index.
  * index_type - the data type of the index.
  */
template static void vector( const PODArray & data, const ColumnArray::Offsets_t & offsets, const ColumnVector & index, PODArray & result) { size_t size = offsets.size(); result.resize(size); ColumnArray::Offset_t current_offset = 0; for (size_t i = 0; i < size; ++i) { size_t array_size = offsets[i] - current_offset; if (index[i].getType() == Field::Types::UInt64) { UInt64 cur_id = safeGet(index[i]); if (cur_id > 0 && cur_id <= array_size) result[i] = data[current_offset + cur_id - 1]; else result[i] = T(); } else if (index[i].getType() == Field::Types::Int64) { Int64 cur_id = safeGet(index[i]); if (cur_id > 0 && static_cast(cur_id) <= array_size) result[i] = data[current_offset + cur_id - 1]; else if (cur_id < 0 && static_cast(-cur_id) <= array_size) result[i] = data[offsets[i] + cur_id]; else result[i] = T(); } else throw Exception("Illegal type of array index", ErrorCodes::LOGICAL_ERROR); current_offset = offsets[i]; } } }; struct ArrayElementStringImpl {
/** Procedure for a constant index.
  * If negative = false, the index is counted from the beginning of the array, starting from zero.
  * If negative = true, the index is counted from the end of the array, starting from zero.
  */
template static void vectorConst( const ColumnString::Chars_t & data, const ColumnArray::Offsets_t & offsets, const ColumnString::Offsets_t & string_offsets, const ColumnArray::Offset_t index, ColumnString::Chars_t & result_data, ColumnArray::Offsets_t & result_offsets) { size_t size = offsets.size(); result_offsets.resize(size); result_data.reserve(data.size()); ColumnArray::Offset_t current_offset = 0; ColumnArray::Offset_t current_result_offset = 0; for (size_t i = 0; i < size; ++i) { size_t array_size = offsets[i] - current_offset; if (index < array_size) { size_t adjusted_index = !negative ? index : (array_size - index - 1); ColumnArray::Offset_t string_pos = current_offset == 0 && adjusted_index == 0 ? 0 : string_offsets[current_offset + adjusted_index - 1]; ColumnArray::Offset_t string_size = string_offsets[current_offset + adjusted_index] - string_pos; result_data.resize(current_result_offset + string_size); memcpy(&result_data[current_result_offset], &data[string_pos], string_size); current_result_offset += string_size; result_offsets[i] = current_result_offset; } else {
/// Insert an empty string.
result_data.resize(current_result_offset + 1); result_data[current_result_offset] = 0; current_result_offset += 1; result_offsets[i] = current_result_offset; } current_offset = offsets[i]; } }
/** Procedure for a non-constant index.
  * index_type - the data type of the index.
  */
template static void vector( const ColumnString::Chars_t & data, const ColumnArray::Offsets_t & offsets, const ColumnString::Offsets_t & string_offsets, const ColumnVector & index, ColumnString::Chars_t & result_data, ColumnArray::Offsets_t & result_offsets) { size_t size = offsets.size(); result_offsets.resize(size); result_data.reserve(data.size()); ColumnArray::Offset_t current_offset = 0; ColumnArray::Offset_t current_result_offset = 0; for (size_t i = 0; i < size; ++i) { size_t array_size = offsets[i] - current_offset; size_t adjusted_index; if (index[i].getType() == Field::Types::UInt64) { UInt64 cur_id = safeGet(index[i]); if (cur_id > 0 && cur_id <= array_size) adjusted_index = cur_id - 1; else adjusted_index = array_size; /// The index is outside the array bounds; replace it with a value known to be too large.
} else if (index[i].getType() == Field::Types::Int64) { Int64 cur_id = safeGet(index[i]); if (cur_id > 0 && static_cast(cur_id) <= array_size) adjusted_index = cur_id - 1; else if (cur_id < 0 && static_cast(-cur_id) <= array_size) adjusted_index = array_size + cur_id; else adjusted_index = array_size; /// The index is outside the array bounds; replace it with a value that is too large.
} else throw Exception("Illegal type of array index", ErrorCodes::LOGICAL_ERROR); if (adjusted_index < array_size) { ColumnArray::Offset_t string_pos = current_offset == 0 && adjusted_index == 0 ?
0 : string_offsets[current_offset + adjusted_index - 1]; ColumnArray::Offset_t string_size = string_offsets[current_offset + adjusted_index] - string_pos; result_data.resize(current_result_offset + string_size); memcpy(&result_data[current_result_offset], &data[string_pos], string_size); current_result_offset += string_size; result_offsets[i] = current_result_offset; } else {
/// Insert an empty string.
result_data.resize(current_result_offset + 1); result_data[current_result_offset] = 0; current_result_offset += 1; result_offsets[i] = current_result_offset; } current_offset = offsets[i]; } } }; class FunctionArrayElement : public IFunction { public: static constexpr auto name = "arrayElement"; static IFunction * create(const Context & context) { return new FunctionArrayElement; } private: template bool executeNumberConst(Block & block, const ColumnNumbers & arguments, size_t result, const Field & index) { const ColumnArray * col_array = typeid_cast(&*block.getByPosition(arguments[0]).column); if (!col_array) return false; const ColumnVector * col_nested = typeid_cast *>(&col_array->getData()); if (!col_nested) return false; ColumnVector * col_res = new ColumnVector; block.getByPosition(result).column = col_res; if (index.getType() == Field::Types::UInt64) ArrayElementNumImpl::template vectorConst(col_nested->getData(), col_array->getOffsets(), safeGet(index) - 1, col_res->getData()); else if (index.getType() == Field::Types::Int64) ArrayElementNumImpl::template vectorConst(col_nested->getData(), col_array->getOffsets(), -safeGet(index) - 1, col_res->getData()); else throw Exception("Illegal type of array index", ErrorCodes::LOGICAL_ERROR); return true; } template bool executeNumber(Block & block, const ColumnNumbers & arguments, size_t result, const ColumnVector & index) { const ColumnArray * col_array = typeid_cast(&*block.getByPosition(arguments[0]).column); if (!col_array) return false; const ColumnVector * col_nested = typeid_cast *>(&col_array->getData()); if
(!col_nested) return false; ColumnVector * col_res = new ColumnVector; block.getByPosition(result).column = col_res; ArrayElementNumImpl::template vector(col_nested->getData(), col_array->getOffsets(), index, col_res->getData()); return true; } bool executeStringConst(Block & block, const ColumnNumbers & arguments, size_t result, const Field & index) { const ColumnArray * col_array = typeid_cast(&*block.getByPosition(arguments[0]).column); if (!col_array) return false; const ColumnString * col_nested = typeid_cast(&col_array->getData()); if (!col_nested) return false; ColumnString * col_res = new ColumnString; block.getByPosition(result).column = col_res; if (index.getType() == Field::Types::UInt64) ArrayElementStringImpl::vectorConst( col_nested->getChars(), col_array->getOffsets(), col_nested->getOffsets(), safeGet(index) - 1, col_res->getChars(), col_res->getOffsets()); else if (index.getType() == Field::Types::Int64) ArrayElementStringImpl::vectorConst( col_nested->getChars(), col_array->getOffsets(), col_nested->getOffsets(), -safeGet(index) - 1, col_res->getChars(), col_res->getOffsets()); else throw Exception("Illegal type of array index", ErrorCodes::LOGICAL_ERROR); return true; } template bool executeString(Block & block, const ColumnNumbers & arguments, size_t result, const ColumnVector & index) { const ColumnArray * col_array = typeid_cast(&*block.getByPosition(arguments[0]).column); if (!col_array) return false; const ColumnString * col_nested = typeid_cast(&col_array->getData()); if (!col_nested) return false; ColumnString * col_res = new ColumnString; block.getByPosition(result).column = col_res; ArrayElementStringImpl::vector( col_nested->getChars(), col_array->getOffsets(), col_nested->getOffsets(), index, col_res->getChars(), col_res->getOffsets()); return true; } bool executeConstConst(Block & block, const ColumnNumbers & arguments, size_t result, const Field & index) { const ColumnConstArray * col_array = 
typeid_cast(&*block.getByPosition(arguments[0]).column); if (!col_array) return false; const DB::Array & array = col_array->getData(); size_t array_size = array.size(); size_t real_index = 0; if (index.getType() == Field::Types::UInt64) real_index = safeGet(index) - 1; else if (index.getType() == Field::Types::Int64) real_index = array_size + safeGet(index); else throw Exception("Illegal type of array index", ErrorCodes::LOGICAL_ERROR); Field value = col_array->getData().at(real_index); block.getByPosition(result).column = block.getByPosition(result).type->createConstColumn( block.rowsInFirstColumn(), value); return true; } template bool executeConst(Block & block, const ColumnNumbers & arguments, size_t result, const ColumnVector & index) { const ColumnConstArray * col_array = typeid_cast(&*block.getByPosition(arguments[0]).column); if (!col_array) return false; const DB::Array & array = col_array->getData(); size_t array_size = array.size(); block.getByPosition(result).column = block.getByPosition(result).type->createColumn(); for (size_t i = 0; i < col_array->size(); ++i) { if (index[i].getType() == Field::Types::UInt64) { UInt64 cur_id = safeGet(index[i]); if (cur_id > 0 && cur_id <= array_size) block.getByPosition(result).column->insert(array[cur_id - 1]); else block.getByPosition(result).column->insertDefault(); } else if (index[i].getType() == Field::Types::Int64) { Int64 cur_id = safeGet(index[i]); if (cur_id > 0 && static_cast(cur_id) <= array_size) block.getByPosition(result).column->insert(array[cur_id - 1]); else if (cur_id < 0 && static_cast(-cur_id) <= array_size) block.getByPosition(result).column->insert(array[array_size + cur_id]); else block.getByPosition(result).column->insertDefault(); } else throw Exception("Illegal type of array index", ErrorCodes::LOGICAL_ERROR); } return true; } template bool executeArgument(Block & block, const ColumnNumbers & arguments, size_t result) { const ColumnVector * index = typeid_cast *> 
(&*block.getByPosition(arguments[1]).column); if (!index) return false; if (!( executeNumber (block, arguments, result, *index) || executeNumber (block, arguments, result, *index) || executeNumber (block, arguments, result, *index) || executeNumber (block, arguments, result, *index) || executeNumber (block, arguments, result, *index) || executeNumber (block, arguments, result, *index) || executeNumber (block, arguments, result, *index) || executeNumber (block, arguments, result, *index) || executeNumber (block, arguments, result, *index) || executeNumber (block, arguments, result, *index) || executeConst (block, arguments, result, *index) || executeString (block, arguments, result, *index))) throw Exception("Illegal column " + block.getByPosition(arguments[0]).column->getName() + " of first argument of function " + getName(), ErrorCodes::ILLEGAL_COLUMN); return true; }
/** For an array of tuples, the function is evaluated component-wise, for each element of the tuple.
  */
bool executeTuple(Block & block, const ColumnNumbers & arguments, size_t result) { ColumnArray * col_array = typeid_cast(&*block.getByPosition(arguments[0]).column); if (!col_array) return false; ColumnTuple * col_nested = typeid_cast(&col_array->getData()); if (!col_nested) return false; Block & tuple_block = col_nested->getData(); size_t tuple_size = tuple_block.columns();
/** We will evaluate the function for the tuple that forms the array's contents.
  * To do this, create a temporary block.
  * It will consist of the following columns:
  * - the array index to take;
  * - an array of the first elements of the tuples;
  * - the result of taking elements by index from the array of the first elements of the tuples;
  * - an array of the second elements of the tuples;
  * - the result of taking elements by index from the array of the second elements of the tuples;
  * ...
  */
Block block_of_temporary_results; block_of_temporary_results.insert(block.getByPosition(arguments[1]));
/// The results of taking elements by index from the arrays built from each tuple component.
Block result_tuple_block; for (size_t i = 0; i < tuple_size; ++i) { ColumnWithTypeAndName array_of_tuple_section; array_of_tuple_section.column = new ColumnArray(tuple_block.getByPosition(i).column, col_array->getOffsetsColumn()); array_of_tuple_section.type = new DataTypeArray(tuple_block.getByPosition(i).type); block_of_temporary_results.insert(array_of_tuple_section); ColumnWithTypeAndName array_elements_of_tuple_section; block_of_temporary_results.insert(array_elements_of_tuple_section); execute(block_of_temporary_results, ColumnNumbers{i * 2 + 1, 0}, i * 2 + 2); result_tuple_block.insert(block_of_temporary_results.getByPosition(i * 2 + 2)); } ColumnTuple * col_res = new ColumnTuple(result_tuple_block); block.getByPosition(result).column = col_res; return true; } public:
/// Get the function name.
String getName() const override { return name; }
/// Get the result type from the argument types. If the function is not applicable to the given arguments, throw an exception.
DataTypePtr getReturnType(const DataTypes & arguments) const override { if (arguments.size() != 2) throw Exception("Number of arguments for function " + getName() + " doesn't match: passed " + toString(arguments.size()) + ", should be 2.", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); const DataTypeArray * array_type = typeid_cast(&*arguments[0]); if (!array_type) throw Exception("First argument for function " + getName() + " must be array.", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); if (!arguments[1]->isNumeric() || (0 != arguments[1]->getName().compare(0, 4, "UInt") && 0 != arguments[1]->getName().compare(0, 3, "Int"))) throw Exception("Second argument for function " + getName() + " must have UInt or Int type.", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); return array_type->getNestedType(); }
/// Execute the function on the block.
void execute(Block & block, const ColumnNumbers & arguments, size_t result) override { if (executeTuple(block, arguments, result)) { } else if (!block.getByPosition(arguments[1]).column->isConst()) { if (!( executeArgument (block, arguments, result) || executeArgument (block, arguments, result) || executeArgument (block, arguments, result) || executeArgument (block, arguments, result) || executeArgument (block, arguments, result) || executeArgument (block, arguments, result) || executeArgument (block, arguments, result) || executeArgument (block, arguments, result))) throw Exception("Second argument for function " + getName() + " must have UInt or Int type.", ErrorCodes::ILLEGAL_COLUMN); } else { Field index = (*block.getByPosition(arguments[1]).column)[0]; if (index == UInt64(0)) throw Exception("Array indices are 1-based", ErrorCodes::ZERO_ARRAY_OR_TUPLE_INDEX); if (!( executeNumberConst (block, arguments, result, index) || executeNumberConst (block, arguments, result, index) || executeNumberConst (block, arguments, result, index) || executeNumberConst (block, arguments, result, index) || executeNumberConst (block, arguments, result, index)
|| executeNumberConst (block, arguments, result, index) || executeNumberConst (block, arguments, result, index) || executeNumberConst (block, arguments, result, index) || executeNumberConst (block, arguments, result, index) || executeNumberConst (block, arguments, result, index) || executeConstConst (block, arguments, result, index) || executeStringConst (block, arguments, result, index))) throw Exception("Illegal column " + block.getByPosition(arguments[0]).column->getName() + " of first argument of function " + getName(), ErrorCodes::ILLEGAL_COLUMN); } } };
/// For has.
struct IndexToOne { typedef UInt8 ResultType; static bool apply(size_t j, ResultType & current) { current = 1; return false; } };
/// For indexOf.
struct IndexIdentity { typedef UInt64 ResultType;
/// The index is returned starting from one.
static bool apply(size_t j, ResultType & current) { current = j + 1; return false; } };
/// For countEqual.
struct IndexCount { typedef UInt32 ResultType; static bool apply(size_t j, ResultType & current) { ++current; return true; } }; template struct ArrayIndexNumImpl {
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wsign-compare"
/// compares `lhs` against `i`-th element of `rhs`
static bool compare(const T & lhs, const PODArray & rhs, const std::size_t i ) { return lhs == rhs[i]; }
/// compares `lhs` against `rhs`, third argument unused
static bool compare(const T & lhs, const U & rhs, std::size_t) { return lhs == rhs; }
#pragma GCC diagnostic pop
template static void vector( const PODArray & data, const ColumnArray::Offsets_t & offsets, const ScalarOrVector & value, PODArray & result) { size_t size = offsets.size(); result.resize(size); ColumnArray::Offset_t current_offset = 0; for (size_t i = 0; i < size; ++i) { size_t array_size = offsets[i] - current_offset; typename IndexConv::ResultType current = 0; for (size_t j = 0; j < array_size; ++j) { if (compare(data[current_offset + j], value, i)) { if (!IndexConv::apply(j, current)) break; } }
result[i] = current; current_offset = offsets[i]; } } }; template struct ArrayIndexStringImpl { static void vector_const( const ColumnString::Chars_t & data, const ColumnArray::Offsets_t & offsets, const ColumnString::Offsets_t & string_offsets, const String & value, PODArray & result) { const auto size = offsets.size(); const auto value_size = value.size(); result.resize(size); ColumnArray::Offset_t current_offset = 0; for (size_t i = 0; i < size; ++i) { const auto array_size = offsets[i] - current_offset; typename IndexConv::ResultType current = 0; for (size_t j = 0; j < array_size; ++j) { ColumnArray::Offset_t string_pos = current_offset == 0 && j == 0 ? 0 : string_offsets[current_offset + j - 1]; ColumnArray::Offset_t string_size = string_offsets[current_offset + j] - string_pos; if (string_size == value_size + 1 && 0 == memcmp(value.data(), &data[string_pos], value_size)) { if (!IndexConv::apply(j, current)) break; } } result[i] = current; current_offset = offsets[i]; } } static void vector_vector( const ColumnString::Chars_t & data, const ColumnArray::Offsets_t & offsets, const ColumnString::Offsets_t & string_offsets, const ColumnString::Chars_t & item_values, const ColumnString::Offsets_t & item_offsets, PODArray & result) { const auto size = offsets.size(); result.resize(size); ColumnArray::Offset_t current_offset = 0; for (size_t i = 0; i < size; ++i) { const auto array_size = offsets[i] - current_offset; typename IndexConv::ResultType current = 0; const auto value_pos = 0 == i ? 0 : item_offsets[i - 1]; const auto value_size = item_offsets[i] - value_pos; for (size_t j = 0; j < array_size; ++j) { ColumnArray::Offset_t string_pos = current_offset == 0 && j == 0 ? 
0 : string_offsets[current_offset + j - 1]; ColumnArray::Offset_t string_size = string_offsets[current_offset + j] - string_pos; if (string_size == value_size && 0 == memcmp(&item_values[value_pos], &data[string_pos], value_size)) { if (!IndexConv::apply(j, current)) break; } } result[i] = current; current_offset = offsets[i]; } } }; template class FunctionArrayIndex : public IFunction { public: static constexpr auto name = Name::name; static IFunction * create(const Context & context) { return new FunctionArrayIndex; } private: typedef ColumnVector ResultColumnType; template bool executeNumber(Block & block, const ColumnNumbers & arguments, size_t result) { return executeNumberNumber(block, arguments, result) || executeNumberNumber(block, arguments, result) || executeNumberNumber(block, arguments, result) || executeNumberNumber(block, arguments, result) || executeNumberNumber(block, arguments, result) || executeNumberNumber(block, arguments, result) || executeNumberNumber(block, arguments, result) || executeNumberNumber(block, arguments, result) || executeNumberNumber(block, arguments, result) || executeNumberNumber(block, arguments, result); } template bool executeNumberNumber(Block & block, const ColumnNumbers & arguments, size_t result) { const ColumnArray * col_array = typeid_cast(&*block.getByPosition(arguments[0]).column); if (!col_array) return false; const ColumnVector * col_nested = typeid_cast *>(&col_array->getData()); if (!col_nested) return false; const auto item_arg = block.getByPosition(arguments[1]).column.get(); if (const auto item_arg_const = typeid_cast *>(item_arg)) { const auto col_res = new ResultColumnType; ColumnPtr col_ptr{col_res}; block.getByPosition(result).column = col_ptr; ArrayIndexNumImpl::vector(col_nested->getData(), col_array->getOffsets(), item_arg_const->getData(), col_res->getData()); } else if (const auto item_arg_vector = typeid_cast *>(item_arg)) { const auto col_res = new ResultColumnType; ColumnPtr col_ptr{col_res}; 
block.getByPosition(result).column = col_ptr; ArrayIndexNumImpl::vector(col_nested->getData(), col_array->getOffsets(), item_arg_vector->getData(), col_res->getData()); } else return false; return true; } bool executeString(Block & block, const ColumnNumbers & arguments, size_t result) { const ColumnArray * col_array = typeid_cast(&*block.getByPosition(arguments[0]).column); if (!col_array) return false; const ColumnString * col_nested = typeid_cast(&col_array->getData()); if (!col_nested) return false; const auto item_arg = block.getByPosition(arguments[1]).column.get(); if (const auto item_arg_const = typeid_cast *>(item_arg)) { const auto col_res = new ResultColumnType; ColumnPtr col_ptr{col_res}; block.getByPosition(result).column = col_ptr; ArrayIndexStringImpl::vector_const(col_nested->getChars(), col_array->getOffsets(), col_nested->getOffsets(), item_arg_const->getData(), col_res->getData()); } else if (const auto item_arg_vector = typeid_cast(item_arg)) { const auto col_res = new ResultColumnType; ColumnPtr col_ptr{col_res}; block.getByPosition(result).column = col_ptr; ArrayIndexStringImpl::vector_vector(col_nested->getChars(), col_array->getOffsets(), col_nested->getOffsets(), item_arg_vector->getChars(), item_arg_vector->getOffsets(), col_res->getData()); } else return false; return true; } bool executeConst(Block & block, const ColumnNumbers & arguments, size_t result) { const ColumnConstArray * col_array = typeid_cast(&*block.getByPosition(arguments[0]).column); if (!col_array) return false; const Array & arr = col_array->getData(); const auto item_arg = block.getByPosition(arguments[1]).column.get(); if (item_arg->isConst()) { typename IndexConv::ResultType current{}; const auto & value = (*item_arg)[0]; for (size_t i = 0, size = arr.size(); i < size; ++i) { if (apply_visitor(FieldVisitorAccurateEquals(), arr[i], value)) { if (!IndexConv::apply(i, current)) break; } } block.getByPosition(result).column = block.getByPosition(result).type->createConstColumn(
item_arg->size(), static_cast::Type>(current)); } else { const auto size = item_arg->size(); const auto col_res = new ResultColumnType{size, {}}; ColumnPtr col_ptr{col_res}; block.getByPosition(result).column = col_ptr; auto & data = col_res->getData(); for (size_t row = 0; row < size; ++row) { const auto & value = (*item_arg)[row]; for (size_t i = 0, size = arr.size(); i < size; ++i) { if (apply_visitor(FieldVisitorAccurateEquals(), arr[i], value)) { if (!IndexConv::apply(i, data[row])) break; } } } } return true; } public:
/// Get the function name.
String getName() const override { return name; }
/// Get the result type from the argument types. If the function is not applicable to the given arguments, throw an exception.
DataTypePtr getReturnType(const DataTypes & arguments) const override { if (arguments.size() != 2) throw Exception("Number of arguments for function " + getName() + " doesn't match: passed " + toString(arguments.size()) + ", should be 2.", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); const DataTypeArray * array_type = typeid_cast(&*arguments[0]); if (!array_type) throw Exception("First argument for function " + getName() + " must be array.", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); if (!(array_type->getNestedType()->behavesAsNumber() && arguments[1]->behavesAsNumber()) && array_type->getNestedType()->getName() != arguments[1]->getName()) throw Exception("Type of array elements and second argument for function " + getName() + " must be same." " Passed: " + arguments[0]->getName() + " and " + arguments[1]->getName() + ".", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); return new typename DataTypeFromFieldType::Type; }
/// Execute the function on the block.
void execute(Block & block, const ColumnNumbers & arguments, size_t result) override { if (!(executeNumber(block, arguments, result) || executeNumber(block, arguments, result) || executeNumber(block, arguments, result) || executeNumber(block, arguments, result) || executeNumber(block, arguments, result) || executeNumber(block, arguments, result) || executeNumber(block, arguments, result) || executeNumber(block, arguments, result) || executeNumber(block, arguments, result) || executeNumber(block, arguments, result) || executeConst(block, arguments, result) || executeString(block, arguments, result))) throw Exception{ "Illegal column " + block.getByPosition(arguments[0]).column->getName() + " of first argument of function " + getName(), ErrorCodes::ILLEGAL_COLUMN }; } }; class FunctionArrayEnumerate : public IFunction { public: static constexpr auto name = "arrayEnumerate"; static IFunction * create (const Context & context) { return new FunctionArrayEnumerate; } /// Get the function name. String getName() const override { return name; } /// Get the result type from the argument types. If the function is not applicable to these arguments, throw an exception. DataTypePtr getReturnType(const DataTypes & arguments) const override { if (arguments.size() != 1) throw Exception("Number of arguments for function " + getName() + " doesn't match: passed " + toString(arguments.size()) + ", should be 1.", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); const DataTypeArray * array_type = typeid_cast(&*arguments[0]); if (!array_type) throw Exception("First argument for function " + getName() + " must be array.", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); return new DataTypeArray(new DataTypeUInt32); } /// Execute the function on the block. 
void execute(Block & block, const ColumnNumbers & arguments, size_t result) override { if (const ColumnArray * array = typeid_cast(&*block.getByPosition(arguments[0]).column)) { const ColumnArray::Offsets_t & offsets = array->getOffsets(); ColumnUInt32 * res_nested = new ColumnUInt32; ColumnArray * res_array = new ColumnArray(res_nested, array->getOffsetsColumn()); block.getByPosition(result).column = res_array; ColumnUInt32::Container_t & res_values = res_nested->getData(); res_values.resize(array->getData().size()); size_t prev_off = 0; for (size_t i = 0; i < offsets.size(); ++i) { size_t off = offsets[i]; for (size_t j = prev_off; j < off; ++j) { res_values[j] = j - prev_off + 1; } prev_off = off; } } else if (const ColumnConstArray * array = typeid_cast(&*block.getByPosition(arguments[0]).column)) { const Array & values = array->getData(); Array res_values(values.size()); for (size_t i = 0; i < values.size(); ++i) { res_values[i] = i + 1; } ColumnConstArray * res_array = new ColumnConstArray(array->size(), res_values, new DataTypeArray(new DataTypeUInt32)); block.getByPosition(result).column = res_array; } else { throw Exception("Illegal column " + block.getByPosition(arguments[0]).column->getName() + " of first argument of function " + getName(), ErrorCodes::ILLEGAL_COLUMN); } } }; /// Counts the number of distinct elements in an array, or the number of distinct tuples of elements at corresponding positions in several arrays. /// NOTE The implementation partially duplicates arrayEnumerateUniq. class FunctionArrayUniq : public IFunction { public: static constexpr auto name = "arrayUniq"; static IFunction * create(const Context & context) { return new FunctionArrayUniq; } /// Get the function name. String getName() const override { return name; } /// Get the result type from the argument types. If the function is not applicable to these arguments, throw an exception. 
DataTypePtr getReturnType(const DataTypes & arguments) const override { if (arguments.size() == 0) throw Exception("Number of arguments for function " + getName() + " doesn't match: passed " + toString(arguments.size()) + ", should be at least 1.", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); for (size_t i = 0; i < arguments.size(); ++i) { const DataTypeArray * array_type = typeid_cast(&*arguments[i]); if (!array_type) throw Exception("All arguments for function " + getName() + " must be arrays; argument " + toString(i + 1) + " isn't.", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); } return new DataTypeUInt32; } /// Execute the function on the block. void execute(Block & block, const ColumnNumbers & arguments, size_t result) override { if (arguments.size() == 1 && executeConst(block, arguments, result)) return; Columns array_columns(arguments.size()); const ColumnArray::Offsets_t * offsets = nullptr; ConstColumnPlainPtrs data_columns(arguments.size()); for (size_t i = 0; i < arguments.size(); ++i) { ColumnPtr array_ptr = block.getByPosition(arguments[i]).column; const ColumnArray * array = typeid_cast(&*array_ptr); if (!array) { const ColumnConstArray * const_array = typeid_cast(&*block.getByPosition(arguments[i]).column); if (!const_array) throw Exception("Illegal column " + block.getByPosition(arguments[i]).column->getName() + " of " + toString(i + 1) + "-th argument of function " + getName(), ErrorCodes::ILLEGAL_COLUMN); array_ptr = const_array->convertToFullColumn(); array = typeid_cast(&*array_ptr); } array_columns[i] = array_ptr; const ColumnArray::Offsets_t & offsets_i = array->getOffsets(); if (!i) offsets = &offsets_i; else if (offsets_i != *offsets) throw Exception("Lengths of all arrays passed to " + getName() + " must be equal.", ErrorCodes::SIZES_OF_ARRAYS_DOESNT_MATCH); data_columns[i] = &array->getData(); } const ColumnArray * first_array = typeid_cast(&*array_columns[0]); ColumnUInt32 * res = new ColumnUInt32; block.getByPosition(result).column = res; 
ColumnUInt32::Container_t & res_values = res->getData(); res_values.resize(offsets->size()); if (arguments.size() == 1) { if (!( executeNumber (first_array, res_values) || executeNumber (first_array, res_values) || executeNumber (first_array, res_values) || executeNumber (first_array, res_values) || executeNumber (first_array, res_values) || executeNumber (first_array, res_values) || executeNumber (first_array, res_values) || executeNumber (first_array, res_values) || executeNumber (first_array, res_values) || executeNumber (first_array, res_values) || executeString (first_array, res_values))) throw Exception("Illegal column " + block.getByPosition(arguments[0]).column->getName() + " of first argument of function " + getName(), ErrorCodes::ILLEGAL_COLUMN); } else { if (!execute128bit(*offsets, data_columns, res_values)) executeHashed(*offsets, data_columns, res_values); } } private: /// Initially allocate a chunk of memory for 512 elements. static constexpr size_t INITIAL_SIZE_DEGREE = 9; template bool executeNumber(const ColumnArray * array, ColumnUInt32::Container_t & res_values) { const ColumnVector * nested = typeid_cast *>(&array->getData()); if (!nested) return false; const ColumnArray::Offsets_t & offsets = array->getOffsets(); const typename ColumnVector::Container_t & values = nested->getData(); typedef ClearableHashSet, HashTableGrower, HashTableAllocatorWithStackMemory<(1 << INITIAL_SIZE_DEGREE) * sizeof(T)> > Set; Set set; size_t prev_off = 0; for (size_t i = 0; i < offsets.size(); ++i) { set.clear(); size_t off = offsets[i]; for (size_t j = prev_off; j < off; ++j) set.insert(values[j]); res_values[i] = set.size(); prev_off = off; } return true; } bool executeString(const ColumnArray * array, ColumnUInt32::Container_t & res_values) { const ColumnString * nested = typeid_cast(&array->getData()); if (!nested) return false; const ColumnArray::Offsets_t & offsets = array->getOffsets(); typedef ClearableHashSet, HashTableAllocatorWithStackMemory<(1 << 
INITIAL_SIZE_DEGREE) * sizeof(StringRef)> > Set; Set set; size_t prev_off = 0; for (size_t i = 0; i < offsets.size(); ++i) { set.clear(); size_t off = offsets[i]; for (size_t j = prev_off; j < off; ++j) set.insert(nested->getDataAt(j)); res_values[i] = set.size(); prev_off = off; } return true; } bool executeConst(Block & block, const ColumnNumbers & arguments, size_t result) { const ColumnConstArray * array = typeid_cast(&*block.getByPosition(arguments[0]).column); if (!array) return false; const Array & values = array->getData(); std::set set; for (size_t i = 0; i < values.size(); ++i) set.insert(values[i]); block.getByPosition(result).column = new ColumnConstUInt32(array->size(), set.size()); return true; } bool execute128bit( const ColumnArray::Offsets_t & offsets, const ConstColumnPlainPtrs & columns, ColumnUInt32::Container_t & res_values) { size_t count = columns.size(); size_t keys_bytes = 0; Sizes key_sizes(count); for (size_t j = 0; j < count; ++j) { if (!columns[j]->isFixed()) return false; key_sizes[j] = columns[j]->sizeOfField(); keys_bytes += key_sizes[j]; } if (keys_bytes > 16) return false; typedef ClearableHashSet, HashTableAllocatorWithStackMemory<(1 << INITIAL_SIZE_DEGREE) * sizeof(UInt128)> > Set; Set set; size_t prev_off = 0; for (size_t i = 0; i < offsets.size(); ++i) { set.clear(); size_t off = offsets[i]; for (size_t j = prev_off; j < off; ++j) set.insert(packFixed(j, count, columns, key_sizes)); res_values[i] = set.size(); prev_off = off; } return true; } void executeHashed( const ColumnArray::Offsets_t & offsets, const ConstColumnPlainPtrs & columns, ColumnUInt32::Container_t & res_values) { size_t count = columns.size(); typedef ClearableHashSet, HashTableAllocatorWithStackMemory<(1 << INITIAL_SIZE_DEGREE) * sizeof(UInt128)> > Set; Set set; size_t prev_off = 0; for (size_t i = 0; i < offsets.size(); ++i) { set.clear(); size_t off = offsets[i]; for (size_t j = prev_off; j < off; ++j) set.insert(hash128(j, count, columns)); res_values[i] = 
set.size(); prev_off = off; } } }; class FunctionArrayEnumerateUniq : public IFunction { public: static constexpr auto name = "arrayEnumerateUniq"; static IFunction * create(const Context & context) { return new FunctionArrayEnumerateUniq; } /// Get the function name. String getName() const override { return name; } /// Get the result type from the argument types. If the function is not applicable to these arguments, throw an exception. DataTypePtr getReturnType(const DataTypes & arguments) const override { if (arguments.size() == 0) throw Exception("Number of arguments for function " + getName() + " doesn't match: passed " + toString(arguments.size()) + ", should be at least 1.", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); for (size_t i = 0; i < arguments.size(); ++i) { const DataTypeArray * array_type = typeid_cast(&*arguments[i]); if (!array_type) throw Exception("All arguments for function " + getName() + " must be arrays; argument " + toString(i + 1) + " isn't.", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); } return new DataTypeArray(new DataTypeUInt32); } /// Execute the function on the block. 
void execute(Block & block, const ColumnNumbers & arguments, size_t result) override { if (arguments.size() == 1 && executeConst(block, arguments, result)) return; Columns array_columns(arguments.size()); const ColumnArray::Offsets_t * offsets = nullptr; ConstColumnPlainPtrs data_columns(arguments.size()); for (size_t i = 0; i < arguments.size(); ++i) { ColumnPtr array_ptr = block.getByPosition(arguments[i]).column; const ColumnArray * array = typeid_cast(&*array_ptr); if (!array) { const ColumnConstArray * const_array = typeid_cast(&*block.getByPosition(arguments[i]).column); if (!const_array) throw Exception("Illegal column " + block.getByPosition(arguments[i]).column->getName() + " of " + toString(i + 1) + "-th argument of function " + getName(), ErrorCodes::ILLEGAL_COLUMN); array_ptr = const_array->convertToFullColumn(); array = typeid_cast(&*array_ptr); } array_columns[i] = array_ptr; const ColumnArray::Offsets_t & offsets_i = array->getOffsets(); if (!i) offsets = &offsets_i; else if (offsets_i != *offsets) throw Exception("Lengths of all arrays passed to " + getName() + " must be equal.", ErrorCodes::SIZES_OF_ARRAYS_DOESNT_MATCH); data_columns[i] = &array->getData(); } const ColumnArray * first_array = typeid_cast(&*array_columns[0]); ColumnUInt32 * res_nested = new ColumnUInt32; ColumnArray * res_array = new ColumnArray(res_nested, first_array->getOffsetsColumn()); block.getByPosition(result).column = res_array; ColumnUInt32::Container_t & res_values = res_nested->getData(); if (!offsets->empty()) res_values.resize(offsets->back()); if (arguments.size() == 1) { if (!( executeNumber (first_array, res_values) || executeNumber (first_array, res_values) || executeNumber (first_array, res_values) || executeNumber (first_array, res_values) || executeNumber (first_array, res_values) || executeNumber (first_array, res_values) || executeNumber (first_array, res_values) || executeNumber (first_array, res_values) || executeNumber (first_array, res_values) || 
executeNumber (first_array, res_values) || executeString (first_array, res_values))) throw Exception("Illegal column " + block.getByPosition(arguments[0]).column->getName() + " of first argument of function " + getName(), ErrorCodes::ILLEGAL_COLUMN); } else { if (!execute128bit(*offsets, data_columns, res_values)) executeHashed(*offsets, data_columns, res_values); } } private: /// Initially allocate a chunk of memory for 512 elements. static constexpr size_t INITIAL_SIZE_DEGREE = 9; template bool executeNumber(const ColumnArray * array, ColumnUInt32::Container_t & res_values) { const ColumnVector * nested = typeid_cast *>(&array->getData()); if (!nested) return false; const ColumnArray::Offsets_t & offsets = array->getOffsets(); const typename ColumnVector::Container_t & values = nested->getData(); typedef ClearableHashMap, HashTableGrower, HashTableAllocatorWithStackMemory<(1 << INITIAL_SIZE_DEGREE) * sizeof(T)> > ValuesToIndices; ValuesToIndices indices; size_t prev_off = 0; for (size_t i = 0; i < offsets.size(); ++i) { indices.clear(); size_t off = offsets[i]; for (size_t j = prev_off; j < off; ++j) { res_values[j] = ++indices[values[j]]; } prev_off = off; } return true; } bool executeString(const ColumnArray * array, ColumnUInt32::Container_t & res_values) { const ColumnString * nested = typeid_cast(&array->getData()); if (!nested) return false; const ColumnArray::Offsets_t & offsets = array->getOffsets(); size_t prev_off = 0; typedef ClearableHashMap, HashTableAllocatorWithStackMemory<(1 << INITIAL_SIZE_DEGREE) * sizeof(StringRef)> > ValuesToIndices; ValuesToIndices indices; for (size_t i = 0; i < offsets.size(); ++i) { indices.clear(); size_t off = offsets[i]; for (size_t j = prev_off; j < off; ++j) { res_values[j] = ++indices[nested->getDataAt(j)]; } prev_off = off; } return true; } bool executeConst(Block & block, const ColumnNumbers & arguments, size_t result) { const ColumnConstArray * array = typeid_cast(&*block.getByPosition(arguments[0]).column); if 
(!array) return false; const Array & values = array->getData(); Array res_values(values.size()); std::map indices; for (size_t i = 0; i < values.size(); ++i) { res_values[i] = static_cast(++indices[values[i]]); } ColumnConstArray * res_array = new ColumnConstArray(array->size(), res_values, new DataTypeArray(new DataTypeUInt32)); block.getByPosition(result).column = res_array; return true; } bool execute128bit( const ColumnArray::Offsets_t & offsets, const ConstColumnPlainPtrs & columns, ColumnUInt32::Container_t & res_values) { size_t count = columns.size(); size_t keys_bytes = 0; Sizes key_sizes(count); for (size_t j = 0; j < count; ++j) { if (!columns[j]->isFixed()) return false; key_sizes[j] = columns[j]->sizeOfField(); keys_bytes += key_sizes[j]; } if (keys_bytes > 16) return false; typedef ClearableHashMap, HashTableAllocatorWithStackMemory<(1 << INITIAL_SIZE_DEGREE) * sizeof(UInt128)> > ValuesToIndices; ValuesToIndices indices; size_t prev_off = 0; for (size_t i = 0; i < offsets.size(); ++i) { indices.clear(); size_t off = offsets[i]; for (size_t j = prev_off; j < off; ++j) { res_values[j] = ++indices[packFixed(j, count, columns, key_sizes)]; } prev_off = off; } return true; } void executeHashed( const ColumnArray::Offsets_t & offsets, const ConstColumnPlainPtrs & columns, ColumnUInt32::Container_t & res_values) { size_t count = columns.size(); typedef ClearableHashMap, HashTableAllocatorWithStackMemory<(1 << INITIAL_SIZE_DEGREE) * sizeof(UInt128)> > ValuesToIndices; ValuesToIndices indices; size_t prev_off = 0; for (size_t i = 0; i < offsets.size(); ++i) { indices.clear(); size_t off = offsets[i]; for (size_t j = prev_off; j < off; ++j) { res_values[j] = ++indices[hash128(j, count, columns)]; } prev_off = off; } } }; template struct TypeToColumnType { using ColumnType = ColumnVector; }; template <> struct TypeToColumnType { using ColumnType = ColumnString; }; template struct DataTypeToName : TypeName { }; template <> struct DataTypeToName { static 
std::string get() { return "Date"; } }; template <> struct DataTypeToName { static std::string get() { return "DateTime"; } }; template struct FunctionEmptyArray : public IFunction { static constexpr auto base_name = "emptyArray"; static const String name; static IFunction * create(const Context & context) { return new FunctionEmptyArray; } private: String getName() const override { return name; } DataTypePtr getReturnType(const DataTypes & arguments) const override { if (arguments.size() != 0) throw Exception("Number of arguments for function " + getName() + " doesn't match: passed " + toString(arguments.size()) + ", should be 0.", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); return new DataTypeArray{new DataType{}}; } void execute(Block & block, const ColumnNumbers & arguments, size_t result) override { using UnderlyingColumnType = typename TypeToColumnType::ColumnType; block.getByPosition(result).column = new ColumnArray{ new UnderlyingColumnType, new ColumnArray::ColumnOffsets_t{block.rowsInFirstColumn(), 0} }; } }; template const String FunctionEmptyArray::name = FunctionEmptyArray::base_name + DataTypeToName::get(); class FunctionRange : public IFunction { public: static constexpr auto max_elements = 100000000; static constexpr auto name = "range"; static IFunction * create(const Context &) { return new FunctionRange; } private: String getName() const override { return name; } DataTypePtr getReturnType(const DataTypes & arguments) const override { if (arguments.size() != 1) throw Exception{ "Number of arguments for function " + getName() + " doesn't match: passed " + toString(arguments.size()) + ", should be 1.", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH }; const auto arg = arguments.front().get(); if (!typeid_cast(arg) && !typeid_cast(arg) && !typeid_cast(arg) && !typeid_cast(arg)) { throw Exception{ "Illegal type " + arg->getName() + " of argument of function " + getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT }; } return new 
DataTypeArray{arg->clone()}; } template bool execute(Block & block, const IColumn * const arg, const size_t result) { if (const auto in = typeid_cast *>(arg)) { const auto & in_data = in->getData(); const auto total_values = std::accumulate(std::begin(in_data), std::end(in_data), std::size_t{}, [this] (const std::size_t lhs, const std::size_t rhs) { const auto sum = lhs + rhs; if (sum < lhs) throw Exception{ "A call to function " + getName() + " overflows, investigate the values of arguments you are passing", ErrorCodes::ARGUMENT_OUT_OF_BOUND }; return sum; }); if (total_values > max_elements) throw Exception{ "A call to function " + getName() + " would produce " + std::to_string(total_values) + " array elements, which is greater than the allowed maximum of " + std::to_string(max_elements), ErrorCodes::ARGUMENT_OUT_OF_BOUND }; const auto data_col = new ColumnVector{total_values}; const auto out = new ColumnArray{ data_col, new ColumnArray::ColumnOffsets_t{in->size()} }; block.getByPosition(result).column = out; auto & out_data = data_col->getData(); auto & out_offsets = out->getOffsets(); IColumn::Offset_t offset{}; for (const auto i : ext::range(0, in->size())) { std::copy(ext::make_range_iterator(T{}), ext::make_range_iterator(in_data[i]), &out_data[offset]); offset += in_data[i]; out_offsets[i] = offset; } return true; } else if (const auto in = typeid_cast *>(arg)) { const auto & in_data = in->getData(); /* Guard against division by zero when the constant argument is 0. */ if (in_data != 0 && in->size() > std::numeric_limits::max() / in_data) throw Exception{ "A call to function " + getName() + " overflows, investigate the values of arguments you are passing", ErrorCodes::ARGUMENT_OUT_OF_BOUND }; const std::size_t total_values = in->size() * in_data; if (total_values > max_elements) throw Exception{ "A call to function " + getName() + " would produce " + std::to_string(total_values) + " array elements, which is greater than the allowed maximum of " + std::to_string(max_elements), ErrorCodes::ARGUMENT_OUT_OF_BOUND }; const auto data_col = new 
ColumnVector{total_values}; const auto out = new ColumnArray{ data_col, new ColumnArray::ColumnOffsets_t{in->size()} }; block.getByPosition(result).column = out; auto & out_data = data_col->getData(); auto & out_offsets = out->getOffsets(); IColumn::Offset_t offset{}; for (const auto i : ext::range(0, in->size())) { std::copy(ext::make_range_iterator(T{}), ext::make_range_iterator(in_data), &out_data[offset]); offset += in_data; out_offsets[i] = offset; } return true; } return false; } void execute(Block & block, const ColumnNumbers & arguments, const size_t result) override { const auto col = block.getByPosition(arguments[0]).column.get(); if (!execute(block, col, result) && !execute(block, col, result) && !execute(block, col, result) && !execute(block, col, result)) { throw Exception{ "Illegal column " + col->getName() + " of argument of function " + getName(), ErrorCodes::ILLEGAL_COLUMN }; } } }; class FunctionEmptyArrayToSingle : public IFunction { public: static constexpr auto name = "emptyArrayToSingle"; static IFunction * create(const Context & context) { return new FunctionEmptyArrayToSingle; } /// Get the function name. String getName() const override { return name; } /// Get the result type from the argument types. If the function is not applicable to these arguments, throw an exception. DataTypePtr getReturnType(const DataTypes & arguments) const override { if (arguments.size() != 1) throw Exception("Number of arguments for function " + getName() + " doesn't match: passed " + toString(arguments.size()) + ", should be 1.", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); const DataTypeArray * array_type = typeid_cast(arguments[0].get()); if (!array_type) throw Exception("Argument for function " + getName() + " must be array.", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); return arguments[0]->clone(); } /// Execute the function on the block. 
void execute(Block & block, const ColumnNumbers & arguments, size_t result) override { if (executeConst(block, arguments, result)) return; const ColumnArray * array = typeid_cast(block.getByPosition(arguments[0]).column.get()); if (!array) throw Exception("Illegal column " + block.getByPosition(arguments[0]).column->getName() + " of first argument of function " + getName(), ErrorCodes::ILLEGAL_COLUMN); ColumnPtr res_ptr = array->cloneEmpty(); block.getByPosition(result).column = res_ptr; ColumnArray & res = static_cast(*res_ptr); const IColumn & src_data = array->getData(); const ColumnArray::Offsets_t & src_offsets = array->getOffsets(); IColumn & res_data = res.getData(); ColumnArray::Offsets_t & res_offsets = res.getOffsets(); if (!( executeNumber (src_data, src_offsets, res_data, res_offsets) || executeNumber (src_data, src_offsets, res_data, res_offsets) || executeNumber (src_data, src_offsets, res_data, res_offsets) || executeNumber (src_data, src_offsets, res_data, res_offsets) || executeNumber (src_data, src_offsets, res_data, res_offsets) || executeNumber (src_data, src_offsets, res_data, res_offsets) || executeNumber (src_data, src_offsets, res_data, res_offsets) || executeNumber (src_data, src_offsets, res_data, res_offsets) || executeNumber (src_data, src_offsets, res_data, res_offsets) || executeNumber (src_data, src_offsets, res_data, res_offsets) || executeString (src_data, src_offsets, res_data, res_offsets) || executeFixedString (src_data, src_offsets, res_data, res_offsets))) throw Exception("Illegal column " + block.getByPosition(arguments[0]).column->getName() + " of first argument of function " + getName(), ErrorCodes::ILLEGAL_COLUMN); } private: bool executeConst(Block & block, const ColumnNumbers & arguments, size_t result) { if (const ColumnConstArray * const_array = typeid_cast(block.getByPosition(arguments[0]).column.get())) { if (const_array->getData().empty()) { auto nested_type = 
typeid_cast(*block.getByPosition(arguments[0]).type).getNestedType(); block.getByPosition(result).column = new ColumnConstArray( block.rowsInFirstColumn(), {nested_type->getDefault()}, nested_type->clone()); } else block.getByPosition(result).column = block.getByPosition(arguments[0]).column; return true; } else return false; } template bool executeNumber( const IColumn & src_data, const ColumnArray::Offsets_t & src_offsets, IColumn & res_data_col, ColumnArray::Offsets_t & res_offsets) { if (const ColumnVector * src_data_concrete = typeid_cast *>(&src_data)) { const PODArray & src_data = src_data_concrete->getData(); PODArray & res_data = typeid_cast &>(res_data_col).getData(); size_t size = src_offsets.size(); res_offsets.resize(size); res_data.reserve(src_data.size()); ColumnArray::Offset_t src_prev_offset = 0; ColumnArray::Offset_t res_prev_offset = 0; for (size_t i = 0; i < size; ++i) { if (src_offsets[i] != src_prev_offset) { size_t size_to_write = src_offsets[i] - src_prev_offset; size_t prev_res_data_size = res_data.size(); res_data.resize(prev_res_data_size + size_to_write); memcpy(&res_data[prev_res_data_size], &src_data[src_prev_offset], size_to_write * sizeof(T)); res_prev_offset += size_to_write; res_offsets[i] = res_prev_offset; } else { res_data.push_back(T()); ++res_prev_offset; res_offsets[i] = res_prev_offset; } src_prev_offset = src_offsets[i]; } return true; } else return false; } bool executeFixedString( const IColumn & src_data, const ColumnArray::Offsets_t & src_offsets, IColumn & res_data_col, ColumnArray::Offsets_t & res_offsets) { if (const ColumnFixedString * src_data_concrete = typeid_cast(&src_data)) { const size_t n = src_data_concrete->getN(); const ColumnFixedString::Chars_t & src_data = src_data_concrete->getChars(); ColumnFixedString::Chars_t & res_data = typeid_cast(res_data_col).getChars(); size_t size = src_offsets.size(); res_offsets.resize(size); res_data.reserve(src_data.size()); ColumnArray::Offset_t src_prev_offset = 0; 
ColumnArray::Offset_t res_prev_offset = 0; for (size_t i = 0; i < size; ++i) { if (src_offsets[i] != src_prev_offset) { size_t size_to_write = src_offsets[i] - src_prev_offset; size_t prev_res_data_size = res_data.size(); res_data.resize(prev_res_data_size + size_to_write * n); memcpy(&res_data[prev_res_data_size], &src_data[src_prev_offset], size_to_write * n); res_prev_offset += size_to_write; res_offsets[i] = res_prev_offset; } else { size_t prev_res_data_size = res_data.size(); res_data.resize(prev_res_data_size + n); memset(&res_data[prev_res_data_size], 0, n); ++res_prev_offset; res_offsets[i] = res_prev_offset; } src_prev_offset = src_offsets[i]; } return true; } else return false; } bool executeString( const IColumn & src_data, const ColumnArray::Offsets_t & src_array_offsets, IColumn & res_data_col, ColumnArray::Offsets_t & res_array_offsets) { if (const ColumnString * src_data_concrete = typeid_cast(&src_data)) { const ColumnString::Offsets_t & src_string_offsets = src_data_concrete->getOffsets(); ColumnString::Offsets_t & res_string_offsets = typeid_cast(res_data_col).getOffsets(); const ColumnString::Chars_t & src_data = src_data_concrete->getChars(); ColumnString::Chars_t & res_data = typeid_cast(res_data_col).getChars(); size_t size = src_array_offsets.size(); res_array_offsets.resize(size); res_string_offsets.reserve(src_string_offsets.size()); res_data.reserve(src_data.size()); ColumnArray::Offset_t src_array_prev_offset = 0; ColumnArray::Offset_t res_array_prev_offset = 0; ColumnString::Offset_t src_string_prev_offset = 0; ColumnString::Offset_t res_string_prev_offset = 0; for (size_t i = 0; i < size; ++i) { if (src_array_offsets[i] != src_array_prev_offset) { size_t array_size = src_array_offsets[i] - src_array_prev_offset; size_t bytes_to_copy = 0; size_t from_string_prev_offset_local = src_string_prev_offset; for (size_t j = 0; j < array_size; ++j) { size_t string_size = src_string_offsets[src_array_prev_offset + j] - 
from_string_prev_offset_local; res_string_prev_offset += string_size; res_string_offsets.push_back(res_string_prev_offset); from_string_prev_offset_local += string_size; bytes_to_copy += string_size; } size_t res_data_old_size = res_data.size(); res_data.resize(res_data_old_size + bytes_to_copy); memcpy(&res_data[res_data_old_size], &src_data[src_string_prev_offset], bytes_to_copy); res_array_prev_offset += array_size; res_array_offsets[i] = res_array_prev_offset; } else { res_data.push_back(0); /// An empty string, including the terminating zero byte. ++res_string_prev_offset; res_string_offsets.push_back(res_string_prev_offset); ++res_array_prev_offset; res_array_offsets[i] = res_array_prev_offset; } src_array_prev_offset = src_array_offsets[i]; if (src_array_prev_offset) src_string_prev_offset = src_string_offsets[src_array_prev_offset - 1]; } return true; } else return false; } }; class FunctionArrayReverse : public IFunction { public: static constexpr auto name = "reverse"; static IFunction * create(const Context & context) { return new FunctionArrayReverse; } /// Get the function name. String getName() const override { return name; } /// Get the result type from the argument types. If the function is not applicable to these arguments, throw an exception. DataTypePtr getReturnType(const DataTypes & arguments) const override { if (arguments.size() != 1) throw Exception("Number of arguments for function " + getName() + " doesn't match: passed " + toString(arguments.size()) + ", should be 1.", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); const DataTypeArray * array_type = typeid_cast(arguments[0].get()); if (!array_type) throw Exception("Argument for function " + getName() + " must be array.", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); return arguments[0]->clone(); } /// Execute the function on the block. 
void execute(Block & block, const ColumnNumbers & arguments, size_t result) override { if (executeConst(block, arguments, result)) return; const ColumnArray * array = typeid_cast(block.getByPosition(arguments[0]).column.get()); if (!array) throw Exception("Illegal column " + block.getByPosition(arguments[0]).column->getName() + " of first argument of function " + getName(), ErrorCodes::ILLEGAL_COLUMN); ColumnPtr res_ptr = array->cloneEmpty(); block.getByPosition(result).column = res_ptr; ColumnArray & res = static_cast(*res_ptr); const IColumn & src_data = array->getData(); const ColumnArray::Offsets_t & offsets = array->getOffsets(); IColumn & res_data = res.getData(); res.getOffsetsColumn() = array->getOffsetsColumn(); if (!( executeNumber (src_data, offsets, res_data) || executeNumber (src_data, offsets, res_data) || executeNumber (src_data, offsets, res_data) || executeNumber (src_data, offsets, res_data) || executeNumber (src_data, offsets, res_data) || executeNumber (src_data, offsets, res_data) || executeNumber (src_data, offsets, res_data) || executeNumber (src_data, offsets, res_data) || executeNumber (src_data, offsets, res_data) || executeNumber (src_data, offsets, res_data) || executeString (src_data, offsets, res_data) || executeFixedString (src_data, offsets, res_data))) throw Exception("Illegal column " + block.getByPosition(arguments[0]).column->getName() + " of first argument of function " + getName(), ErrorCodes::ILLEGAL_COLUMN); } private: bool executeConst(Block & block, const ColumnNumbers & arguments, size_t result) { if (const ColumnConstArray * const_array = typeid_cast(block.getByPosition(arguments[0]).column.get())) { const Array & arr = const_array->getData(); size_t size = arr.size(); Array res(size); for (size_t i = 0; i < size; ++i) res[i] = arr[size - i - 1]; block.getByPosition(result).column = new ColumnConstArray( block.rowsInFirstColumn(), res, block.getByPosition(arguments[0]).type->clone()); return true; } else return false; } 
    template <typename T>
    bool executeNumber(
        const IColumn & src_data, const ColumnArray::Offsets_t & src_offsets,
        IColumn & res_data_col)
    {
        if (const ColumnVector<T> * src_data_concrete = typeid_cast<const ColumnVector<T> *>(&src_data))
        {
            const PODArray<T> & src_data = src_data_concrete->getData();
            PODArray<T> & res_data = typeid_cast<ColumnVector<T> &>(res_data_col).getData();
            size_t size = src_offsets.size();
            res_data.resize(src_data.size());

            ColumnArray::Offset_t src_prev_offset = 0;

            for (size_t i = 0; i < size; ++i)
            {
                const T * src = &src_data[src_prev_offset];
                const T * src_end = &src_data[src_offsets[i]];

                /// An empty array: nothing to copy, and the previous offset is unchanged.
                if (src == src_end)
                    continue;

                /// Read the source forwards and write the result backwards, starting at the last slot of this array.
                T * dst = &res_data[src_offsets[i] - 1];

                while (src < src_end)
                {
                    *dst = *src;
                    ++src;
                    --dst;
                }

                src_prev_offset = src_offsets[i];
            }

            return true;
        }
        else
            return false;
    }

    bool executeFixedString(
        const IColumn & src_data, const ColumnArray::Offsets_t & src_offsets,
        IColumn & res_data_col)
    {
        if (const ColumnFixedString * src_data_concrete = typeid_cast<const ColumnFixedString *>(&src_data))
        {
            const size_t n = src_data_concrete->getN();
            const ColumnFixedString::Chars_t & src_data = src_data_concrete->getChars();
            ColumnFixedString::Chars_t & res_data = typeid_cast<ColumnFixedString &>(res_data_col).getChars();
            size_t size = src_offsets.size();
            res_data.resize(src_data.size());

            ColumnArray::Offset_t src_prev_offset = 0;

            for (size_t i = 0; i < size; ++i)
            {
                const UInt8 * src = &src_data[src_prev_offset * n];
                const UInt8 * src_end = &src_data[src_offsets[i] * n];

                if (src == src_end)
                    continue;

                /// Same as executeNumber, but each element is a block of n bytes.
                UInt8 * dst = &res_data[src_offsets[i] * n - n];

                while (src < src_end)
                {
                    memcpy(dst, src, n);
                    src += n;
                    dst -= n;
                }

                src_prev_offset = src_offsets[i];
            }

            return true;
        }
        else
            return false;
    }

    bool executeString(
        const IColumn & src_data, const ColumnArray::Offsets_t & src_array_offsets,
        IColumn & res_data_col)
    {
        if (const ColumnString * src_data_concrete = typeid_cast<const ColumnString *>(&src_data))
        {
            const ColumnString::Offsets_t & src_string_offsets = src_data_concrete->getOffsets();
            ColumnString::Offsets_t & res_string_offsets = typeid_cast<ColumnString &>(res_data_col).getOffsets();

            const ColumnString::Chars_t & src_data = src_data_concrete->getChars();
            ColumnString::Chars_t & res_data = typeid_cast<ColumnString &>(res_data_col).getChars();

            size_t size = src_array_offsets.size();
            res_string_offsets.resize(src_string_offsets.size());
            res_data.resize(src_data.size());

            ColumnArray::Offset_t src_array_prev_offset = 0;
            ColumnString::Offset_t res_string_prev_offset = 0;

            for (size_t i = 0; i < size; ++i)
            {
                if (src_array_offsets[i] != src_array_prev_offset)
                {
                    size_t array_size = src_array_offsets[i] - src_array_prev_offset;

                    /// Strings have variable length, so walk the source array from its last string to its first
                    /// and append each string to the result.
                    for (size_t j = 0; j < array_size; ++j)
                    {
                        size_t j_reversed = array_size - j - 1;

                        auto src_pos = src_array_prev_offset + j_reversed == 0 ? 0 : src_string_offsets[src_array_prev_offset + j_reversed - 1];
                        size_t string_size = src_string_offsets[src_array_prev_offset + j_reversed] - src_pos;

                        memcpy(&res_data[res_string_prev_offset], &src_data[src_pos], string_size);

                        res_string_prev_offset += string_size;
                        res_string_offsets[src_array_prev_offset + j] = res_string_prev_offset;
                    }
                }

                src_array_prev_offset = src_array_offsets[i];
            }

            return true;
        }
        else
            return false;
    }
};


struct NameHas        { static constexpr auto name = "has"; };
struct NameIndexOf    { static constexpr auto name = "indexOf"; };
struct NameCountEqual { static constexpr auto name = "countEqual"; };

typedef FunctionArrayIndex<IndexToOne, NameHas>        FunctionHas;
typedef FunctionArrayIndex<IndexIdentity, NameIndexOf> FunctionIndexOf;
typedef FunctionArrayIndex<IndexCount, NameCountEqual> FunctionCountEqual;

using FunctionEmptyArrayUInt8 = FunctionEmptyArray<DataTypeUInt8>;
using FunctionEmptyArrayUInt16 = FunctionEmptyArray<DataTypeUInt16>;
using FunctionEmptyArrayUInt32 = FunctionEmptyArray<DataTypeUInt32>;
using FunctionEmptyArrayUInt64 = FunctionEmptyArray<DataTypeUInt64>;
using FunctionEmptyArrayInt8 = FunctionEmptyArray<DataTypeInt8>;
using FunctionEmptyArrayInt16 = FunctionEmptyArray<DataTypeInt16>;
using FunctionEmptyArrayInt32 = FunctionEmptyArray<DataTypeInt32>;
using FunctionEmptyArrayInt64 = FunctionEmptyArray<DataTypeInt64>;
using FunctionEmptyArrayFloat32 = FunctionEmptyArray<DataTypeFloat32>;
using FunctionEmptyArrayFloat64 = FunctionEmptyArray<DataTypeFloat64>;
using FunctionEmptyArrayDate = FunctionEmptyArray<DataTypeDate>;
using FunctionEmptyArrayDateTime = FunctionEmptyArray<DataTypeDateTime>;
using FunctionEmptyArrayString = FunctionEmptyArray<DataTypeString>;


}