Mirror of https://github.com/ClickHouse/ClickHouse.git, synced 2024-11-22 07:31:57 +00:00
Merge pull request #4207 from andyyzh/bitmap_feature
Added bitmap function feature with roaring bitmap
Commit 61e21d50e1
1
contrib/CMakeLists.txt
vendored
@ -46,6 +46,7 @@ if (USE_INTERNAL_METROHASH_LIBRARY)
endif ()

add_subdirectory (murmurhash)
add_subdirectory (croaring)

if (USE_INTERNAL_BTRIE_LIBRARY)
    add_subdirectory (libbtrie)
6
contrib/croaring/CMakeLists.txt
Normal file
@ -0,0 +1,6 @@
add_library(roaring
    roaring.c
    roaring.h
    roaring.hh)

target_include_directories (roaring PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})
202
contrib/croaring/LICENSE
Normal file
@ -0,0 +1,202 @@
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "{}"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright 2016 The CRoaring authors

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
2
contrib/croaring/README.txt
Normal file
@ -0,0 +1,2 @@
download from https://github.com/RoaringBitmap/CRoaring/archive/v0.2.57.tar.gz
and use ./amalgamation.sh generate
11093
contrib/croaring/roaring.c
Normal file
File diff suppressed because it is too large
7187
contrib/croaring/roaring.h
Normal file
File diff suppressed because it is too large
1732
contrib/croaring/roaring.hh
Normal file
File diff suppressed because it is too large
@ -199,6 +199,8 @@ target_link_libraries (clickhouse_common_io
        Threads::Threads
    PRIVATE
        ${CMAKE_DL_LIBS}
    PUBLIC
        roaring
    )

target_include_directories(clickhouse_common_io SYSTEM BEFORE PUBLIC ${PDQSORT_INCLUDE_DIR})
40
dbms/src/AggregateFunctions/AggregateFunctionGroupBitmap.cpp
Normal file
@ -0,0 +1,40 @@
#include <AggregateFunctions/AggregateFunctionFactory.h>
#include <AggregateFunctions/AggregateFunctionGroupBitmap.h>
#include <AggregateFunctions/Helpers.h>
#include <AggregateFunctions/FactoryHelpers.h>


namespace DB
{

namespace
{

template <template <typename> class Data>
AggregateFunctionPtr createAggregateFunctionBitmap(const std::string & name, const DataTypes & argument_types, const Array & parameters)
{
    assertNoParameters(name, parameters);
    assertUnary(name, argument_types);

    if (!argument_types[0]->canBeUsedInBitOperations())
        throw Exception("The type " + argument_types[0]->getName() + " of argument for aggregate function " + name
            + " is illegal, because it cannot be used in Bitmap operations",
            ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);

    AggregateFunctionPtr res(createWithUnsignedIntegerType<AggregateFunctionBitmap, Data>(*argument_types[0], argument_types[0]));

    if (!res)
        throw Exception("Illegal type " + argument_types[0]->getName() + " of argument for aggregate function " + name, ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);

    return res;
}

}

void registerAggregateFunctionsBitmap(AggregateFunctionFactory & factory)
{
    factory.registerFunction("groupBitmap", createAggregateFunctionBitmap<AggregateFunctionGroupBitmapData>);

}

}
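The registration above maps the SQL-visible name `groupBitmap` to a creator function that validates the argument type before building the aggregate. A minimal standalone sketch of that name-to-creator pattern follows. Everything in it (`SketchFactory`, `SketchAggregate`, `SketchGroupBitmap`, `registerSketchBitmap`) is a hypothetical stand-in, not the ClickHouse `AggregateFunctionFactory` API, and the duplicate-name check is an assumption about how such a registry would typically behave.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical minimal interface; the real one is IAggregateFunction.
struct SketchAggregate
{
    virtual ~SketchAggregate() = default;
    virtual std::string getName() const = 0;
};

struct SketchGroupBitmap : SketchAggregate
{
    std::string getName() const override { return "groupBitmap"; }
};

// Hypothetical name -> creator registry, loosely modelled on a function factory.
class SketchFactory
{
public:
    using Creator = std::function<std::unique_ptr<SketchAggregate>()>;

    void registerFunction(const std::string & name, Creator creator)
    {
        // Assumption: registering the same name twice is treated as an error.
        if (!creators.emplace(name, std::move(creator)).second)
            throw std::logic_error("aggregate function " + name + " is already registered");
    }

    std::unique_ptr<SketchAggregate> create(const std::string & name) const
    {
        auto it = creators.find(name);
        if (it == creators.end())
            throw std::invalid_argument("unknown aggregate function " + name);
        return it->second();
    }

private:
    std::map<std::string, Creator> creators;
};

// Mirrors the shape of registerAggregateFunctionsBitmap: one call per function name.
inline void registerSketchBitmap(SketchFactory & factory)
{
    factory.registerFunction("groupBitmap", [] { return std::make_unique<SketchGroupBitmap>(); });
}
```

Keeping the creator behind a name lookup lets each aggregate function live in its own translation unit and be wired in with a single registration call.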
56
dbms/src/AggregateFunctions/AggregateFunctionGroupBitmap.h
Normal file
@ -0,0 +1,56 @@
#pragma once

#include <Columns/ColumnVector.h>
#include <boost/noncopyable.hpp>
#include <AggregateFunctions/IAggregateFunction.h>
#include <AggregateFunctions/AggregateFunctionGroupBitmapData.h>
#include <DataTypes/DataTypesNumber.h>

namespace DB
{

/// Builds a bitmap from the input numbers and returns its cardinality.
template <typename T, typename Data>
class AggregateFunctionBitmap final : public IAggregateFunctionDataHelper<Data, AggregateFunctionBitmap<T, Data>>
{
public:
    AggregateFunctionBitmap(const DataTypePtr & type)
        : IAggregateFunctionDataHelper<Data, AggregateFunctionBitmap<T, Data>>({type}, {}) {}

    String getName() const override { return Data::name(); }

    DataTypePtr getReturnType() const override
    {
        return std::make_shared<DataTypeNumber<T>>();
    }

    void add(AggregateDataPtr place, const IColumn ** columns, size_t row_num, Arena *) const override
    {
        this->data(place).rbs.add(static_cast<const ColumnVector<T> &>(*columns[0]).getData()[row_num]);
    }

    void merge(AggregateDataPtr place, ConstAggregateDataPtr rhs, Arena *) const override
    {
        this->data(place).rbs.merge(this->data(rhs).rbs);
    }

    void serialize(ConstAggregateDataPtr place, WriteBuffer & buf) const override
    {
        this->data(place).rbs.write(buf);
    }

    void deserialize(AggregateDataPtr place, ReadBuffer & buf, Arena *) const override
    {
        this->data(place).rbs.read(buf);
    }

    void insertResultInto(ConstAggregateDataPtr place, IColumn & to) const override
    {
        static_cast<ColumnVector<T> &>(to).getData().push_back(this->data(place).rbs.size());
    }

    const char * getHeaderFilePath() const override { return __FILE__; }
};


}
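The class above follows the usual aggregate-state lifecycle: `add` folds one row into the state, `merge` combines partial states, `serialize` and `deserialize` move a state between workers, and `insertResultInto` emits the final value, which here is the bitmap's cardinality. The sketch below walks through that lifecycle in isolation. It assumes a plain `std::set<uint32_t>` in place of the roaring bitmap and standard streams in place of `ReadBuffer`/`WriteBuffer`, and it omits the small/large flag byte. The names `SketchBitmapState`, `serializeState` and `deserializeState` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <sstream>

// Hypothetical stand-in for the aggregate state; not a ClickHouse type.
struct SketchBitmapState
{
    std::set<uint32_t> values;

    // add: fold one input value into the state.
    void add(uint32_t value) { values.insert(value); }

    // merge: union another partial state into this one.
    void merge(const SketchBitmapState & rhs) { values.insert(rhs.values.begin(), rhs.values.end()); }

    // result: the aggregate's final value is the number of distinct inputs.
    uint64_t result() const { return values.size(); }
};

// Write the cardinality followed by each value as raw 32-bit integers.
inline void serializeState(const SketchBitmapState & state, std::ostream & out)
{
    uint32_t cardinality = static_cast<uint32_t>(state.values.size());
    out.write(reinterpret_cast<const char *>(&cardinality), sizeof(cardinality));
    for (uint32_t v : state.values)
        out.write(reinterpret_cast<const char *>(&v), sizeof(v));
}

// Read back a state written by serializeState.
inline SketchBitmapState deserializeState(std::istream & in)
{
    SketchBitmapState state;
    uint32_t cardinality = 0;
    in.read(reinterpret_cast<char *>(&cardinality), sizeof(cardinality));
    for (uint32_t i = 0; i < cardinality; ++i)
    {
        uint32_t v = 0;
        in.read(reinterpret_cast<char *>(&v), sizeof(v));
        state.add(v);
    }
    return state;
}
```

Because `merge` is a set union, partial states can be combined in any order, which is what allows an aggregate like this to run across parallel or distributed partial aggregations.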
516
dbms/src/AggregateFunctions/AggregateFunctionGroupBitmapData.h
Normal file
@ -0,0 +1,516 @@
#pragma once

#include <roaring.h>
#include <IO/ReadHelpers.h>
#include <IO/WriteHelpers.h>
#include <boost/noncopyable.hpp>
#include <roaring.hh>
#include <Common/HashTable/SmallTable.h>

namespace DB
{
/**
  * For a small number of values - an array of fixed size "on the stack".
  * For large, roaring_bitmap_t is allocated.
  * For a description of the roaring_bitmap_t, see: https://github.com/RoaringBitmap/CRoaring
  */
template <typename T, UInt8 small_set_size>
class RoaringBitmapWithSmallSet : private boost::noncopyable
{
private:
    using Small = SmallSet<T, small_set_size>;
    using ValueBuffer = std::vector<T>;
    Small small;
    roaring_bitmap_t * rb = nullptr;

    void toLarge()
    {
        rb = roaring_bitmap_create();

        for (const auto & x : small)
            roaring_bitmap_add(rb, x);
    }

public:
    bool isLarge() const { return rb != nullptr; }

    bool isSmall() const { return rb == nullptr; }

    ~RoaringBitmapWithSmallSet()
    {
        if (isLarge())
            roaring_bitmap_free(rb);
    }

    void add(T value)
    {
        if (isSmall())
        {
            if (small.find(value) == small.end())
            {
                if (!small.full())
                    small.insert(value);
                else
                {
                    toLarge();
                    roaring_bitmap_add(rb, value);
                }
            }
        }
        else
            roaring_bitmap_add(rb, value);
    }

    UInt64 size() const { return isSmall() ? small.size() : roaring_bitmap_get_cardinality(rb); }

    void merge(const RoaringBitmapWithSmallSet & r1)
    {
        if (r1.isLarge())
        {
            if (isSmall())
                toLarge();

            roaring_bitmap_or_inplace(rb, r1.rb);
        }
        else
        {
            for (const auto & x : r1.small)
                add(x);
        }
    }

    void read(DB::ReadBuffer & in)
    {
        bool is_large;
        readBinary(is_large, in);

        if (is_large)
        {
            toLarge();
            UInt32 cardinality;
            readBinary(cardinality, in);
            db_roaring_bitmap_add_many(in, rb, cardinality);
        }
        else
            small.read(in);
    }

    void write(DB::WriteBuffer & out) const
    {
        writeBinary(isLarge(), out);

        if (isLarge())
        {
            UInt32 cardinality = roaring_bitmap_get_cardinality(rb);
            writePODBinary(cardinality, out);
            db_ra_to_uint32_array(out, &rb->high_low_container);
        }
        else
            small.write(out);
    }


    roaring_bitmap_t * getRb() const { return rb; }

    const Small & getSmall() const { return small; }

    /**
     * Get a new roaring_bitmap_t from elements of small
     */
    roaring_bitmap_t * getNewRbFromSmall() const
    {
        roaring_bitmap_t * smallRb = roaring_bitmap_create();
        for (const auto & x : small)
            roaring_bitmap_add(smallRb, x);
        return smallRb;
    }

    /**
     * Computes the intersection between two bitmaps
     */
    void rb_and(const RoaringBitmapWithSmallSet & r1)
    {
        ValueBuffer buffer;
        if (isSmall() && r1.isSmall())
        {
            // intersect
            for (const auto & value : this->small)
                if (r1.small.find(value) != r1.small.end())
                    buffer.push_back(value);

            // Clear out the original values
            this->small.clear();

            for (const auto & value : buffer)
                this->small.insert(value);

            buffer.clear();
        }
        else if (isSmall() && r1.isLarge())
        {
            for (const auto & value : this->small)
                if (roaring_bitmap_contains(r1.rb, value))
                    buffer.push_back(value);

            // Clear out the original values
            this->small.clear();

            for (const auto & value : buffer)
                this->small.insert(value);

            buffer.clear();
        }
        else
        {
            roaring_bitmap_t * rb1 = r1.isSmall() ? r1.getNewRbFromSmall() : r1.getRb();
            roaring_bitmap_and_inplace(rb, rb1);
            if (r1.isSmall())
                roaring_bitmap_free(rb1);
        }
    }

    /**
     * Computes the union between two bitmaps.
     */
    void rb_or(const RoaringBitmapWithSmallSet & r1) { this->merge(r1); }

    /**
     * Computes the symmetric difference (xor) between two bitmaps.
     */
    void rb_xor(const RoaringBitmapWithSmallSet & r1)
    {
        if (this->isSmall())
            toLarge();
        roaring_bitmap_t * rb1 = r1.isSmall() ? r1.getNewRbFromSmall() : r1.getRb();
        roaring_bitmap_xor_inplace(rb, rb1);
        if (r1.isSmall())
            roaring_bitmap_free(rb1);
    }

    /**
     * Computes the difference (andnot) between two bitmaps
     */
    void rb_andnot(const RoaringBitmapWithSmallSet & r1)
    {
        ValueBuffer buffer;
        if (isSmall() && r1.isSmall())
        {
            // subtract
            for (const auto & value : this->small)
                if (r1.small.find(value) == r1.small.end())
                    buffer.push_back(value);

            // Clear out the original values
            this->small.clear();

            for (const auto & value : buffer)
                this->small.insert(value);

            buffer.clear();
        }
        else if (isSmall() && r1.isLarge())
        {
            for (const auto & value : this->small)
                if (!roaring_bitmap_contains(r1.rb, value))
                    buffer.push_back(value);

            // Clear out the original values
            this->small.clear();

            for (const auto & value : buffer)
                this->small.insert(value);

            buffer.clear();
        }
        else
        {
            roaring_bitmap_t * rb1 = r1.isSmall() ? r1.getNewRbFromSmall() : r1.getRb();
            roaring_bitmap_andnot_inplace(rb, rb1);
            if (r1.isSmall())
                roaring_bitmap_free(rb1);
        }
    }

    /**
     * Computes the cardinality of the intersection between two bitmaps.
     */
    UInt64 rb_and_cardinality(const RoaringBitmapWithSmallSet & r1) const
    {
        UInt64 retSize = 0;
        if (isSmall() && r1.isSmall())
        {
            for (const auto & value : this->small)
                if (r1.small.find(value) != r1.small.end())
                    retSize++;
        }
        else if (isSmall() && r1.isLarge())
        {
            for (const auto & value : this->small)
                if (roaring_bitmap_contains(r1.rb, value))
                    retSize++;
        }
        else
        {
            roaring_bitmap_t * rb1 = r1.isSmall() ? r1.getNewRbFromSmall() : r1.getRb();
            retSize = roaring_bitmap_and_cardinality(rb, rb1);
            if (r1.isSmall())
                roaring_bitmap_free(rb1);
        }
        return retSize;
    }

    /**
     * Computes the cardinality of the union between two bitmaps.
     */
    UInt64 rb_or_cardinality(const RoaringBitmapWithSmallSet & r1) const
    {
        UInt64 c1 = this->size();
        UInt64 c2 = r1.size();
        UInt64 inter = this->rb_and_cardinality(r1);
        return c1 + c2 - inter;
    }

    /**
     * Computes the cardinality of the symmetric difference (xor) between two bitmaps.
     */
    UInt64 rb_xor_cardinality(const RoaringBitmapWithSmallSet & r1) const
    {
        UInt64 c1 = this->size();
        UInt64 c2 = r1.size();
        UInt64 inter = this->rb_and_cardinality(r1);
        return c1 + c2 - 2 * inter;
    }

    /**
     * Computes the cardinality of the difference (andnot) between two bitmaps.
     */
    UInt64 rb_andnot_cardinality(const RoaringBitmapWithSmallSet & r1) const
    {
        UInt64 c1 = this->size();
        UInt64 inter = this->rb_and_cardinality(r1);
        return c1 - inter;
    }

    /**
     * Return 1 if the two bitmaps contain the same elements.
     */
    UInt8 rb_equals(const RoaringBitmapWithSmallSet & r1)
    {
        if (this->isSmall())
            toLarge();
        roaring_bitmap_t * rb1 = r1.isSmall() ? r1.getNewRbFromSmall() : r1.getRb();
        UInt8 is_true = roaring_bitmap_equals(rb, rb1);
        if (r1.isSmall())
            roaring_bitmap_free(rb1);
        return is_true;
    }

    /**
     * Check whether two bitmaps intersect.
     */
    UInt8 rb_intersect(const RoaringBitmapWithSmallSet & r1)
    {
        if (this->isSmall())
            toLarge();
        roaring_bitmap_t * rb1 = r1.isSmall() ? r1.getNewRbFromSmall() : r1.getRb();
        UInt8 is_true = roaring_bitmap_intersect(rb, rb1);
        if (r1.isSmall())
            roaring_bitmap_free(rb1);
        return is_true;
    }

    /**
     * Remove value
     */
    void rb_remove(UInt64 offsetid)
    {
        if (this->isSmall())
            toLarge();
        roaring_bitmap_remove(rb, offsetid);
    }

    /**
     * Computes (in place) the negation of the roaring bitmap within a specified
     * interval: [range_start, range_end). The number of negated values is
     * range_end - range_start.
     * Areas outside the range are passed through unchanged.
     */
    void rb_flip(UInt64 offsetstart, UInt64 offsetend)
    {
        if (this->isSmall())
            toLarge();
        roaring_bitmap_flip_inplace(rb, offsetstart, offsetend);
    }

    /**
     * Returns the number of integers that are smaller than or equal to offsetid.
     */
    UInt64 rb_rank(UInt64 offsetid)
    {
        if (this->isSmall())
            toLarge();
        return roaring_bitmap_rank(rb, offsetid);
    }

    /**
     * Convert elements to integer array, return number of elements
     */
    template <typename Element>
    UInt64 rb_to_array(PaddedPODArray<Element> & res_data) const
    {
        UInt64 count = 0;
        if (this->isSmall())
        {
            for (const auto & x : small)
            {
                res_data.emplace_back(x);
                count++;
            }
        }
        else
        {
            roaring_uint32_iterator_t iterator;
            roaring_init_iterator(rb, &iterator);
            while (iterator.has_value)
            {
                res_data.emplace_back(iterator.current_value);
                roaring_advance_uint32_iterator(&iterator);
                count++;
            }
        }
        return count;
    }

private:
    /// The helpers below are adapted from CRoaring so that they read from and write to DB buffers directly.
    void db_roaring_bitmap_add_many(DB::ReadBuffer & dbBuf, roaring_bitmap_t * r, size_t n_args)
    {
        void * container = NULL; // hold value of last container touched
        uint8_t typecode = 0; // typecode of last container touched
        uint32_t prev = 0; // previous value inserted
        size_t i = 0; // index of value
        int containerindex = 0;
        if (n_args == 0)
            return;
        uint32_t val;
        readBinary(val, dbBuf);
        container = containerptr_roaring_bitmap_add(r, val, &typecode, &containerindex);
        prev = val;
        i++;
        for (; i < n_args; i++)
        {
            readBinary(val, dbBuf);
            if (((prev ^ val) >> 16) == 0)
            { // no need to seek the container, it is at hand
                // because we already have the container at hand, we can do the
                // insertion directly, bypassing the roaring_bitmap_add call
                uint8_t newtypecode = typecode;
                void * container2 = container_add(container, val & 0xFFFF, typecode, &newtypecode);
                // rare instance when we need to change the container type
                if (container2 != container)
                {
                    container_free(container, typecode);
                    ra_set_container_at_index(&r->high_low_container, containerindex, container2, newtypecode);
                    typecode = newtypecode;
                    container = container2;
                }
            }
            else
            {
                container = containerptr_roaring_bitmap_add(r, val, &typecode, &containerindex);
            }
            prev = val;
        }
    }

    void db_ra_to_uint32_array(DB::WriteBuffer & dbBuf, roaring_array_t * ra) const
    {
        size_t ctr = 0;
        for (Int32 i = 0; i < ra->size; ++i)
        {
            Int32 num_added = db_container_to_uint32_array(dbBuf, ra->containers[i], ra->typecodes[i], ((UInt32)ra->keys[i]) << 16);
            ctr += num_added;
        }
    }

UInt32 db_container_to_uint32_array(DB::WriteBuffer & dbBuf, const void * container, UInt8 typecode, UInt32 base) const
|
||||
{
|
||||
container = container_unwrap_shared(container, &typecode);
|
||||
switch (typecode)
|
||||
{
|
||||
case BITSET_CONTAINER_TYPE_CODE:
|
||||
return db_bitset_container_to_uint32_array(dbBuf, (const bitset_container_t *)container, base);
|
||||
case ARRAY_CONTAINER_TYPE_CODE:
|
||||
return db_array_container_to_uint32_array(dbBuf, (const array_container_t *)container, base);
|
||||
case RUN_CONTAINER_TYPE_CODE:
|
||||
return db_run_container_to_uint32_array(dbBuf, (const run_container_t *)container, base);
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
UInt32 db_bitset_container_to_uint32_array(DB::WriteBuffer & dbBuf, const bitset_container_t * cont, UInt32 base) const
|
||||
{
|
||||
return (UInt32)db_bitset_extract_setbits(dbBuf, cont->array, BITSET_CONTAINER_SIZE_IN_WORDS, base);
|
||||
}
|
||||
|
||||
size_t db_bitset_extract_setbits(DB::WriteBuffer & dbBuf, UInt64 * bitset, size_t length, UInt32 base) const
|
||||
{
|
||||
UInt32 outpos = 0;
|
||||
for (size_t i = 0; i < length; ++i)
|
||||
{
|
||||
UInt64 w = bitset[i];
|
||||
while (w != 0)
|
||||
{
|
||||
UInt64 t = w & (~w + 1); // on x64, should compile to BLSI (careful: the Intel compiler seems to fail)
|
||||
UInt32 r = __builtin_ctzll(w); // on x64, should compile to TZCNT
|
||||
UInt32 val = r + base;
|
||||
writePODBinary(val, dbBuf);
|
||||
outpos++;
|
||||
w ^= t;
|
||||
}
|
||||
base += 64;
|
||||
}
|
||||
return outpos;
|
||||
}
|
||||
|
||||
int db_array_container_to_uint32_array(DB::WriteBuffer & dbBuf, const array_container_t * cont, UInt32 base) const
|
||||
{
|
||||
UInt32 outpos = 0;
|
||||
for (Int32 i = 0; i < cont->cardinality; ++i)
|
||||
{
|
||||
const UInt32 val = base + cont->array[i];
|
||||
writePODBinary(val, dbBuf);
|
||||
outpos++;
|
||||
}
|
||||
return outpos;
|
||||
}
|
||||
|
||||
int db_run_container_to_uint32_array(DB::WriteBuffer & dbBuf, const run_container_t * cont, UInt32 base) const
|
||||
{
|
||||
UInt32 outpos = 0;
|
||||
for (Int32 i = 0; i < cont->n_runs; ++i)
|
||||
{
|
||||
UInt32 run_start = base + cont->runs[i].value;
|
||||
UInt16 le = cont->runs[i].length;
|
||||
for (Int32 j = 0; j <= le; ++j)
|
||||
{
|
||||
UInt32 val = run_start + j;
|
||||
writePODBinary(val, dbBuf);
|
||||
outpos++;
|
||||
}
|
||||
}
|
||||
return outpos;
|
||||
}
|
||||
};
|
||||
|
||||
template <typename T>
|
||||
struct AggregateFunctionGroupBitmapData
|
||||
{
|
||||
RoaringBitmapWithSmallSet<T, 32> rbs;
|
||||
static const char * name() { return "groupBitmap"; }
|
||||
};
|
||||
|
||||
|
||||
}
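The container-to-array helpers above rely on two small techniques: isolating and clearing the lowest set bit of each 64-bit word, and expanding a run whose `length` field counts the values after the first one. The following is a minimal standalone sketch of both, not the patch's code: `extractSetBits`, `Run` and `expandRuns` are hypothetical names, a `std::vector` stands in for `DB::WriteBuffer`, and `__builtin_ctzll` assumes GCC or Clang.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for db_bitset_extract_setbits: appends to a vector
// instead of a DB::WriteBuffer. Requires GCC or Clang for __builtin_ctzll.
std::vector<uint32_t> extractSetBits(const uint64_t * words, size_t length, uint32_t base)
{
    assert(words != nullptr || length == 0);
    std::vector<uint32_t> out;
    for (size_t i = 0; i < length; ++i)
    {
        uint64_t w = words[i];
        while (w != 0)
        {
            uint64_t lowest = w & (~w + 1);  // isolates the lowest set bit
            uint32_t index = static_cast<uint32_t>(__builtin_ctzll(w));  // position of that bit
            out.push_back(base + index);
            w ^= lowest;  // clear the bit that was just emitted
        }
        base += 64;  // each word covers 64 consecutive values
    }
    return out;
}

// Hypothetical stand-in for one run of a run container.
struct Run
{
    uint16_t value;   // first value of the run
    uint16_t length;  // number of values after the first one
};

std::vector<uint32_t> expandRuns(const std::vector<Run> & runs, uint32_t base)
{
    std::vector<uint32_t> out;
    for (const Run & run : runs)
    {
        uint32_t run_start = base + run.value;
        for (uint32_t j = 0; j <= run.length; ++j)  // inclusive bound: length + 1 values
            out.push_back(run_start + j);
    }
    return out;
}
```

The inclusive `j <= run.length` bound mirrors the `j <= le` loop in `db_run_container_to_uint32_array`; an exclusive bound would drop the last value of every run.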
|
@ -26,6 +26,7 @@ void registerAggregateFunctionUniqCombined(AggregateFunctionFactory &);
|
||||
void registerAggregateFunctionUniqUpTo(AggregateFunctionFactory &);
|
||||
void registerAggregateFunctionTopK(AggregateFunctionFactory &);
|
||||
void registerAggregateFunctionsBitwise(AggregateFunctionFactory &);
|
||||
void registerAggregateFunctionsBitmap(AggregateFunctionFactory &);
|
||||
void registerAggregateFunctionsMaxIntersections(AggregateFunctionFactory &);
|
||||
void registerAggregateFunctionEntropy(AggregateFunctionFactory &);
|
||||
|
||||
@ -63,6 +64,7 @@ void registerAggregateFunctions()
|
||||
registerAggregateFunctionUniqUpTo(factory);
|
||||
registerAggregateFunctionTopK(factory);
|
||||
registerAggregateFunctionsBitwise(factory);
|
||||
registerAggregateFunctionsBitmap(factory);
|
||||
registerAggregateFunctionsMaxIntersections(factory);
|
||||
registerAggregateFunctionHistogram(factory);
|
||||
registerAggregateFunctionRetention(factory);
|
||||
|
25
dbms/src/Functions/FunctionsBitmap.cpp
Normal file
@ -0,0 +1,25 @@
|
||||
#include <Functions/FunctionFactory.h>
|
||||
#include <Functions/FunctionsBitmap.h>
|
||||
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
void registerFunctionsBitmap(FunctionFactory & factory)
|
||||
{
|
||||
factory.registerFunction<FunctionBitmapBuild>();
|
||||
factory.registerFunction<FunctionBitmapToArray>();
|
||||
|
||||
factory.registerFunction<FunctionBitmapSelfCardinality>();
|
||||
factory.registerFunction<FunctionBitmapAndCardinality>();
|
||||
factory.registerFunction<FunctionBitmapOrCardinality>();
|
||||
factory.registerFunction<FunctionBitmapXorCardinality>();
|
||||
factory.registerFunction<FunctionBitmapAndnotCardinality>();
|
||||
|
||||
factory.registerFunction<FunctionBitmapAnd>();
|
||||
factory.registerFunction<FunctionBitmapOr>();
|
||||
factory.registerFunction<FunctionBitmapXor>();
|
||||
factory.registerFunction<FunctionBitmapAndnot>();
|
||||
|
||||
}
|
||||
}
|
593
dbms/src/Functions/FunctionsBitmap.h
Normal file
@ -0,0 +1,593 @@
|
||||
#pragma once
|
||||
|
||||
#include <AggregateFunctions/AggregateFunctionFactory.h>
|
||||
#include <AggregateFunctions/AggregateFunctionGroupBitmapData.h>
|
||||
#include <Columns/ColumnAggregateFunction.h>
|
||||
#include <Columns/ColumnArray.h>
|
||||
#include <Columns/ColumnConst.h>
|
||||
#include <Columns/ColumnFunction.h>
|
||||
#include <Columns/ColumnVector.h>
|
||||
#include <DataTypes/DataTypeAggregateFunction.h>
|
||||
#include <DataTypes/DataTypeArray.h>
|
||||
#include <DataTypes/DataTypesNumber.h>
|
||||
#include <Functions/FunctionHelpers.h>
|
||||
#include <Functions/IFunction.h>
|
||||
#include <Common/typeid_cast.h>
|
||||
|
||||
namespace DB
|
||||
{
|
||||
namespace ErrorCodes
|
||||
{
|
||||
extern const int LOGICAL_ERROR;
|
||||
extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
|
||||
}
|
||||
|
||||
/** Bitmap functions.
|
||||
* Build a bitmap from integer array:
|
||||
* bitmapBuild: integer[] -> bitmap
|
||||
*
|
||||
* Convert bitmap to integer array:
|
||||
* bitmapToArray: bitmap -> integer[]
|
||||
*
|
||||
* Compute the AND (intersection) of two bitmaps:
|
||||
* bitmapAnd: bitmap,bitmap -> bitmap
|
||||
*
|
||||
* Compute the OR (union) of two bitmaps:
|
||||
* bitmapOr: bitmap,bitmap -> bitmap
|
||||
*
|
||||
* Compute the XOR (symmetric difference) of two bitmaps:
|
||||
* bitmapXor: bitmap,bitmap -> bitmap
|
||||
*
|
||||
* Compute the AND-NOT (difference) of two bitmaps:
|
||||
* bitmapAndnot: bitmap,bitmap -> bitmap
|
||||
*
|
||||
* Return bitmap cardinality:
|
||||
* bitmapCardinality: bitmap -> integer
|
||||
*
|
||||
* Compute the AND of two bitmaps and return its cardinality:
|
||||
* bitmapAndCardinality: bitmap,bitmap -> integer
|
||||
*
|
||||
* Compute the OR of two bitmaps and return its cardinality:
|
||||
* bitmapOrCardinality: bitmap,bitmap -> integer
|
||||
*
|
||||
* Compute the XOR of two bitmaps and return its cardinality:
|
||||
* bitmapXorCardinality: bitmap,bitmap -> integer
|
||||
*
|
||||
* Compute the AND-NOT of two bitmaps and return its cardinality:
|
||||
* bitmapAndnotCardinality: bitmap,bitmap -> integer
|
||||
*/
|
||||
|
||||
template <typename Name>
|
||||
class FunctionBitmapBuildImpl : public IFunction
|
||||
{
|
||||
public:
|
||||
static constexpr auto name = Name::name;
|
||||
|
||||
static FunctionPtr create(const Context &) { return std::make_shared<FunctionBitmapBuildImpl>(); }
|
||||
|
||||
String getName() const override { return name; }
|
||||
|
||||
bool isVariadic() const override { return false; }
|
||||
|
||||
size_t getNumberOfArguments() const override { return 1; }
|
||||
|
||||
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
|
||||
{
|
||||
if (arguments[0]->onlyNull())
|
||||
return arguments[0];
|
||||
|
||||
auto array_type = typeid_cast<const DataTypeArray *>(arguments[0].get());
|
||||
if (!array_type)
|
||||
throw Exception(
|
||||
"First argument for function " + getName() + " must be an array but it has type " + arguments[0]->getName() + ".",
|
||||
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
|
||||
|
||||
auto nested_type = array_type->getNestedType();
|
||||
DataTypes argument_types = {nested_type};
|
||||
Array params_row;
|
||||
AggregateFunctionPtr bitmap_function
|
||||
= AggregateFunctionFactory::instance().get(AggregateFunctionGroupBitmapData<UInt32>::name(), argument_types, params_row);
|
||||
|
||||
return std::make_shared<DataTypeAggregateFunction>(bitmap_function, argument_types, params_row);
|
||||
}
|
||||
|
||||
bool useDefaultImplementationForConstants() const override { return true; }
|
||||
|
||||
void executeImpl(Block & block, const ColumnNumbers & arguments, size_t result, size_t /* input_rows_count */) override
|
||||
{
|
||||
const IDataType * from_type = block.getByPosition(arguments[0]).type.get();
|
||||
auto array_type = typeid_cast<const DataTypeArray *>(from_type);
|
||||
auto nested_type = array_type->getNestedType();
|
||||
|
||||
DataTypes argument_types = {nested_type};
|
||||
|
||||
WhichDataType which(nested_type);
|
||||
if (which.isUInt8())
|
||||
executeBitmapData<UInt8>(block, argument_types, arguments, result);
|
||||
else if (which.isUInt16())
|
||||
executeBitmapData<UInt16>(block, argument_types, arguments, result);
|
||||
else if (which.isUInt32())
|
||||
executeBitmapData<UInt32>(block, argument_types, arguments, result);
|
||||
else if (which.isUInt64())
|
||||
executeBitmapData<UInt64>(block, argument_types, arguments, result);
|
||||
else
|
||||
throw Exception(
|
||||
"Unexpected type " + from_type->getName() + " of argument of function " + getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
|
||||
}
|
||||
|
||||
private:
|
||||
template <typename T>
|
||||
void executeBitmapData(Block & block, DataTypes & argument_types, const ColumnNumbers & arguments, size_t result)
|
||||
{
|
||||
// input data
|
||||
const ColumnArray * array = typeid_cast<const ColumnArray *>(block.getByPosition(arguments[0]).column.get());
|
||||
ColumnPtr mapped = array->getDataPtr();
|
||||
const ColumnArray::Offsets & offsets = array->getOffsets();
|
||||
const ColumnVector<T> * column = checkAndGetColumn<ColumnVector<T>>(&*mapped);
|
||||
const typename ColumnVector<T>::Container & input_data = column->getData();
|
||||
|
||||
// output data
|
||||
Array params_row;
|
||||
AggregateFunctionPtr bitmap_function
|
||||
= AggregateFunctionFactory::instance().get(AggregateFunctionGroupBitmapData<UInt32>::name(), argument_types, params_row);
|
||||
auto col_to = ColumnAggregateFunction::create(bitmap_function);
|
||||
col_to->reserve(offsets.size());
|
||||
|
||||
size_t pos = 0;
|
||||
for (size_t i = 0; i < offsets.size(); ++i)
|
||||
{
|
||||
col_to->insertDefault();
|
||||
AggregateFunctionGroupBitmapData<T> & bitmap_data
|
||||
= *reinterpret_cast<AggregateFunctionGroupBitmapData<T> *>(col_to->getData()[i]);
|
||||
for (; pos < offsets[i]; ++pos)
|
||||
{
|
||||
bitmap_data.rbs.add(input_data[pos]);
|
||||
}
|
||||
}
|
||||
block.getByPosition(result).column = std::move(col_to);
|
||||
}
|
||||
};
|
||||
|
||||
template <typename Name>
|
||||
class FunctionBitmapToArrayImpl : public IFunction
|
||||
{
|
||||
public:
|
||||
static constexpr auto name = Name::name;
|
||||
|
||||
static FunctionPtr create(const Context &) { return std::make_shared<FunctionBitmapToArrayImpl>(); }
|
||||
|
||||
String getName() const override { return name; }
|
||||
|
||||
bool isVariadic() const override { return false; }
|
||||
|
||||
size_t getNumberOfArguments() const override { return 1; }
|
||||
|
||||
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
|
||||
{
|
||||
const DataTypeAggregateFunction * bitmap_type = typeid_cast<const DataTypeAggregateFunction *>(arguments[0].get());
|
||||
if (!(bitmap_type && bitmap_type->getFunctionName() == AggregateFunctionGroupBitmapData<UInt32>::name()))
|
||||
throw Exception(
|
||||
"First argument for function " + getName() + " must be a bitmap but it has type " + arguments[0]->getName() + ".",
|
||||
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
|
||||
|
||||
const DataTypePtr data_type = bitmap_type->getArgumentsDataTypes()[0];
|
||||
|
||||
return std::make_shared<DataTypeArray>(data_type);
|
||||
}
|
||||
|
||||
bool useDefaultImplementationForConstants() const override { return true; }
|
||||
|
||||
void executeImpl(Block & block, const ColumnNumbers & arguments, size_t result, size_t input_rows_count) override
|
||||
{
|
||||
// input data
|
||||
const auto & return_type = block.getByPosition(result).type;
|
||||
auto res_ptr = return_type->createColumn();
|
||||
ColumnArray & res = static_cast<ColumnArray &>(*res_ptr);
|
||||
|
||||
IColumn & res_data = res.getData();
|
||||
ColumnArray::Offsets & res_offsets = res.getOffsets();
|
||||
|
||||
const IDataType * from_type = block.getByPosition(arguments[0]).type.get();
|
||||
const DataTypeAggregateFunction * aggr_type = typeid_cast<const DataTypeAggregateFunction *>(from_type);
|
||||
WhichDataType which(aggr_type->getArgumentsDataTypes()[0]);
|
||||
if (which.isUInt8())
|
||||
executeIntType<UInt8>(block, arguments, input_rows_count, res_data, res_offsets);
|
||||
else if (which.isUInt16())
|
||||
executeIntType<UInt16>(block, arguments, input_rows_count, res_data, res_offsets);
|
||||
else if (which.isUInt32())
|
||||
executeIntType<UInt32>(block, arguments, input_rows_count, res_data, res_offsets);
|
||||
else if (which.isUInt64())
|
||||
executeIntType<UInt64>(block, arguments, input_rows_count, res_data, res_offsets);
|
||||
else
|
||||
throw Exception(
|
||||
"Unexpected type " + from_type->getName() + " of argument of function " + getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
|
||||
|
||||
block.getByPosition(result).column = std::move(res_ptr);
|
||||
}
|
||||
|
||||
private:
|
||||
using ToType = UInt64;
|
||||
|
||||
template <typename T>
|
||||
void executeIntType(
|
||||
Block & block, const ColumnNumbers & arguments, size_t input_rows_count, IColumn & res_data_col, ColumnArray::Offsets & res_offsets)
|
||||
const
|
||||
{
|
||||
const ColumnAggregateFunction * column
|
||||
= typeid_cast<const ColumnAggregateFunction *>(block.getByPosition(arguments[0]).column.get());
|
||||
|
||||
PaddedPODArray<T> & res_data = typeid_cast<ColumnVector<T> &>(res_data_col).getData();
|
||||
ColumnArray::Offset res_offset = 0;
|
||||
|
||||
for (size_t i = 0; i < input_rows_count; ++i)
|
||||
{
|
||||
const AggregateFunctionGroupBitmapData<T> & bd1
|
||||
= *reinterpret_cast<const AggregateFunctionGroupBitmapData<T> *>(column->getData()[i]);
|
||||
UInt64 count = bd1.rbs.rb_to_array(res_data);
|
||||
res_offset += count;
|
||||
res_offsets.emplace_back(res_offset);
|
||||
}
|
||||
}
|
||||
};
|
||||
|
||||
template <typename Name>
|
||||
class FunctionBitmapSelfCardinalityImpl : public IFunction
|
||||
{
|
||||
public:
|
||||
static constexpr auto name = Name::name;
|
||||
|
||||
static FunctionPtr create(const Context &) { return std::make_shared<FunctionBitmapSelfCardinalityImpl>(); }
|
||||
|
||||
String getName() const override { return name; }
|
||||
|
||||
bool isVariadic() const override { return false; }
|
||||
|
||||
size_t getNumberOfArguments() const override { return 1; }
|
||||
|
||||
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
|
||||
{
|
||||
auto bitmap_type = typeid_cast<const DataTypeAggregateFunction *>(arguments[0].get());
|
||||
if (!(bitmap_type && bitmap_type->getFunctionName() == AggregateFunctionGroupBitmapData<UInt32>::name()))
|
||||
throw Exception(
|
||||
"First argument for function " + getName() + " must be a bitmap but it has type " + arguments[0]->getName() + ".",
|
||||
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
|
||||
return std::make_shared<DataTypeNumber<ToType>>();
|
||||
}
|
||||
|
||||
bool useDefaultImplementationForConstants() const override { return true; }
|
||||
|
||||
void executeImpl(Block & block, const ColumnNumbers & arguments, size_t result, size_t input_rows_count) override
|
||||
{
|
||||
auto col_to = ColumnVector<ToType>::create(input_rows_count);
|
||||
typename ColumnVector<ToType>::Container & vec_to = col_to->getData();
|
||||
const IDataType * from_type = block.getByPosition(arguments[0]).type.get();
|
||||
|
||||
const DataTypeAggregateFunction * aggr_type = typeid_cast<const DataTypeAggregateFunction *>(from_type);
|
||||
WhichDataType which(aggr_type->getArgumentsDataTypes()[0]);
|
||||
if (which.isUInt8())
|
||||
executeIntType<UInt8>(block, arguments, input_rows_count, vec_to);
|
||||
else if (which.isUInt16())
|
||||
executeIntType<UInt16>(block, arguments, input_rows_count, vec_to);
|
||||
else if (which.isUInt32())
|
||||
executeIntType<UInt32>(block, arguments, input_rows_count, vec_to);
|
||||
else if (which.isUInt64())
|
||||
executeIntType<UInt64>(block, arguments, input_rows_count, vec_to);
|
||||
else
|
||||
throw Exception(
|
||||
"Unexpected type " + from_type->getName() + " of argument of function " + getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
|
||||
|
||||
block.getByPosition(result).column = std::move(col_to);
|
||||
}
|
||||
|
||||
private:
|
||||
using ToType = UInt64;
|
||||
|
||||
template <typename T>
|
||||
void executeIntType(
|
||||
Block & block, const ColumnNumbers & arguments, size_t input_rows_count, typename ColumnVector<ToType>::Container & vec_to)
|
||||
{
|
||||
const ColumnAggregateFunction * column
|
||||
= typeid_cast<const ColumnAggregateFunction *>(block.getByPosition(arguments[0]).column.get());
|
||||
for (size_t i = 0; i < input_rows_count; ++i)
|
||||
{
|
||||
const AggregateFunctionGroupBitmapData<T> & bd1
|
||||
= *reinterpret_cast<const AggregateFunctionGroupBitmapData<T> *>(column->getData()[i]);
|
||||
vec_to[i] = bd1.rbs.size();
|
||||
}
|
||||
}
|
||||
};
|
||||
|
||||
template <typename T>
|
||||
struct BitmapAndCardinalityImpl
|
||||
{
|
||||
using ReturnType = UInt64;
|
||||
static UInt64 apply(const AggregateFunctionGroupBitmapData<T> & bd1, const AggregateFunctionGroupBitmapData<T> & bd2)
|
||||
{
|
||||
// roaring_bitmap_and_cardinality( rb1, rb2 );
|
||||
return bd1.rbs.rb_and_cardinality(bd2.rbs);
|
||||
}
|
||||
};
|
||||
|
||||
|
||||
template <typename T>
|
||||
struct BitmapOrCardinalityImpl
|
||||
{
|
||||
using ReturnType = UInt64;
|
||||
static UInt64 apply(const AggregateFunctionGroupBitmapData<T> & bd1, const AggregateFunctionGroupBitmapData<T> & bd2)
|
||||
{
|
||||
// return roaring_bitmap_or_cardinality( rb1, rb2 );
|
||||
return bd1.rbs.rb_or_cardinality(bd2.rbs);
|
||||
}
|
||||
};
|
||||
|
||||
template <typename T>
|
||||
struct BitmapXorCardinalityImpl
|
||||
{
|
||||
using ReturnType = UInt64;
|
||||
static UInt64 apply(const AggregateFunctionGroupBitmapData<T> & bd1, const AggregateFunctionGroupBitmapData<T> & bd2)
|
||||
{
|
||||
// return roaring_bitmap_xor_cardinality( rb1, rb2 );
|
||||
return bd1.rbs.rb_xor_cardinality(bd2.rbs);
|
||||
}
|
||||
};
|
||||
|
||||
template <typename T>
|
||||
struct BitmapAndnotCardinalityImpl
|
||||
{
|
||||
using ReturnType = UInt64;
|
||||
static UInt64 apply(const AggregateFunctionGroupBitmapData<T> & bd1, const AggregateFunctionGroupBitmapData<T> & bd2)
|
||||
{
|
||||
// roaring_bitmap_andnot_cardinality( rb1, rb2 );
|
||||
return bd1.rbs.rb_andnot_cardinality(bd2.rbs);
|
||||
}
|
||||
};
|
||||
|
||||
template <template <typename> class Impl, typename Name>
|
||||
class FunctionBitmapCardinality : public IFunction
|
||||
{
|
||||
public:
|
||||
static constexpr auto name = Name::name;
|
||||
|
||||
static FunctionPtr create(const Context &) { return std::make_shared<FunctionBitmapCardinality>(); }
|
||||
|
||||
String getName() const override { return name; }
|
||||
|
||||
bool isVariadic() const override { return false; }
|
||||
|
||||
size_t getNumberOfArguments() const override { return 2; }
|
||||
|
||||
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
|
||||
{
|
||||
auto bitmap_type0 = typeid_cast<const DataTypeAggregateFunction *>(arguments[0].get());
|
||||
if (!(bitmap_type0 && bitmap_type0->getFunctionName() == AggregateFunctionGroupBitmapData<UInt32>::name()))
|
||||
throw Exception(
|
||||
"First argument for function " + getName() + " must be a bitmap but it has type " + arguments[0]->getName() + ".",
|
||||
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
|
||||
|
||||
auto bitmap_type1 = typeid_cast<const DataTypeAggregateFunction *>(arguments[1].get());
|
||||
if (!(bitmap_type1 && bitmap_type1->getFunctionName() == AggregateFunctionGroupBitmapData<UInt32>::name()))
|
||||
throw Exception(
|
||||
"Second argument for function " + getName() + " must be a bitmap but it has type " + arguments[1]->getName() + ".",
|
||||
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
|
||||
return std::make_shared<DataTypeNumber<ToType>>();
|
||||
}
|
||||
|
||||
bool useDefaultImplementationForConstants() const override { return true; }
|
||||
|
||||
void executeImpl(Block & block, const ColumnNumbers & arguments, size_t result, size_t input_rows_count) override
|
||||
{
|
||||
auto col_to = ColumnVector<ToType>::create(input_rows_count);
|
||||
typename ColumnVector<ToType>::Container & vec_to = col_to->getData();
|
||||
const IDataType * from_type = block.getByPosition(arguments[0]).type.get();
|
||||
|
||||
const DataTypeAggregateFunction * aggr_type = typeid_cast<const DataTypeAggregateFunction *>(from_type);
|
||||
WhichDataType which(aggr_type->getArgumentsDataTypes()[0]);
|
||||
if (which.isUInt8())
|
||||
executeIntType<UInt8>(block, arguments, input_rows_count, vec_to);
|
||||
else if (which.isUInt16())
|
||||
executeIntType<UInt16>(block, arguments, input_rows_count, vec_to);
|
||||
else if (which.isUInt32())
|
||||
executeIntType<UInt32>(block, arguments, input_rows_count, vec_to);
|
||||
else if (which.isUInt64())
|
||||
executeIntType<UInt64>(block, arguments, input_rows_count, vec_to);
|
||||
else
|
||||
throw Exception(
|
||||
"Unexpected type " + from_type->getName() + " of argument of function " + getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
|
||||
|
||||
block.getByPosition(result).column = std::move(col_to);
|
||||
}
|
||||
|
||||
private:
|
||||
using ToType = UInt64;
|
||||
|
||||
template <typename T>
|
||||
void executeIntType(
|
||||
Block & block, const ColumnNumbers & arguments, size_t input_rows_count, typename ColumnVector<ToType>::Container & vec_to)
|
||||
{
|
||||
const ColumnAggregateFunction * columns[2];
|
||||
for (size_t i = 0; i < 2; ++i)
|
||||
columns[i] = typeid_cast<const ColumnAggregateFunction *>(block.getByPosition(arguments[i]).column.get());
|
||||
|
||||
for (size_t i = 0; i < input_rows_count; ++i)
|
||||
{
|
||||
const AggregateFunctionGroupBitmapData<T> & bd1
|
||||
= *reinterpret_cast<const AggregateFunctionGroupBitmapData<T> *>(columns[0]->getData()[i]);
|
||||
const AggregateFunctionGroupBitmapData<T> & bd2
|
||||
= *reinterpret_cast<const AggregateFunctionGroupBitmapData<T> *>(columns[1]->getData()[i]);
|
||||
vec_to[i] = Impl<T>::apply(bd1, bd2);
|
||||
}
|
||||
}
|
||||
};
|
||||
|
||||
template <typename T>
|
||||
struct BitmapAndImpl
|
||||
{
|
||||
static void apply(AggregateFunctionGroupBitmapData<T> & toBd, const AggregateFunctionGroupBitmapData<T> & bd2)
|
||||
{
|
||||
toBd.rbs.rb_and(bd2.rbs);
|
||||
}
|
||||
};
|
||||
|
||||
template <typename T>
|
||||
struct BitmapOrImpl
|
||||
{
|
||||
static void apply(AggregateFunctionGroupBitmapData<T> & toBd, const AggregateFunctionGroupBitmapData<T> & bd2)
|
||||
{
|
||||
toBd.rbs.rb_or(bd2.rbs);
|
||||
}
|
||||
};
|
||||
|
||||
template <typename T>
|
||||
struct BitmapXorImpl
|
||||
{
|
||||
static void apply(AggregateFunctionGroupBitmapData<T> & toBd, const AggregateFunctionGroupBitmapData<T> & bd2)
|
||||
{
|
||||
toBd.rbs.rb_xor(bd2.rbs);
|
||||
}
|
||||
};
|
||||
|
||||
template <typename T>
|
||||
struct BitmapAndnotImpl
|
||||
{
|
||||
static void apply(AggregateFunctionGroupBitmapData<T> & toBd, const AggregateFunctionGroupBitmapData<T> & bd2)
|
||||
{
|
||||
toBd.rbs.rb_andnot(bd2.rbs);
|
||||
}
|
||||
};
|
||||
|
||||
template <template <typename> class Impl, typename Name>
|
||||
class FunctionBitmap : public IFunction
|
||||
{
|
||||
public:
|
||||
static constexpr auto name = Name::name;
|
||||
|
||||
static FunctionPtr create(const Context &) { return std::make_shared<FunctionBitmap>(); }
|
||||
|
||||
String getName() const override { return name; }
|
||||
|
||||
bool isVariadic() const override { return false; }
|
||||
|
||||
size_t getNumberOfArguments() const override { return 2; }
|
||||
|
||||
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
|
||||
{
|
||||
auto bitmap_type0 = typeid_cast<const DataTypeAggregateFunction *>(arguments[0].get());
|
||||
if (!(bitmap_type0 && bitmap_type0->getFunctionName() == AggregateFunctionGroupBitmapData<UInt32>::name()))
|
||||
throw Exception(
|
||||
"First argument for function " + getName() + " must be a bitmap but it has type " + arguments[0]->getName() + ".",
|
||||
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
|
||||
|
||||
auto bitmap_type1 = typeid_cast<const DataTypeAggregateFunction *>(arguments[1].get());
|
||||
if (!(bitmap_type1 && bitmap_type1->getFunctionName() == AggregateFunctionGroupBitmapData<UInt32>::name()))
|
||||
throw Exception(
|
||||
"Second argument for function " + getName() + " must be a bitmap but it has type " + arguments[1]->getName() + ".",
|
||||
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
|
||||
return arguments[0];
|
||||
}
|
||||
|
||||
bool useDefaultImplementationForConstants() const override { return true; }
|
||||
|
||||
void executeImpl(Block & block, const ColumnNumbers & arguments, size_t result, size_t input_rows_count) override
|
||||
{
|
||||
const IDataType * from_type = block.getByPosition(arguments[0]).type.get();
|
||||
const DataTypeAggregateFunction * aggr_type = typeid_cast<const DataTypeAggregateFunction *>(from_type);
|
||||
WhichDataType which(aggr_type->getArgumentsDataTypes()[0]);
|
||||
if (which.isUInt8())
|
||||
executeBitmapData<UInt8>(block, arguments, result, input_rows_count);
|
||||
else if (which.isUInt16())
|
||||
executeBitmapData<UInt16>(block, arguments, result, input_rows_count);
|
||||
else if (which.isUInt32())
|
||||
executeBitmapData<UInt32>(block, arguments, result, input_rows_count);
|
||||
else if (which.isUInt64())
|
||||
executeBitmapData<UInt64>(block, arguments, result, input_rows_count);
|
||||
else
|
||||
throw Exception(
|
||||
"Unexpected type " + from_type->getName() + " of argument of function " + getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
|
||||
}
|
||||
|
||||
private:
|
||||
template <typename T>
|
||||
void executeBitmapData(Block & block, const ColumnNumbers & arguments, size_t result, size_t input_rows_count)
|
||||
{
|
||||
const ColumnAggregateFunction * columns[2];
|
||||
for (size_t i = 0; i < 2; ++i)
|
||||
columns[i] = typeid_cast<const ColumnAggregateFunction *>(block.getByPosition(arguments[i]).column.get());
|
||||
|
||||
auto col_to = ColumnAggregateFunction::create(columns[0]->getAggregateFunction());
|
||||
|
||||
col_to->reserve(input_rows_count);
|
||||
|
||||
for (size_t i = 0; i < input_rows_count; ++i)
|
||||
{
|
||||
col_to->insertFrom(columns[0]->getData()[i]);
|
||||
AggregateFunctionGroupBitmapData<T> & toBd = *reinterpret_cast<AggregateFunctionGroupBitmapData<T> *>(col_to->getData()[i]);
|
||||
const AggregateFunctionGroupBitmapData<T> & bd2
|
||||
= *reinterpret_cast<const AggregateFunctionGroupBitmapData<T> *>(columns[1]->getData()[i]);
|
||||
Impl<T>::apply(toBd, bd2);
|
||||
}
|
||||
block.getByPosition(result).column = std::move(col_to);
|
||||
}
|
||||
};
|
||||
|
||||
struct NameBitmapBuild
|
||||
{
|
||||
static constexpr auto name = "bitmapBuild";
|
||||
};
|
||||
using FunctionBitmapBuild = FunctionBitmapBuildImpl<NameBitmapBuild>;
|
||||
|
||||
struct NameBitmapToArray
|
||||
{
|
||||
static constexpr auto name = "bitmapToArray";
|
||||
};
|
||||
using FunctionBitmapToArray = FunctionBitmapToArrayImpl<NameBitmapToArray>;
|
||||
|
||||
struct NameBitmapCardinality
|
||||
{
|
||||
static constexpr auto name = "bitmapCardinality";
|
||||
};
|
||||
struct NameBitmapAndCardinality
|
||||
{
|
||||
static constexpr auto name = "bitmapAndCardinality";
|
||||
};
|
||||
struct NameBitmapOrCardinality
|
||||
{
|
||||
static constexpr auto name = "bitmapOrCardinality";
|
||||
};
|
||||
struct NameBitmapXorCardinality
|
||||
{
|
||||
static constexpr auto name = "bitmapXorCardinality";
|
||||
};
|
||||
struct NameBitmapAndnotCardinality
|
||||
{
|
||||
static constexpr auto name = "bitmapAndnotCardinality";
|
||||
};
|
||||
|
||||
using FunctionBitmapSelfCardinality = FunctionBitmapSelfCardinalityImpl<NameBitmapCardinality>;
|
||||
using FunctionBitmapAndCardinality = FunctionBitmapCardinality<BitmapAndCardinalityImpl, NameBitmapAndCardinality>;
|
||||
using FunctionBitmapOrCardinality = FunctionBitmapCardinality<BitmapOrCardinalityImpl, NameBitmapOrCardinality>;
|
||||
using FunctionBitmapXorCardinality = FunctionBitmapCardinality<BitmapXorCardinalityImpl, NameBitmapXorCardinality>;
|
||||
using FunctionBitmapAndnotCardinality = FunctionBitmapCardinality<BitmapAndnotCardinalityImpl, NameBitmapAndnotCardinality>;
|
||||
|
||||
struct NameBitmapAnd
|
||||
{
|
||||
static constexpr auto name = "bitmapAnd";
|
||||
};
|
||||
struct NameBitmapOr
|
||||
{
|
||||
static constexpr auto name = "bitmapOr";
|
||||
};
|
||||
struct NameBitmapXor
|
||||
{
|
||||
static constexpr auto name = "bitmapXor";
|
||||
};
|
||||
struct NameBitmapAndnot
|
||||
{
|
||||
static constexpr auto name = "bitmapAndnot";
|
||||
};
|
||||
using FunctionBitmapAnd = FunctionBitmap<BitmapAndImpl, NameBitmapAnd>;
|
||||
using FunctionBitmapOr = FunctionBitmap<BitmapOrImpl, NameBitmapOr>;
|
||||
using FunctionBitmapXor = FunctionBitmap<BitmapXorImpl, NameBitmapXor>;
|
||||
using FunctionBitmapAndnot = FunctionBitmap<BitmapAndnotImpl, NameBitmapAndnot>;
|
||||
|
||||
|
||||
}
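The header above builds each binary function from two pieces: an `Impl` template that mutates the left operand in place, and a `Name` struct that carries the SQL-visible name. As a rough illustration of that pattern only, the sketch below swaps the roaring-backed data for `std::set`; `BinaryBitmapFunction`, `AndImpl`, `AndnotImpl` and the `execute` signature are hypothetical and do not match ClickHouse's `IFunction` interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <set>
#include <string>

// Operation policies: each one updates `to` in place, like BitmapAndImpl and friends.
template <typename T>
struct AndImpl
{
    static void apply(std::set<T> & to, const std::set<T> & rhs)
    {
        std::set<T> out;
        std::set_intersection(to.begin(), to.end(), rhs.begin(), rhs.end(), std::inserter(out, out.end()));
        assert(out.size() <= to.size());  // an intersection can never grow the left operand
        to.swap(out);
    }
};

template <typename T>
struct AndnotImpl
{
    static void apply(std::set<T> & to, const std::set<T> & rhs)
    {
        for (const T & value : rhs)
            to.erase(value);  // drop every element that also appears in rhs
    }
};

// Name policies: carry only the function name.
struct NameAnd { static constexpr auto name = "bitmapAnd"; };
struct NameAndnot { static constexpr auto name = "bitmapAndnot"; };

// Hypothetical, simplified counterpart of FunctionBitmap<Impl, Name>.
template <template <typename> class Impl, typename Name>
class BinaryBitmapFunction
{
public:
    static constexpr auto name = Name::name;

    std::string getName() const { return name; }

    template <typename T>
    std::set<T> execute(const std::set<T> & lhs, const std::set<T> & rhs) const
    {
        std::set<T> result = lhs;     // copy the left operand first, as insertFrom does above
        Impl<T>::apply(result, rhs);  // then let the policy modify the copy
        return result;
    }
};

using FunctionAnd = BinaryBitmapFunction<AndImpl, NameAnd>;
using FunctionAndnot = BinaryBitmapFunction<AndnotImpl, NameAndnot>;
```

With this layout, adding another operation would only need a new `Impl`/`Name` pair and one `using` alias, which appears to be why the patch defines `FunctionBitmapAnd`, `FunctionBitmapOr`, `FunctionBitmapXor` and `FunctionBitmapAndnot` as aliases of a single class template.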
|
@ -13,6 +13,7 @@ namespace DB
|
||||
void registerFunctionsArithmetic(FunctionFactory &);
|
||||
void registerFunctionsArray(FunctionFactory &);
|
||||
void registerFunctionsTuple(FunctionFactory &);
|
||||
void registerFunctionsBitmap(FunctionFactory &);
|
||||
void registerFunctionsCoding(FunctionFactory &);
|
||||
void registerFunctionsComparison(FunctionFactory &);
|
||||
void registerFunctionsConditional(FunctionFactory &);
|
||||
@ -53,6 +54,7 @@ void registerFunctions()
|
||||
registerFunctionsArithmetic(factory);
|
||||
registerFunctionsArray(factory);
|
||||
registerFunctionsTuple(factory);
|
||||
registerFunctionsBitmap(factory);
|
||||
registerFunctionsCoding(factory);
|
||||
registerFunctionsComparison(factory);
|
||||
registerFunctionsConditional(factory);
|
||||
|
@ -0,0 +1,17 @@
|
||||
[1,2,3,4,5]
|
||||
[3]
|
||||
[1,2,3,4,5]
|
||||
[1,2,4,5]
|
||||
[1,2]
|
||||
5
|
||||
1
|
||||
5
|
||||
4
|
||||
2
|
||||
70
|
||||
2019-01-01 50 [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50]
|
||||
2019-01-02 60 [11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70]
|
||||
60 50 70 40 20 30
|
||||
60 50 70 40 20 30
|
||||
2019-01-01 50
|
||||
2019-01-02 60
|
77
dbms/tests/queries/0_stateless/00829_bitmap_function.sql
Normal file
@ -0,0 +1,77 @@
|
||||
SELECT bitmapToArray(bitmapBuild([1, 2, 3, 4, 5]));
|
||||
SELECT bitmapToArray(bitmapAnd(bitmapBuild([1,2,3]),bitmapBuild([3,4,5])));
|
||||
SELECT bitmapToArray(bitmapOr(bitmapBuild([1,2,3]),bitmapBuild([3,4,5])));
|
||||
SELECT bitmapToArray(bitmapXor(bitmapBuild([1,2,3]),bitmapBuild([3,4,5])));
|
||||
SELECT bitmapToArray(bitmapAndnot(bitmapBuild([1,2,3]),bitmapBuild([3,4,5])));
|
||||
SELECT bitmapCardinality(bitmapBuild([1, 2, 3, 4, 5]));
|
||||
SELECT bitmapAndCardinality(bitmapBuild([1,2,3]),bitmapBuild([3,4,5]));
|
||||
SELECT bitmapOrCardinality(bitmapBuild([1,2,3]),bitmapBuild([3,4,5]));
|
||||
SELECT bitmapXorCardinality(bitmapBuild([1,2,3]),bitmapBuild([3,4,5]));
|
||||
SELECT bitmapAndnotCardinality(bitmapBuild([1,2,3]),bitmapBuild([3,4,5]));
|
||||
|
||||
DROP TABLE IF EXISTS test.bitmap_test;
|
||||
CREATE TABLE test.bitmap_test(pickup_date Date, city_id UInt32, uid UInt32) ENGINE = Memory;
|
||||
INSERT INTO test.bitmap_test SELECT '2019-01-01', 1, number FROM numbers(1,50);
|
||||
INSERT INTO test.bitmap_test SELECT '2019-01-02', 1, number FROM numbers(11,60);
|
||||
|
||||
|
||||
SELECT groupBitmap( uid ) AS user_num FROM test.bitmap_test;
|
||||
|
||||
SELECT pickup_date, groupBitmap( uid ) AS user_num, bitmapToArray(groupBitmapState( uid )) AS users FROM test.bitmap_test GROUP BY pickup_date;
|
||||
|
||||
SELECT
|
||||
bitmapCardinality(day_today) AS today_users,
|
||||
bitmapCardinality(day_before) AS before_users,
|
||||
bitmapOrCardinality(day_today, day_before) AS all_users,
|
||||
bitmapAndCardinality(day_today, day_before) AS old_users,
|
||||
bitmapAndnotCardinality(day_today, day_before) AS new_users,
|
||||
bitmapXorCardinality(day_today, day_before) AS diff_users
|
||||
FROM
|
||||
(
|
||||
SELECT city_id, groupBitmapState( uid ) AS day_today FROM test.bitmap_test WHERE pickup_date = '2019-01-02' GROUP BY city_id
|
||||
)
|
||||
ALL LEFT JOIN
|
||||
(
|
||||
SELECT city_id, groupBitmapState( uid ) AS day_before FROM test.bitmap_test WHERE pickup_date = '2019-01-01' GROUP BY city_id
|
||||
)
|
||||
USING city_id;
|
||||
|
||||
SELECT
|
||||
bitmapCardinality(day_today) AS today_users,
|
||||
bitmapCardinality(day_before) AS before_users,
|
||||
bitmapCardinality(bitmapOr(day_today, day_before)) AS all_users,
|
||||
bitmapCardinality(bitmapAnd(day_today, day_before)) AS old_users,
|
||||
bitmapCardinality(bitmapAndnot(day_today, day_before)) AS new_users,
|
||||
bitmapCardinality(bitmapXor(day_today, day_before)) AS diff_users
|
||||
FROM
|
||||
(
|
||||
SELECT city_id, groupBitmapState( uid ) AS day_today FROM test.bitmap_test WHERE pickup_date = '2019-01-02' GROUP BY city_id
|
||||
)
|
||||
ALL LEFT JOIN
|
||||
(
|
||||
SELECT city_id, groupBitmapState( uid ) AS day_before FROM test.bitmap_test WHERE pickup_date = '2019-01-01' GROUP BY city_id
|
||||
)
|
||||
USING city_id;
|
||||
|
||||
|
||||
DROP TABLE IF EXISTS test.bitmap_state_test;
|
||||
CREATE TABLE test.bitmap_state_test
|
||||
(
|
||||
pickup_date Date,
|
||||
city_id UInt32,
|
||||
uv AggregateFunction( groupBitmap, UInt32 )
|
||||
)
|
||||
ENGINE = AggregatingMergeTree( pickup_date, ( pickup_date, city_id ), 8192);
|
||||
|
||||
INSERT INTO test.bitmap_state_test SELECT
|
||||
pickup_date,
|
||||
city_id,
|
||||
groupBitmapState(uid) AS uv
|
||||
FROM test.bitmap_test
|
||||
GROUP BY pickup_date, city_id;
|
||||
|
||||
SELECT pickup_date, groupBitmapMerge(uv) AS users from test.bitmap_state_test group by pickup_date;
|
||||
|
||||
DROP TABLE IF EXISTS test.bitmap_test;
|
||||
DROP TABLE IF EXISTS test.bitmap_state_test;
|
||||
|
@ -179,6 +179,48 @@ binary decimal
|
||||
01101000 = 104
|
||||
```
|
||||
|
||||
|
||||
## groupBitmap
|
||||
|
||||
Performs bitmap (aggregate) calculations on an unsigned integer column and returns the cardinality as a `UInt64` value. With the `-State` suffix, it returns a [bitmap object](../functions/bitmap_functions.md) instead.
|
||||
|
||||
```
|
||||
groupBitmap(expr)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
`expr` – An expression that results in a `UInt*` type.
|
||||
|
||||
**Return value**
|
||||
|
||||
Value of the `UInt64` type.
|
||||
|
||||
**Example**
|
||||
|
||||
Test data:
|
||||
|
||||
```
|
||||
userid
|
||||
1
|
||||
1
|
||||
2
|
||||
3
|
||||
```
|
||||
|
||||
Query:
|
||||
|
||||
```
|
||||
SELECT groupBitmap(userid) as num FROM t
|
||||
```
|
||||
|
||||
Result:
|
||||
|
||||
```
|
||||
num
|
||||
3
|
||||
```
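
With the `-State` suffix mentioned in the description, the same aggregation can return the bitmap itself rather than its cardinality. The following is only a sketch: it assumes the table `t` and an unsigned integer `userid` column from the example above, and uses `bitmapToArray` purely to make the state readable.

``` sql
-- Sketch: assumes table `t` with an unsigned integer column `userid`, as in the example above.
SELECT
    groupBitmap(userid) AS num,                        -- cardinality (UInt64)
    bitmapToArray(groupBitmapState(userid)) AS users   -- distinct values held in the bitmap state
FROM t
```

For the test data above, `num` is 3 and `users` contains the distinct values 1, 2 and 3.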
|
||||
|
||||
## min(x) {#agg_function-min}
|
||||
|
||||
Calculates the minimum.
|
||||
|
277
docs/en/query_language/functions/bitmap_functions.md
Normal file
@ -0,0 +1,277 @@
|
||||
# Bitmap functions
|
||||
|
||||
Bitmap functions perform calculations on two bitmap objects and return either a new bitmap or a cardinality, using operations such as and, or, xor, and andnot.
|
||||
|
||||
A bitmap object can be constructed in two ways: by the aggregate function groupBitmap with the -State suffix, or from an array object. A bitmap object can also be converted back to an array object.
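
The two construction paths can be sketched side by side as follows. This is an illustrative example, not part of the test suite: the `numbers(1, 3)` source and the `toUInt32` cast are assumptions chosen here to produce a small unsigned integer column.

``` sql
-- 1. From an array of unsigned integers.
SELECT bitmapToArray(bitmapBuild([1, 2, 3])) AS from_array;

-- 2. From an aggregate state; the toUInt32 cast is an assumption made to keep the values 32-bit.
SELECT bitmapToArray(groupBitmapState(toUInt32(number))) AS from_state
FROM numbers(1, 3);
```

In both cases `bitmapToArray` converts the bitmap object back to an array so that the content can be inspected.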
|
||||
|
||||
Bitmap objects are stored in a data structure that wraps RoaringBitmap. When the cardinality is less than or equal to 32, a Set object is used. When the cardinality is greater than 32, a RoaringBitmap object is used. This is why storing low-cardinality sets is faster.
|
||||
|
||||
For more information on RoaringBitmap, see: [CRoaring](https://github.com/RoaringBitmap/CRoaring).
|
||||
|
||||
|
||||
## bitmapBuild
|
||||
|
||||
Builds a bitmap from an array of unsigned integers.
|
||||
|
||||
```
|
||||
bitmapBuild(array)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `array` – unsigned integer array.
|
||||
|
||||
**Example**
|
||||
|
||||
``` sql
|
||||
SELECT bitmapBuild([1, 2, 3, 4, 5]) AS res
|
||||
```
|
||||
|
||||
## bitmapToArray
|
||||
|
||||
Converts a bitmap to an integer array.
|
||||
|
||||
```
|
||||
bitmapToArray(bitmap)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `bitmap` – bitmap object.
|
||||
|
||||
**Example**
|
||||
|
||||
``` sql
|
||||
SELECT bitmapToArray(bitmapBuild([1, 2, 3, 4, 5])) AS res
|
||||
```
|
||||
|
||||
```
|
||||
┌─res─────────┐
|
||||
│ [1,2,3,4,5] │
|
||||
└─────────────┘
|
||||
```
|
||||
|
||||
|
||||
## bitmapAnd
|
||||
|
||||
Computes the logical AND (intersection) of two bitmaps. The result is a new bitmap.
|
||||
|
||||
```
|
||||
bitmapAnd(bitmap,bitmap)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `bitmap` – bitmap object.
|
||||
|
||||
**Example**
|
||||
|
||||
``` sql
|
||||
SELECT bitmapToArray(bitmapAnd(bitmapBuild([1,2,3]),bitmapBuild([3,4,5]))) AS res
|
||||
```
|
||||
|
||||
```
|
||||
┌─res─┐
|
||||
│ [3] │
|
||||
└─────┘
|
||||
```
|
||||
|
||||
|
||||
## bitmapOr
|
||||
|
||||
Computes the logical OR (union) of two bitmaps. The result is a new bitmap.
|
||||
|
||||
```
|
||||
bitmapOr(bitmap,bitmap)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `bitmap` – bitmap object.
|
||||
|
||||
**Example**
|
||||
|
||||
``` sql
|
||||
SELECT bitmapToArray(bitmapOr(bitmapBuild([1,2,3]),bitmapBuild([3,4,5]))) AS res
|
||||
```
|
||||
|
||||
```
|
||||
┌─res─────────┐
|
||||
│ [1,2,3,4,5] │
|
||||
└─────────────┘
|
||||
```
|
||||
|
||||
## bitmapXor
|
||||
|
||||
Computes the logical XOR (symmetric difference) of two bitmaps. The result is a new bitmap.
|
||||
|
||||
```
|
||||
bitmapXor(bitmap,bitmap)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `bitmap` – bitmap object.
|
||||
|
||||
**Example**
|
||||
|
||||
``` sql
|
||||
SELECT bitmapToArray(bitmapXor(bitmapBuild([1,2,3]),bitmapBuild([3,4,5]))) AS res
|
||||
```
|
||||
|
||||
```
|
||||
┌─res───────┐
|
||||
│ [1,2,4,5] │
|
||||
└───────────┘
|
||||
```
|
||||
|
||||
## bitmapAndnot
|
||||
|
||||
Computes the logical ANDNOT of two bitmaps (the values in the first bitmap that are not in the second). The result is a new bitmap.
|
||||
|
||||
```
|
||||
bitmapAndnot(bitmap,bitmap)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `bitmap` – bitmap object.
|
||||
|
||||
**Example**
|
||||
|
||||
``` sql
|
||||
SELECT bitmapToArray(bitmapAndnot(bitmapBuild([1,2,3]),bitmapBuild([3,4,5]))) AS res
|
||||
```
|
||||
|
||||
```
|
||||
┌─res───┐
|
||||
│ [1,2] │
|
||||
└───────┘
|
||||
```
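
Unlike and, or, and xor, the andnot operation depends on argument order. A minimal sketch, reusing the arrays from the example above with the arguments swapped in the second column (the column aliases are illustrative):

``` sql
SELECT
    -- values present only in the first argument
    bitmapToArray(bitmapAndnot(bitmapBuild([1,2,3]), bitmapBuild([3,4,5]))) AS left_only,
    -- same call with the arguments swapped
    bitmapToArray(bitmapAndnot(bitmapBuild([3,4,5]), bitmapBuild([1,2,3]))) AS right_only
```

Here `left_only` holds 1 and 2, while `right_only` holds 4 and 5.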
|
||||
|
||||
## bitmapCardinality
|
||||
|
||||
Returns the cardinality of the bitmap as a `UInt64` value.
|
||||
|
||||
|
||||
```
|
||||
bitmapCardinality(bitmap)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `bitmap` – bitmap object.
|
||||
|
||||
**Example**
|
||||
|
||||
``` sql
|
||||
SELECT bitmapCardinality(bitmapBuild([1, 2, 3, 4, 5])) AS res
|
||||
```
|
||||
|
||||
```
|
||||
┌─res─┐
|
||||
│ 5 │
|
||||
└─────┘
|
||||
```
|
||||
|
||||
## bitmapAndCardinality
|
||||
|
||||
Computes the logical AND of two bitmaps and returns the cardinality of the result as a `UInt64` value.
|
||||
|
||||
|
||||
```
|
||||
bitmapAndCardinality(bitmap,bitmap)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `bitmap` – bitmap object.
|
||||
|
||||
**Example**
|
||||
|
||||
``` sql
|
||||
SELECT bitmapAndCardinality(bitmapBuild([1,2,3]),bitmapBuild([3,4,5])) AS res;
|
||||
```
|
||||
|
||||
```
|
||||
┌─res─┐
|
||||
│ 1 │
|
||||
└─────┘
|
||||
```
|
||||
|
||||
|
||||
## bitmapOrCardinality
|
||||
|
||||
Computes the logical OR of two bitmaps and returns the cardinality of the result as a `UInt64` value.
|
||||
|
||||
```
|
||||
bitmapOrCardinality(bitmap,bitmap)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `bitmap` – bitmap object.
|
||||
|
||||
**Example**
|
||||
|
||||
``` sql
|
||||
SELECT bitmapOrCardinality(bitmapBuild([1,2,3]),bitmapBuild([3,4,5])) AS res;
|
||||
```
|
||||
|
||||
```
|
||||
┌─res─┐
|
||||
│ 5 │
|
||||
└─────┘
|
||||
```
|
||||
|
||||
## bitmapXorCardinality
|
||||
|
||||
Computes the logical XOR of two bitmaps and returns the cardinality of the result as a `UInt64` value.
|
||||
|
||||
```
|
||||
bitmapXorCardinality(bitmap,bitmap)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `bitmap` – bitmap object.
|
||||
|
||||
**Example**
|
||||
|
||||
``` sql
|
||||
SELECT bitmapXorCardinality(bitmapBuild([1,2,3]),bitmapBuild([3,4,5])) AS res;
|
||||
```
|
||||
|
||||
```
|
||||
┌─res─┐
|
||||
│ 4 │
|
||||
└─────┘
|
||||
```
|
||||
|
||||
|
||||
## bitmapAndnotCardinality
|
||||
|
||||
Computes the logical ANDNOT of two bitmaps and returns the cardinality of the result as a `UInt64` value.
|
||||
|
||||
```
|
||||
bitmapAndnotCardinality(bitmap,bitmap)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
- `bitmap` – bitmap object.
|
||||
|
||||
**Example**
|
||||
|
||||
``` sql
|
||||
SELECT bitmapAndnotCardinality(bitmapBuild([1,2,3]),bitmapBuild([3,4,5])) AS res;
|
||||
```
|
||||
|
||||
```
|
||||
┌─res─┐
|
||||
│ 2 │
|
||||
└─────┘
|
||||
```
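
Each `*Cardinality` function should give the same number as applying `bitmapCardinality` to the result of the corresponding bitmap operation, which is how the test queries in this change compare the two forms. A small sketch of that equivalence for andnot (the column aliases are illustrative):

``` sql
SELECT
    -- direct form
    bitmapAndnotCardinality(bitmapBuild([1,2,3]), bitmapBuild([3,4,5])) AS direct,
    -- composed form: build the result bitmap first, then count it
    bitmapCardinality(bitmapAndnot(bitmapBuild([1,2,3]), bitmapBuild([3,4,5]))) AS composed
```

Both columns are expected to be 2 for these inputs. The direct form avoids writing out the intermediate bitmap in the query.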
|
||||
|
||||
|
||||
[Original article](https://clickhouse.yandex/docs/en/query_language/functions/bitmap_functions/) <!--hide-->
|
@ -86,6 +86,7 @@ functions/arithmetic_functions.md query_language/functions/arithmetic_functions.
|
||||
functions/array_functions.md query_language/functions/array_functions.md
|
||||
functions/array_join.md query_language/functions/array_join.md
|
||||
functions/bit_functions.md query_language/functions/bit_functions.md
|
||||
functions/bitmap_functions.md query_language/functions/bitmap_functions.md
|
||||
functions/comparison_functions.md query_language/functions/comparison_functions.md
|
||||
functions/conditional_functions.md query_language/functions/conditional_functions.md
|
||||
functions/date_time_functions.md query_language/functions/date_time_functions.md
|
||||
|
@ -115,6 +115,7 @@ nav:
|
||||
- 'Working with Arrays': 'query_language/functions/array_functions.md'
|
||||
- 'Splitting and Merging Strings and Arrays': 'query_language/functions/splitting_merging_functions.md'
|
||||
- 'Bit': 'query_language/functions/bit_functions.md'
|
||||
- 'Bitmap functions': 'query_language/functions/bitmap_functions.md'
|
||||
- 'Hash': 'query_language/functions/hash_functions.md'
|
||||
- 'Generating Pseudo-Random Numbers': 'query_language/functions/random_functions.md'
|
||||
- 'Encoding': 'query_language/functions/encoding_functions.md'
|
||||
|