ClickHouse/dbms/src/Storages/MergeTree/IMergeTreeReader.h

#pragma once

#include <map>
#include <string>

#include <boost/noncopyable.hpp>

#include <Core/NamesAndTypes.h>
#include <Storages/MergeTree/MergeTreeReaderStream.h>
#include <Storages/MergeTree/MergeTreeBlockReadUtils.h>


namespace DB
{

class IDataType;

/// Reads the data between pairs of marks in the same part. When reading consecutive ranges, avoids unnecessary seeks.
/// When ranges are almost consecutive, seeks are fast because they are performed inside the buffer.
/// Avoids loading the marks file if it is not needed (e.g. when reading the whole part).
class IMergeTreeReader : private boost::noncopyable
{
public:
    using ValueSizeMap = std::map<std::string, double>;
    using DeserializeBinaryBulkStateMap = std::map<std::string, IDataType::DeserializeBinaryBulkStatePtr>;

    IMergeTreeReader(const MergeTreeData::DataPartPtr & data_part_,
        const NamesAndTypesList & columns_,
        UncompressedCache * uncompressed_cache_,
        MarkCache * mark_cache_,
        const MarkRanges & all_mark_ranges_,
        const MergeTreeReaderSettings & settings_,
        const ValueSizeMap & avg_value_size_hints_ = ValueSizeMap{});

    /// Returns the number of rows that have been read, or zero if there are no columns to read.
    /// If continue_reading is true, continues reading from the last state; otherwise seeks to from_mark.
    virtual size_t readRows(size_t from_mark, bool continue_reading, size_t max_rows_to_read, Block & res) = 0;

    virtual bool canReadIncompleteGranules() const = 0;

    virtual ~IMergeTreeReader();

    const ValueSizeMap & getAvgValueSizeHints() const;

    /// Add columns that are expected in the result but are missing (nullptr) in res_columns.
    /// Missing columns are added in the order of the expected column list.
    /// num_rows is needed in case all res_columns are nullptr.
    void fillMissingColumns(Columns & res_columns, bool & should_evaluate_missing_defaults, size_t num_rows);
    /// Evaluate defaulted columns if necessary.
    void evaluateMissingDefaults(Block additional_columns, Columns & res_columns);

    const NamesAndTypesList & getColumns() const { return columns; }
    size_t numColumnsInResult() const { return columns.size(); }

    size_t getFirstMarkToRead() const
    {
        return all_mark_ranges.back().begin;
    }

    MergeTreeData::DataPartPtr data_part;

protected:

    /// avg_value_size_hints are used to reduce the number of reallocations when creating columns of variable size.
    ValueSizeMap avg_value_size_hints;
    /// Stores states for IDataType::deserializeBinaryBulk
    DeserializeBinaryBulkStateMap deserialize_binary_bulk_state_map;
    /// Path to the directory containing the part
    String path;

    /// Columns that are read.
    NamesAndTypesList columns;

    UncompressedCache * uncompressed_cache;
    MarkCache * mark_cache;
    /// If save_marks_in_cache is false and the marks are not in the cache, they are loaded but not saved to the cache, to avoid evicting other data.
    MergeTreeReaderSettings settings;

    const MergeTreeData & storage;
    MarkRanges all_mark_ranges;

    friend class MergeTreeRangeReader::DelayedStream;
};

}
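
The `readRows` comment above describes a seek-or-continue contract. A minimal sketch of that contract follows, assuming a toy in-memory part with a fixed number of rows per mark. `ToyPart`, `ToyReader` and the `seeks` counter are hypothetical names introduced only for illustration; this is not ClickHouse code and does not reflect how the real readers store data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a data part: rows stored contiguously,
// with a mark placed every `granularity` rows (an assumption of this sketch).
struct ToyPart
{
    std::vector<int> rows;
    std::size_t granularity;
};

// Toy reader that mimics only the seek-or-continue behaviour of readRows.
class ToyReader
{
public:
    explicit ToyReader(const ToyPart & part_) : part(part_) {}

    // Appends up to max_rows_to_read rows to res and returns how many were appended.
    // If continue_reading is true, from_mark is ignored and reading resumes from the
    // current position; otherwise the reader first seeks to the first row of from_mark.
    std::size_t readRows(std::size_t from_mark, bool continue_reading,
                         std::size_t max_rows_to_read, std::vector<int> & res)
    {
        if (!continue_reading)
        {
            position = std::min(from_mark * part.granularity, part.rows.size());
            ++seeks;
        }
        assert(position <= part.rows.size());

        const std::size_t rows_to_read = std::min(max_rows_to_read, part.rows.size() - position);
        const auto first = part.rows.begin() + static_cast<std::ptrdiff_t>(position);
        res.insert(res.end(), first, first + static_cast<std::ptrdiff_t>(rows_to_read));
        position += rows_to_read;
        return rows_to_read;
    }

    std::size_t seeks = 0;  // number of explicit seeks performed so far

private:
    const ToyPart & part;
    std::size_t position = 0;
};
```

With ten rows and a granularity of four, a call with `from_mark = 1` and `continue_reading = false` starts at row 4. A following call with `continue_reading = true` picks up where the previous one stopped, without another seek, which is the case the class comment describes for consecutive ranges.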