Merge branch 'master' of git://github.com/yandex/ClickHouse

2024-11-27 01:51:59 +00:00 · 2016-11-12 21:04:38 +05:00 · 2016-11-12 21:04:38 +05:00 · c617565f3b
commit c617565f3b
parent b5ad7c021b b364682524
4 changed files with 8 additions and 4 deletions
--- a/dbms/src/Storages/StorageReplicatedMergeTree.cpp
+++ b/dbms/src/Storages/StorageReplicatedMergeTree.cpp
@@ -2526,6 +2526,7 @@ static String getFakePartNameForDrop(const String & month_name, UInt64 left, UIn
 	DayNum_t right_date = DayNum_t(static_cast<size_t>(left_date) + lut.daysInMonth(start_time) - 1);

 	/// Level is right-left+1: a part could not have been produced by that many or more merges.
+	/// TODO This is not true for parts after ATTACH.
 	return ActiveDataPartSet::getPartName(left_date, right_date, left, right, right - left + 1);
 }

@@ -2745,6 +2746,7 @@ void StorageReplicatedMergeTree::attachPartition(ASTPtr query, const Field & fie
 		ActiveDataPartSet::Part part;
 		ActiveDataPartSet::parsePartName(part_name, part);
 		part.left = part.right = --min_used_number;
+		part.level = 0;		/// The previous level is meaningless after ATTACH.
 		String new_part_name = ActiveDataPartSet::getPartName(part.left_date, part.right_date, part.left, part.right, part.level);

 		LOG_INFO(log, "Will attach " << part_name << " as " << new_part_name);
--- a/doc/developers/architecture.md
+++ b/doc/developers/architecture.md
@@ -1,13 +1,13 @@
 # ClickHouse quick architecture overview

-> Gray text is for side notes you don't have to read.
+> Optional side notes are in grey.


-ClickHouse is a true column oriented DBMS. Data is stored by columns. Even more, during query execution, data is processed by arrays (vectors, chunks of columns). In all places, where it is possible, operations on data are dispatched not for individual values but for arrays. It is called "vectorized query execution". This allows to lower dispatch cost relatively to cost of actual data processing.
+ClickHouse is a true column-oriented DBMS. Data is stored by columns and, during query execution, processed in arrays (vectors, or chunks of columns). Whenever possible, operations are dispatched on arrays rather than on individual values. This is called "vectorized query execution", and it lowers dispatch cost relative to the cost of the actual data processing.

->This idea is not any new. It is dated back to `APL` programming language and its descendants: `A+`, `J`, `K`, `Q`. Array programming is widely used in scientific data processing. Also, this idea is not new for relational databases: for example, it is used in `Vectorwise` system.
+>This idea is nothing new. It dates back to the `APL` programming language and its descendants: `A+`, `J`, `K`, `Q`. Array programming is widely used in scientific data processing. Nor is the idea new to relational databases: for example, it is used in the `Vectorwise` system.

->To speed up query processing, there are two different approaches: vectorized query execution and runtime code generation. In second approach, the code is generated for every kind of query on the fly, removing all indirection and dynamic dispatch. No one of these approaches is strictly better than the other. Runtime code generation could be better if it will fuse many operations together and could fully utilize CPU execution units and pipeline. Vectorized query execution could be worse because it must deal with temporary vectors, that must be written to cache and read back. If temporary data does not fit in L2 cache, it becomes an issue. But vectorized query execution more easily utilize SIMD capabilities of CPU. There is [research paper](http://15721.courses.cs.cmu.edu/spring2016/papers/p5-sompolski.pdf) from our friends that shows, that better to combine both approaches. ClickHouse mostly use vectorized query execution and has limited initial support for runtime code generation (only inner loop for first stage of GROUP BY could be compiled).
+>There are two different approaches to speeding up query processing: vectorized query execution and runtime code generation. In the latter, code is generated for every kind of query on the fly, removing all indirection and dynamic dispatch. Neither of these approaches is strictly better than the other. Runtime code generation can be better when it fuses many operations together, thus fully utilizing the CPU execution units and pipeline. Vectorized query execution can be worse, because it must deal with temporary vectors that have to be written to cache and read back. If the temporary data does not fit in the L2 cache, this becomes an issue. But vectorized query execution more easily utilizes the SIMD capabilities of the CPU. A [research paper](http://15721.courses.cs.cmu.edu/spring2016/papers/p5-sompolski.pdf) written by our friends shows that it is better to combine both approaches. ClickHouse mostly uses vectorized query execution and has limited initial support for runtime code generation (only the inner loop of the first stage of GROUP BY can be compiled).


 ## Columns
--- a/doc/reference_en.html
+++ b/doc/reference_en.html
@@ -4515,6 +4515,7 @@ Zero as an argument is considered &quot;false,&quot; while any non-zero value is
 ===toUInt8, toUInt16, toUInt32, toUInt64===
 ===toInt8, toInt16, toInt32, toInt64===
 ===toFloat32, toFloat64===
+===toUInt8OrZero, toUInt16OrZero, toUInt32OrZero, toUInt64OrZero, toInt8OrZero, toInt16OrZero, toInt32OrZero, toInt64OrZero, toFloat32OrZero, toFloat64OrZero===
 ===toDate, toDateTime===
 ===toString===

--- a/doc/reference_ru.html
+++ b/doc/reference_ru.html
@@ -4579,6 +4579,7 @@ LIMIT 10
 ===toUInt8, toUInt16, toUInt32, toUInt64===
 ===toInt8, toInt16, toInt32, toInt64===
 ===toFloat32, toFloat64===
+===toUInt8OrZero, toUInt16OrZero, toUInt32OrZero, toUInt64OrZero, toInt8OrZero, toInt16OrZero, toInt32OrZero, toInt64OrZero, toFloat32OrZero, toFloat64OrZero===
 ===toDate, toDateTime===
 ===toString===
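
Note on the `doc/developers/architecture.md` hunk above: the dispatch-cost point made there can be illustrated with a small sketch. All names here (`IScalarOp`, `IVectorOp`, `addRowByRow`, `addVectorized`) are hypothetical and are not taken from the ClickHouse sources. The sketch only contrasts one virtual call per value with one virtual call per chunk of a column.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical interfaces for illustration; not ClickHouse's actual classes.

// Row-at-a-time style: the operation is dispatched once per value.
struct IScalarOp
{
    virtual ~IScalarOp() = default;
    virtual std::uint64_t apply(std::uint64_t a, std::uint64_t b) const = 0;
};

struct ScalarPlus final : IScalarOp
{
    std::uint64_t apply(std::uint64_t a, std::uint64_t b) const override { return a + b; }
};

// Vectorized style: the operation is dispatched once per chunk of a column.
struct IVectorOp
{
    virtual ~IVectorOp() = default;
    virtual void apply(const std::vector<std::uint64_t> & a,
                       const std::vector<std::uint64_t> & b,
                       std::vector<std::uint64_t> & result) const = 0;
};

struct VectorPlus final : IVectorOp
{
    void apply(const std::vector<std::uint64_t> & a,
               const std::vector<std::uint64_t> & b,
               std::vector<std::uint64_t> & result) const override
    {
        assert(a.size() == b.size());
        result.resize(a.size());
        // Tight loop with no virtual calls inside; a compiler may auto-vectorize it.
        for (std::size_t i = 0; i < a.size(); ++i)
            result[i] = a[i] + b[i];
    }
};

// a.size() virtual calls.
std::vector<std::uint64_t> addRowByRow(const IScalarOp & op,
                                       const std::vector<std::uint64_t> & a,
                                       const std::vector<std::uint64_t> & b)
{
    assert(a.size() == b.size());
    std::vector<std::uint64_t> result(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        result[i] = op.apply(a[i], b[i]);
    return result;
}

// A single virtual call, regardless of the chunk size.
std::vector<std::uint64_t> addVectorized(const IVectorOp & op,
                                         const std::vector<std::uint64_t> & a,
                                         const std::vector<std::uint64_t> & b)
{
    std::vector<std::uint64_t> result;
    op.apply(a, b, result);
    return result;
}
```

Both functions compute the same result. The difference is that the per-value variant pays the dynamic dispatch cost for every element, while the per-chunk variant pays it once and then writes its output into a temporary vector, which is the trade-off the side note in the documentation describes.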