* save format string for NetException
* format exceptions
* format exceptions 2
* format exceptions 3
* format exceptions 4
* format exceptions 5
* format exceptions 6
* fix
* format exceptions 7
* format exceptions 8
* Update MergeTreeIndexGin.cpp
* Update AggregateFunctionMap.cpp
* Update AggregateFunctionMap.cpp
* fix
In lowerUTF8()/upperUTF8() there is an SSE optimization that handles
16 bytes at a time, but only for ASCII; for UTF-8 symbols, conversion is
done symbol by symbol.
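A minimal C++ sketch of this scheme, assuming a byte-oriented row and a deliberately tiny case mapping (ASCII and the Cyrillic capitals U+0410..U+042F only). `utf8SeqLength`, `lowerSymbol` and `lowerRow` are hypothetical names, not ClickHouse functions, and a scalar loop stands in for the SSE ASCII check. The per-symbol path is bounded by the row end rather than the chunk end, so a symbol that starts near the end of a 16-byte chunk can still be completed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Byte length of a UTF-8 sequence, judged from its leading byte.
// Invalid leading bytes are treated as 1-byte sequences.
static size_t utf8SeqLength(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;
}

// Hypothetical, deliberately tiny case mapping: ASCII A-Z and Cyrillic U+0410..U+042F.
// Appends the lowered symbol at `src` to `out` and returns the number of bytes consumed.
// A sequence is never decoded past `limit` (the row end).
static size_t lowerSymbol(const unsigned char * src, const unsigned char * limit, std::string & out)
{
    size_t len = utf8SeqLength(*src);
    if (static_cast<size_t>(limit - src) < len)
        len = 1; // truncated sequence at the row end: copy it byte by byte
    if (len == 2 && src[0] == 0xD0 && src[1] >= 0x90 && src[1] <= 0x9F)
    {
        out += static_cast<char>(0xD0);
        out += static_cast<char>(src[1] + 0x20); // U+0410..U+041F -> U+0430..U+043F
    }
    else if (len == 2 && src[0] == 0xD0 && src[1] >= 0xA0 && src[1] <= 0xAF)
    {
        out += static_cast<char>(0xD1);
        out += static_cast<char>(src[1] - 0x20); // U+0420..U+042F -> U+0440..U+044F
    }
    else if (len == 1 && src[0] >= 'A' && src[0] <= 'Z')
        out += static_cast<char>(src[0] + ('a' - 'A'));
    else
        out.append(reinterpret_cast<const char *>(src), len);
    return len;
}

// Lower-cases one row. A full 16-byte chunk of pure ASCII takes the fast path
// (a scalar check stands in for the SSE one); anything else is handled symbol by
// symbol, bounded by the row end rather than the chunk end, so a symbol that
// starts inside a chunk is allowed to finish after it.
std::string lowerRow(const std::string & row)
{
    constexpr size_t chunk = 16;
    const auto * src = reinterpret_cast<const unsigned char *>(row.data());
    const auto * const end = src + row.size();
    std::string out;
    out.reserve(row.size());

    while (src < end)
    {
        const bool full_chunk = static_cast<size_t>(end - src) >= chunk;
        bool all_ascii = full_chunk;
        for (size_t i = 0; all_ascii && i < chunk; ++i)
            all_ascii = src[i] < 0x80;

        if (all_ascii)
        {
            for (size_t i = 0; i < chunk; ++i, ++src)
                out += static_cast<char>(*src >= 'A' && *src <= 'Z' ? *src + ('a' - 'A') : *src);
        }
        else
        {
            const auto * const chunk_end = full_chunk ? src + chunk : end;
            while (src < chunk_end)
                src += lowerSymbol(src, end, out); // may step past chunk_end, never past end
        }
    }
    return out;
}
```

For example, `lowerRow("КВ АМ И СЖ")` yields `"кв ам и сж"`: the first chunk is not pure ASCII, so it goes symbol by symbol, and the last symbol, which starts at offset 15, is read up to offset 17.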
Consider the following example:
КВ АМ И СЖ
         ^ - offset is 15, length of sequence is 2
so the first byte of the symbol is within the first 16 bytes, but the
second byte of the symbol is not.
In this case the symbol is handled incorrectly, because the code
processes only these 16 bytes without looking forward.
This was broken by #41286; before that patch the code did not look at
the row boundaries, only at the string end, so this situation
was not possible.
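The arithmetic in the example can be checked with a small C++ sketch. `straddlingSymbol` and `walkChunk` are hypothetical helpers, and the source file is assumed to be UTF-8 encoded. The sketch finds the symbol that crosses the 16-byte boundary and shows where a symbol-by-symbol walk over the first chunk stops, depending on how far it is allowed to read.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Byte length of a UTF-8 sequence from its leading byte (invalid bytes count as 1).
static size_t utf8SeqLength(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;
}

// Hypothetical helper: offset of the symbol that starts before `boundary` and ends
// after it, or std::string::npos if no symbol straddles the boundary.
size_t straddlingSymbol(const std::string & s, size_t boundary)
{
    size_t pos = 0;
    while (pos < s.size())
    {
        const size_t len = utf8SeqLength(static_cast<unsigned char>(s[pos]));
        if (pos < boundary && pos + len > boundary)
            return pos;
        pos += len;
    }
    return std::string::npos;
}

// Hypothetical helper: walks symbol by symbol while inside [begin, chunk_end), never
// reading past `limit` (requires chunk_end <= limit), and returns where the walk stopped.
size_t walkChunk(const std::string & s, size_t begin, size_t chunk_end, size_t limit)
{
    size_t pos = begin;
    while (pos < chunk_end)
    {
        size_t len = utf8SeqLength(static_cast<unsigned char>(s[pos]));
        if (pos + len > limit)
            len = limit - pos; // the sequence is cut at `limit`
        pos += len;
    }
    return pos;
}
```

For `"КВ АМ И СЖ"` (17 bytes), `straddlingSymbol(s, 16)` returns 15. With the limit at the row end, `walkChunk(s, 0, 16, 17)` stops at 17, one byte past the chunk. With the limit at the chunk end, it stops at 16, which leaves the second byte of Ж to be misread at the start of the next step.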
Fixes: #42756
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
AFAICS it was there before because it was possible to overrun
expected_end: utf8.convert() was called with "src_end - src", not
"expected_end - src".
Refs: 5a21f3908b054a0efc90c65a12fbe151c74d90dc:dbms/include/DB/Functions/FunctionsString.h
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Right now lowerUTF8() and upperUTF8() do not respect row boundaries,
so one row may break another.
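A C++ sketch of how ignoring row boundaries can let one row damage the next. It uses a simplified column model as an assumption for illustration: all rows are concatenated in one buffer and `offsets[i]` marks the end of row i. This is not the exact `ColumnString` layout. `StringColumn` and `lowerColumnInPlace` are hypothetical names. Because String values may hold arbitrary bytes, a row can end with an incomplete sequence, and decoding it up to the end of the whole buffer rewrites the first byte of the following row.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified column model (an assumption, not ClickHouse's ColumnString):
// all rows are concatenated in `chars`, and offsets[i] is the end of row i.
struct StringColumn
{
    std::string chars;
    std::vector<size_t> offsets;
};

// Byte length of a UTF-8 sequence from its leading byte (invalid bytes count as 1).
static size_t utf8SeqLength(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;
}

// Lower-cases Cyrillic U+0410..U+041F (bytes D0 90..D0 9F) in place, row by row.
// With respect_rows == false a sequence may be decoded up to the end of the
// whole buffer instead of the end of its own row.
void lowerColumnInPlace(StringColumn & col, bool respect_rows)
{
    size_t row_begin = 0;
    for (const size_t row_end : col.offsets)
    {
        const size_t limit = respect_rows ? row_end : col.chars.size();
        size_t pos = row_begin;
        while (pos < row_end)
        {
            const auto lead = static_cast<unsigned char>(col.chars[pos]);
            size_t len = utf8SeqLength(lead);
            if (pos + len > limit)
                len = 1; // incomplete sequence: leave the byte as is
            if (len == 2 && lead == 0xD0)
            {
                const auto cont = static_cast<unsigned char>(col.chars[pos + 1]);
                if (cont >= 0x90 && cont <= 0x9F)
                    col.chars[pos + 1] = static_cast<char>(cont + 0x20);
            }
            pos += len;
        }
        row_begin = row_end;
    }
}
```

With rows `"\xD0\x9F\xD0"` and `"\x90z"`, respecting row bounds leaves the second row as `"\x90z"`. Without row bounds, lowering the first row pairs its trailing `\xD0` with the next row's `\x90` and rewrites that byte to `\xB0`.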
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>