
29 Commits

Author SHA1 Message Date
Robert Schulze
421afeeae0
Shuffle class order (and just that) 2023-02-24 10:13:35 +00:00
Robert Schulze
ef529de7db
Cosmetics 2023-02-24 10:12:47 +00:00
Jiebin Sun
d6710d9b34 Align all the SSE4.1 requirements and use needle_size
Align all the SSE4.1 requirements in StringSearcher, and use needle_size
in the while loop to make the code cleaner.

Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
2023-02-22 16:15:26 -05:00
Jiebin Sun
1f62135ba7 Make the optimized SIMD StringSearcher clean
This patch renames variables and adds comments so that the SIMD
StringSearcher is clean and easy to understand, building on pull
request 46289.

Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
2023-02-22 12:18:21 -05:00
Jiebin Sun
d220e7f4fc Optimize the SIMD StringSearcher if needle_size is large
This patch adds a further optimization for large needles: if
needle_size is larger than haystack_size, there can be no match, so no
search is needed at all.
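A minimal sketch of that size check in C++, assuming the haystack is given as the range `[haystack, haystack_end)`. `searchWithEarlyExit` is a hypothetical name, not the StringSearcher interface, and the brute-force loop only stands in for the SIMD search:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not ClickHouse's StringSearcher API). Returns a pointer
// to the first occurrence of needle in [haystack, haystack_end), or
// haystack_end if there is none. The brute-force loop stands in for the SIMD
// search; the point of the sketch is the size check before it.
const char * searchWithEarlyExit(const char * haystack, const char * haystack_end,
                                 const char * needle, std::size_t needle_size)
{
    const auto haystack_size = static_cast<std::size_t>(haystack_end - haystack);

    // A needle longer than the haystack can never match: skip the search.
    if (needle_size > haystack_size)
        return haystack_end;

    // An empty needle matches at the start.
    if (needle_size == 0)
        return haystack;

    // Only positions where the whole needle still fits are candidates.
    const char * last = haystack_end - needle_size;
    for (const char * pos = haystack; pos <= last; ++pos)
        if (std::memcmp(pos, needle, needle_size) == 0)
            return pos;

    return haystack_end;
}
```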

The optimized SIMD StringSearcher outperforms the Volnitsky algorithm by
up to 41.7% when needle_size is less than 21, and falls behind by only
about 1% even when needle_size is larger than 50, which is not
considered a common case.

Test platform: ICX server
Test query: SELECT COUNT(*) FROM hits WHERE URL LIKE '%{Needle}%';

Needle_size	opt/baseline
5		141.7%
6		129.4%
8		118.5%
9		112.3%
10		107.4%
14		103.4%
20		100.2%
21		100.7%
51		99.0%

Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
2023-02-22 11:58:17 -05:00
Jiebin Sun
f5a6a86dec Optimize the SIMD StringSearcher by searching first two chars
This patch optimizes the SIMD StringSearcher by searching for the first
and second characters together rather than only the first character,
which yields a large performance gain. The patch also adds a fast path
for a needle of size 1.
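A scalar sketch of the filtering idea, assuming byte-wise comparison; `searchFirstTwoChars` is a hypothetical name and this is not the SSE4.1 code from the patch. A position only becomes a candidate when both the first and the second needle characters match, which rejects far more positions than a first-character check alone. In a 16-byte SIMD version the same filter is typically built by comparing a load at `pos` against a vector filled with `needle[0]` and a load at `pos + 1` against a vector filled with `needle[1]`, then ANDing the two masks:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical scalar illustration of a two-character prefilter (the real code
// processes 16 bytes per step with SSE4.1). Returns a pointer to the first
// match in [haystack, haystack_end), or haystack_end if there is none.
const char * searchFirstTwoChars(const char * haystack, const char * haystack_end,
                                 const char * needle, std::size_t needle_size)
{
    const auto haystack_size = static_cast<std::size_t>(haystack_end - haystack);
    if (needle_size == 0)
        return haystack;
    if (needle_size > haystack_size)
        return haystack_end;

    // Fast path for a single-character needle: a plain memchr.
    if (needle_size == 1)
    {
        const void * found = std::memchr(haystack, needle[0], haystack_size);
        return found ? static_cast<const char *>(found) : haystack_end;
    }

    const char first = needle[0];
    const char second = needle[1];
    const char * last = haystack_end - needle_size;
    for (const char * pos = haystack; pos <= last; ++pos)
    {
        // Cheap two-character filter; only survivors get the full comparison
        // of the remaining needle_size - 2 bytes.
        if (pos[0] == first && pos[1] == second
            && std::memcmp(pos + 2, needle + 2, needle_size - 2) == 0)
            return pos;
    }
    return haystack_end;
}
```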

With this patch, I tested the 43 ClickBench queries on an ICX server.
Query 20 gained 35% in performance, other StringSearcher-related queries
improved by around 10%, and the overall geomean across all queries
improved by 4.1%.

Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
2023-02-22 11:55:30 -05:00
Alexander Tokmakov
70d1adfe4b
Better formatting for exception messages (#45449)
* save format string for NetException

* format exceptions

* format exceptions 2

* format exceptions 3

* format exceptions 4

* format exceptions 5

* format exceptions 6

* fix

* format exceptions 7

* format exceptions 8

* Update MergeTreeIndexGin.cpp

* Update AggregateFunctionMap.cpp

* Update AggregateFunctionMap.cpp

* fix
2023-01-24 00:13:58 +03:00
Azat Khuzhin
4e76629aaf Fixes for -Wshorten-64-to-32
- lots of static_cast
- add safe_cast
- types adjustments
  - config
  - IStorage::read/watch
  - ...
- some TODOs (to convert types in the future)
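`-Wshorten-64-to-32` is a Clang warning for implicit conversions that truncate a 64-bit integer to 32 bits. A minimal sketch of a checked narrowing cast, in the spirit of the `safe_cast` mentioned above but not its actual definition; the name `checkedNarrow` and the throw-on-overflow policy are assumptions:

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <type_traits>

// Hypothetical checked narrowing cast (NOT ClickHouse's safe_cast). The
// explicit static_cast silences -Wshorten-64-to-32, and the range check turns
// silent truncation into an exception.
template <typename To, typename From>
To checkedNarrow(From value)
{
    static_assert(std::is_integral_v<To> && std::is_integral_v<From>,
                  "checkedNarrow only handles integer types");
    const To result = static_cast<To>(value);
    // Round-trip test: the value survives only if converting back yields the
    // original value and the sign did not flip during the conversion.
    if (static_cast<From>(result) != value || ((result < To{}) != (value < From{})))
        throw std::out_of_range("value does not fit in the target type");
    return result;
}
```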

P.S. That was quite a journey...

v2: fixes after rebase
v3: fix conflicts after #42308 merged
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-10-21 13:25:19 +02:00
Robert Schulze
1e3b5bfcb7
Fix test 00233_position_function_family 2022-06-30 11:43:25 +00:00
Robert Schulze
81bb2242fd
Fix countSubstrings() & position() on patterns with 0-bytes
SQL functions countSubstrings(), countSubstringsCaseInsensitive(),
countSubstringsUTF8(), position(), positionCaseInsensitive() and
positionUTF8() with a non-const pattern argument use the fallback searchers
LibCASCIICaseSensitiveStringSearcher and LibCASCIICaseInsensitiveStringSearcher,
which call ::strstr() and ::strcasestr() respectively. These functions assume
(and document) that the haystack is 0-terminated. However, the callers did
not check whether the haystack contains a 0-byte (perhaps because that is
somewhat expensive). As a consequence, if the haystack contained a zero byte
in its payload, matches after this zero byte were ignored.

    create table t (id UInt32, pattern String) engine = MergeTree() order by id;
    insert into t values (1, 'x');
    select countSubstrings('aaaxxxaa\0xxx', pattern) from t;

We returned 3 before this commit; now we return 6.
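The effect can be reproduced outside ClickHouse with a small C++ sketch that uses the same haystack as the SQL example. Both counting functions are hypothetical helpers written for this illustration and count non-overlapping matches; the first relies on `strstr()` and therefore stops at the embedded 0-byte, while the second takes an explicit length through `std::string_view` and sees the whole buffer:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Counts non-overlapping occurrences with strstr(). strstr() treats both
// arguments as C strings, so it never looks past the first 0-byte.
std::size_t countWithStrstr(const char * haystack, const char * needle)
{
    const std::size_t needle_size = std::strlen(needle);
    if (needle_size == 0)
        return 0; // avoid an endless loop on an empty needle
    std::size_t count = 0;
    for (const char * pos = std::strstr(haystack, needle); pos != nullptr;
         pos = std::strstr(pos + needle_size, needle))
        ++count;
    return count;
}

// Same count over an explicit-length buffer: std::string_view::find compares
// raw bytes, so embedded 0-bytes do not end the search.
std::size_t countWithLength(std::string_view haystack, std::string_view needle)
{
    if (needle.empty())
        return 0;
    std::size_t count = 0;
    for (std::size_t pos = haystack.find(needle); pos != std::string_view::npos;
         pos = haystack.find(needle, pos + needle.size()))
        ++count;
    return count;
}
```

With `const char data[] = "aaaxxxaa\0xxx";`, `countWithStrstr(data, "x")` yields 3 while `countWithLength(std::string_view(data, sizeof(data) - 1), "x")` yields 6, matching the before and after numbers in the commit message.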
2022-06-29 21:41:18 +00:00
Robert Schulze
c22038d48b
More clang-tidy fixes 2022-06-28 11:50:05 +00:00
Robert Schulze
b56c28d841
Replace a few uses of enable_if for SFINAE by concepts
- enable_if is usually regarded as fragile and unreadable

- C++20 concepts are much easier to read and produce more expressive
  error messages
2022-03-16 19:51:38 +01:00
Maksim Kita
e7772ed434 Fix clang-tidy warnings in Common folder 2022-03-14 18:17:35 +00:00
vdimir
b474dc87ac
check len of char with upper and lower case in putNGramUTF8CaseInsensitive 2022-02-17 12:39:29 +00:00
alexey-milovidov
0a112bcf61
Update StringSearcher.h 2022-01-26 13:45:26 +03:00
HarryLeeIBM
8b24688afb Issue 7334: Fixed utf8 string case-insensitive searching issue 2022-01-25 13:56:05 -05:00
Alexander Tokmakov
a1cab43feb fix five-year-old bug in StringSearcher 2021-10-26 13:32:07 +03:00
Alexey Milovidov
fe6b7c77c7 Rename "common" to "base" 2021-10-02 10:13:14 +03:00
Maksim Kita
67e9b85951 Merge ext into common 2021-06-16 23:28:41 +03:00
Alexey Milovidov
0fa5142715 Remove tons of garbage 2021-01-31 05:36:52 +03:00
Alexey Milovidov
1e2669fd3c Fix error 2021-01-29 07:54:46 +03:00
Alexey Milovidov
355c99568e Fix error 2021-01-28 10:16:36 +03:00
Alexey Milovidov
95e15131a8 Fix insufficient args check (trash code) in StringSearcher 2021-01-27 20:32:59 +03:00
Maksim Kita
685099af7f Move getPageSize in common 2020-12-17 00:23:41 +03:00
Maksim Kita
dbb2fbcdd5 Unified usages of getPageSize
1. Introduced getPageSize function.
2. Replaced usages of getpagesize with getPageSize function.
3. Replaced usages of sysconf(_SC_PAGESIZE) with getPageSize function.
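A minimal sketch of such a single entry point, assuming a POSIX system with `sysconf()`. It illustrates the idea of one cached helper and is not the actual ClickHouse `getPageSize`, whose return type and error handling may differ:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <unistd.h>

// Hypothetical page-size helper (not the ClickHouse implementation).
// sysconf(_SC_PAGESIZE) is the POSIX query; the result is cached in a
// function-local static because the page size does not change while the
// process runs, and C++11 guarantees thread-safe initialization.
std::size_t getPageSize()
{
    static const std::size_t page_size = []
    {
        const long result = ::sysconf(_SC_PAGESIZE);
        if (result <= 0)
            throw std::runtime_error("sysconf(_SC_PAGESIZE) failed");
        return static_cast<std::size_t>(result);
    }();
    return page_size;
}
```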
2020-12-16 13:42:23 +03:00
Amos Bird
3817c0efa7
Remove redundant conditions 2020-09-04 02:13:57 +08:00
Alexey Milovidov
293ae88e7f Add sampling memory profiler 2020-04-30 16:25:17 +03:00
Alexey Milovidov
a7d7dc5034 Fix some bad code 2020-04-26 20:34:36 +03:00
Ivan Lezhankin
06446b4f08 dbms/ → src/ 2020-04-03 18:14:31 +03:00