clickhouse-benchmark <<< 'select count() from s where not ignore(s);'
before this patch:
```
QPS: 0.732, RPS: 2346562049.608, MiB/s: 22378.560, result RPS: 0.732, result MiB/s: 0.000.
0.000% 1.310 sec.
10.000% 1.321 sec.
20.000% 1.327 sec.
30.000% 1.337 sec.
40.000% 1.343 sec.
50.000% 1.359 sec.
60.000% 1.366 sec.
70.000% 1.381 sec.
80.000% 1.400 sec.
90.000% 1.434 sec.
95.000% 1.448 sec.
99.000% 1.489 sec.
99.900% 1.499 sec.
99.990% 1.500 sec.
```
after this patch:
```
QPS: 0.787, RPS: 2524560389.064, MiB/s: 24076.084, result RPS: 0.787, result MiB/s: 0.000.
0.000% 1.228 sec.
10.000% 1.232 sec.
20.000% 1.235 sec.
30.000% 1.241 sec.
40.000% 1.246 sec.
50.000% 1.256 sec.
60.000% 1.265 sec.
70.000% 1.278 sec.
80.000% 1.296 sec.
90.000% 1.321 sec.
95.000% 1.354 sec.
99.000% 1.421 sec.
99.900% 1.453 sec.
99.990% 1.456 sec.
```
I also tried an SSE2 implementation, and it was much slower (50%).

In the current version of Poco, UInt64 is no longer an alias for
unsigned long on LP64. With clang, size_t is an alias for unsigned long,
so size_t can no longer be passed where Poco interfaces expect UInt64,
and all such uses are gone. I replaced them with UInt64 where it makes
sense, and added an overload of readVarUInt() that accepts size_t.
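
A minimal sketch of the overload idea, not the actual patch. It assumes
UInt64 is an alias for unsigned long long, and the reader below is a
simplified, hypothetical varint decoder over std::istream rather than the
real ReadBuffer-based readVarUInt(). The size_t overload is only enabled
where size_t and UInt64 are distinct types, so it does not collide on
platforms where they are the same type.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <type_traits>

// Assumption: stands in for a Poco-style alias that is unsigned long long.
using UInt64 = unsigned long long;
static_assert(sizeof(UInt64) == 8, "UInt64 must be 64 bits wide");

// Hypothetical simplified varint reader: 7 payload bits per byte,
// high bit set means another byte follows.
inline void readVarUInt(UInt64 & x, std::istream & in)
{
    x = 0;
    for (int i = 0; i < 10; ++i)
    {
        int byte = in.get();
        if (byte == std::istream::traits_type::eof())
            return;
        x |= static_cast<UInt64>(byte & 0x7F) << (7 * i);
        if (!(byte & 0x80))
            return;
    }
}

// Overload for size_t. A size_t lvalue cannot bind to UInt64 & when the
// two are distinct types, so this forwards through a UInt64 temporary.
// It is disabled when size_t and UInt64 are the same type.
template <typename T>
inline std::enable_if_t<std::is_same_v<T, std::size_t> && !std::is_same_v<std::size_t, UInt64>>
readVarUInt(T & x, std::istream & in)
{
    UInt64 tmp = 0;
    readVarUInt(tmp, in);
    x = static_cast<std::size_t>(tmp);
}
```

With this in place, callers that hold a size_t can keep calling
readVarUInt() directly, while code that talks to Poco uses UInt64.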