mirror of
https://github.com/ClickHouse/ClickHouse.git
synced 2024-12-13 09:52:38 +00:00
Internal implementation of the `memcpy` function.
It has the following advantages over the `libc`-supplied implementation:
- it is linked statically, so the function is called directly, not through the `PLT` (procedure linkage table of a shared library);
- it is linked statically, so the function can have position-dependent code;
- your binaries will not depend on `glibc`'s `memcpy`, which forces a dependency on a specific symbol version such as `memcpy@@GLIBC_2.14` and consequently on a specific version of the `glibc` library;
- you can include `memcpy.h` directly, which gives the function a chance to be inlined; this is beneficial when the sizes of memory regions are small but unknown at compile time;
- this version of `memcpy` claims to be faster (in our benchmarks, the difference is within a few percent).
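
To illustrate why inlining helps for small sizes that are unknown at compile time, here is a minimal sketch. It is not the code from this library: `inline_small_memcpy` is a hypothetical helper, and the 16-byte threshold is an arbitrary assumption. Because the function is `static inline` and defined in a header, the compiler sees its body at each call site, and the fixed-size `memcpy` calls inside it are typically lowered to plain load/store instructions rather than function calls.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of this library).
 * Copies n bytes between non-overlapping regions. Sizes up to 16 bytes are
 * handled inline with two possibly overlapping fixed-size moves; larger
 * sizes fall back to an out-of-line memcpy call. */
static inline void *inline_small_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    if (n > 16)
        return memcpy(dst, src, n);

    if (n >= 8) {
        /* Covers 8..16 bytes: first 8 bytes and last 8 bytes. */
        uint64_t head, tail;
        memcpy(&head, s, 8);
        memcpy(&tail, s + n - 8, 8);
        memcpy(d, &head, 8);
        memcpy(d + n - 8, &tail, 8);
    } else if (n >= 4) {
        /* Covers 4..7 bytes. */
        uint32_t head, tail;
        memcpy(&head, s, 4);
        memcpy(&tail, s + n - 4, 4);
        memcpy(d, &head, 4);
        memcpy(d + n - 4, &tail, 4);
    } else if (n >= 2) {
        /* Covers 2..3 bytes. */
        uint16_t head, tail;
        memcpy(&head, s, 2);
        memcpy(&tail, s + n - 2, 2);
        memcpy(d, &head, 2);
        memcpy(d + n - 2, &tail, 2);
    } else if (n == 1) {
        d[0] = s[0];
    }
    return dst;
}
```

A caller would use it exactly like `memcpy`, for example `inline_small_memcpy(buf, key, key_len)` where `key_len` is usually short. Whether this is actually faster depends on the compiler and the workload, so it should be benchmarked.
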
Currently it uses the implementation from Linwei (skywind3000@163.com). Look at https://www.zhihu.com/question/35172305 for discussion.
Drawbacks:
- it only uses SSE2 and doesn't use wider (AVX, AVX-512) vector registers when available;
- no CPU dispatching; it doesn't take the actual cache size into account.
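
For reference, the missing CPU dispatching could look roughly like the sketch below. This is an assumption about one possible design, not something this library does: `memcpy_sse2`, `memcpy_avx2`, `select_memcpy` and `dispatched_memcpy` are hypothetical names, and the two variants are placeholders that simply forward to the standard `memcpy`. The feature check relies on the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports`, which are only available on x86 targets, so it is guarded by preprocessor checks.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Placeholder variants (hypothetical). A real build would put an SSE2 loop
 * and an AVX2 loop here; both forward to libc so the sketch stays runnable. */
static void *memcpy_sse2(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

static void *memcpy_avx2(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Pick a variant once, based on what the running CPU reports. */
static memcpy_fn select_memcpy(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return memcpy_avx2;
#endif
    return memcpy_sse2;
}

static memcpy_fn chosen_memcpy = NULL;

/* Entry point: resolves the variant lazily on first use.
 * Simplification: the lazy initialization is not synchronized. */
void *dispatched_memcpy(void *dst, const void *src, size_t n)
{
    if (chosen_memcpy == NULL)
        chosen_memcpy = select_memcpy();
    return chosen_memcpy(dst, src, n);
}
```

The indirect call through a function pointer has a cost of its own, which works against the direct-call advantage listed above; that trade-off is one reason a statically linked implementation might choose to skip dispatching.
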
Also worth looking at:
- simple implementation from Facebook: https://github.com/facebook/folly/blob/master/folly/memcpy.S
- implementation from Agner Fog: http://www.agner.org/optimize/
- glibc source code.