ClickHouse/contrib/FastMemcpy

Internal implementation of the memcpy function.

It has the following advantages over the libc-supplied implementation:

  • it is linked statically, so the function is called directly rather than through the PLT (procedure linkage table used for calls into shared libraries);
  • it is linked statically, so the function can use position-dependent code;
  • your binaries will not depend on glibc's memcpy, which forces a dependency on a specific symbol version such as memcpy@@GLIBC_2.14 and consequently on a specific version of the glibc library;
  • you can include memcpy.h directly, giving the function a chance to be inlined, which is beneficial for small memory regions whose sizes are unknown at compile time;
  • this version of memcpy claims to be faster (in our benchmarks, the difference is within a few percent).
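To illustrate why inlining helps with small copies, here is a minimal sketch of a small-size copy routine that could live in a header as `static inline`. The names `small_copy`, `load16`/`store16` and so on are hypothetical and are not the function provided by memcpy.h. The sketch assumes non-overlapping buffers (the usual memcpy contract) and relies on fixed-size `memcpy` calls, which GCC and Clang typically lower to single unaligned loads and stores when optimizing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed-size unaligned loads/stores; with optimization enabled, compilers
   typically emit a single mov for each of these instead of a library call. */
static inline uint16_t load16(const unsigned char *p) { uint16_t v; memcpy(&v, p, sizeof v); return v; }
static inline uint32_t load32(const unsigned char *p) { uint32_t v; memcpy(&v, p, sizeof v); return v; }
static inline uint64_t load64(const unsigned char *p) { uint64_t v; memcpy(&v, p, sizeof v); return v; }
static inline void store16(unsigned char *p, uint16_t v) { memcpy(p, &v, sizeof v); }
static inline void store32(unsigned char *p, uint32_t v) { memcpy(p, &v, sizeof v); }
static inline void store64(unsigned char *p, uint64_t v) { memcpy(p, &v, sizeof v); }

/* Hypothetical small-copy routine; dst and src must not overlap. */
static inline void *small_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (n >= 8) {
        /* Copy whole 8-byte words, then one final word ending exactly at
           n; it may overlap bytes already written, which is harmless. */
        for (size_t i = 0; i + 8 <= n; i += 8)
            store64(d + i, load64(s + i));
        store64(d + n - 8, load64(s + n - 8));
        return dst;
    }
    if (n >= 4) {
        /* 4..7 bytes: first 4 and last 4 bytes together cover the range. */
        uint32_t head = load32(s), tail = load32(s + n - 4);
        store32(d, head);
        store32(d + n - 4, tail);
        return dst;
    }
    if (n >= 2) {
        /* 2..3 bytes: first 2 and last 2 bytes cover the range. */
        uint16_t head = load16(s), tail = load16(s + n - 2);
        store16(d, head);
        store16(d + n - 2, tail);
        return dst;
    }
    if (n == 1)
        d[0] = s[0];
    return dst;
}
```

Writing the first and last word with possibly overlapping stores avoids a byte-by-byte tail loop. When the call is inlined and `n` happens to be a compile-time constant at the call site, the compiler can also drop the untaken branches.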

Currently it uses the implementation by Linwei (skywind3000@163.com). See https://www.zhihu.com/question/35172305 for a discussion.

Drawbacks:

  • it uses only SSE2 and doesn't use wider (AVX, AVX-512) vector registers when they are available;
  • there is no CPU dispatching, and it doesn't take the actual cache size into account.
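The missing CPU dispatching could be addressed by selecting an implementation at run time. Below is a minimal sketch, assuming GCC or Clang on x86 (`__builtin_cpu_supports`, the `target` function attribute and the `<immintrin.h>` intrinsics); all function names are hypothetical, and this is not how the current implementation works. On other architectures it falls back to a plain byte loop. It does not address cache-size awareness.

```c
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86 1
#endif

/* Signature shared by all copy variants (same contract as memcpy). */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Portable fallback: byte-by-byte copy. */
static void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

#ifdef HAVE_X86
/* 16-byte blocks with SSE2 unaligned loads/stores, then a byte tail. */
static __attribute__((target("sse2"))) void *copy_sse2(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (; n >= 16; n -= 16, d += 16, s += 16)
        _mm_storeu_si128((__m128i *)d, _mm_loadu_si128((const __m128i *)s));
    while (n--)
        *d++ = *s++;
    return dst;
}

/* 32-byte blocks with AVX2 unaligned loads/stores, then a byte tail.
   Only this function is compiled for AVX2, so the binary still runs on
   CPUs without it as long as the dispatcher never selects it there. */
static __attribute__((target("avx2"))) void *copy_avx2(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (; n >= 32; n -= 32, d += 32, s += 32)
        _mm256_storeu_si256((__m256i *)d, _mm256_loadu_si256((const __m256i *)s));
    while (n--)
        *d++ = *s++;
    return dst;
}
#endif

/* Pick the widest variant the running CPU supports. */
static copy_fn resolve_copy(void)
{
#ifdef HAVE_X86
    __builtin_cpu_init(); /* harmless if the CPU model is already initialized */
    if (__builtin_cpu_supports("avx2"))
        return copy_avx2;
    if (__builtin_cpu_supports("sse2"))
        return copy_sse2;
#endif
    return copy_bytes;
}

/* Resolves on first call and caches the choice. Not thread-safe as written. */
static void *dispatched_copy(void *dst, const void *src, size_t n)
{
    static copy_fn chosen = NULL;
    if (chosen == NULL)
        chosen = resolve_copy();
    return chosen(dst, src, n);
}
```

A production version would more likely resolve the choice once at startup (for example with a GNU `ifunc` resolver) than lazily on first call, since the lazy initialization above is a data race when several threads make the first call concurrently.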

Also worth looking at: