ClickHouse/contrib/FastMemcpy
Ivan b7ef5a699c
Move FastMemcpy to contribs (#9219)
* Get rid of non-existent vectorclass
* Move FastMemcpy to contribs
* Restore comments
* Disable FastMemcpy on non-Linux
* Fix cmake file
* Don't build FastMemcpy for ARM64
* Replace FastMemcpy submodule with its contents
* Fix cmake file
* Move widechar_width to contrib/
* Move sumbur to contrib/
* Move consistent-hashing to contrib/
* Fix UBSan tests
2020-03-13 01:26:16 +03:00
CMakeLists.txt
FastMemcpy_Avx.c
FastMemcpy_Avx.h
FastMemcpy.c
FastMemcpy.h
LICENSE
memcpy_wrapper.c
README.md

Internal implementation of the memcpy function.

It has the following advantages over the libc-supplied implementation:

  • it is linked statically, so the function is called directly, not through the PLT (the procedure linkage table used for calls into a shared library);
  • it is linked statically, so the function can have position-dependent code;
  • your binaries will not depend on glibc's memcpy, which forces a dependency on a specific symbol version such as memcpy@@GLIBC_2.14 and consequently on a specific version of the glibc library;
  • you can include memcpy.h directly, giving the function a chance to be inlined, which is beneficial for small memory regions whose sizes are unknown at compile time;
  • this version of memcpy claims to be faster (in our benchmarks, the difference is within a few percent).
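
The inlining point can be illustrated with a minimal sketch. The helper name `small_copy` and the 16-byte threshold are assumptions made for this example and are not part of FastMemcpy; the idea is only that a header-defined `static inline` function lets the compiler expand short copies at the call site instead of emitting a call.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of FastMemcpy. The 16-byte threshold is
   an arbitrary choice for illustration. */
static inline void *small_copy(void *dst, const void *src, size_t n)
{
    if (n <= 16) {
        unsigned char *d = dst;
        const unsigned char *s = src;
        /* Short byte loop that the compiler can expand at the call site. */
        for (size_t i = 0; i < n; ++i)
            d[i] = s[i];
        return dst;
    }
    /* Longer regions go to the out-of-line routine. */
    return memcpy(dst, src, n);
}
```

Whether this helps in practice depends on the compiler and the call pattern, so it should be benchmarked rather than assumed.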

Currently it uses the implementation by Linwei (skywind3000@163.com). See https://www.zhihu.com/question/35172305 for discussion.

Drawbacks:

  • it uses only SSE2 and does not use wider (AVX, AVX-512) vector registers when available;
  • no CPU dispatching; it does not take the actual cache size into account.
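
For context, CPU dispatching usually means choosing a copy routine once at run time based on detected CPU features. The following is a rough sketch of that idea, not something this library does. All names (`copy_fn`, `copy_baseline`, `copy_avx`, `select_copy`, `dispatched_copy`) are hypothetical, and both variants simply defer to `memcpy` as placeholders. The feature check relies on the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports`, which are assumed to be available only on x86 targets.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Placeholder variants: a real build would contain SSE2 and AVX loops.
   Both defer to memcpy here so the sketch stays portable. */
static void *copy_baseline(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

static void *copy_avx(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Pick a variant based on what the running CPU reports. */
static copy_fn select_copy(void)
{
    copy_fn chosen = copy_baseline;
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))
        chosen = copy_avx;
#else
    (void)copy_avx; /* silence the unused-function warning elsewhere */
#endif
    return chosen;
}

/* Resolve the variant on first use, then call through the pointer.
   Note: this lazy initialization is not synchronized between threads. */
void *dispatched_copy(void *dst, const void *src, size_t n)
{
    static copy_fn impl = NULL;
    if (impl == NULL)
        impl = select_copy();
    return impl(dst, src, n);
}
```

The indirect call through a function pointer has its own cost and prevents inlining, which is one reason a statically linked, inlinable memcpy may choose not to dispatch.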

Also worth looking at: