ClickHouse

mirror of https://github.com/ClickHouse/ClickHouse.git synced 2024-09-20 08:40:50 +00:00


Ivan b7ef5a699c Move FastMemcpy to contribs (#9219)	2020-03-13 01:26:16 +03:00
* Get rid of non-existent vectorclass
* Move FastMemcpy to contribs
* Restore comments
* Disable FastMemcpy on non-Linux
* Fix cmake file
* Don't build FastMemcpy for ARM64
* Replace FastMemcpy submodule with its contents
* Fix cmake file
* Move widechar_width to contrib/
* Move sumbur to contrib/
* Move consistent-hashing to contrib/
* Fix UBSan tests
CMakeLists.txt	Move FastMemcpy to contribs (#9219)	2020-03-13 01:26:16 +03:00
FastMemcpy_Avx.c	Move FastMemcpy to contribs (#9219)	2020-03-13 01:26:16 +03:00
FastMemcpy_Avx.h	Move FastMemcpy to contribs (#9219)	2020-03-13 01:26:16 +03:00
FastMemcpy.c	Move FastMemcpy to contribs (#9219)	2020-03-13 01:26:16 +03:00
FastMemcpy.h	Move FastMemcpy to contribs (#9219)	2020-03-13 01:26:16 +03:00
LICENSE	Move FastMemcpy to contribs (#9219)	2020-03-13 01:26:16 +03:00
memcpy_wrapper.c	Move FastMemcpy to contribs (#9219)	2020-03-13 01:26:16 +03:00
README.md	Move FastMemcpy to contribs (#9219)	2020-03-13 01:26:16 +03:00

Internal implementation of the memcpy function.

It has the following advantages over the libc-supplied implementation:

* it is linked statically, so the function is called directly, not through the PLT (procedure linkage table) of a shared library;
* it is linked statically, so the function can have position-dependent code;
* your binaries will not depend on glibc's memcpy, which forces a dependency on a specific symbol version such as memcpy@@GLIBC_2.14 and consequently on a specific version of the glibc library;
* you can include memcpy.h directly, which gives the function a chance to be inlined; this is beneficial when the copied regions are small but their size is not known at compile time;
* this version of memcpy claims to be faster (in our benchmarks, the difference is within a few percent).
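
The inlining point can be illustrated with a minimal sketch. This is not the FastMemcpy code: `sketch_memcpy` and `sketch_memcpy_selftest` are hypothetical names, and the 16-byte SSE2 loop is only an assumption about what a header-only copy routine might look like. Because the function is `static inline` and visible to the caller, the compiler is free to inline it instead of emitting a call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Hypothetical helper, not part of FastMemcpy.
 * Copies n bytes between non-overlapping regions. */
static inline void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

#if defined(__SSE2__)
    /* Copy 16 bytes per iteration with unaligned loads and stores. */
    while (n >= 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)s);
        _mm_storeu_si128((__m128i *)d, chunk);
        s += 16;
        d += 16;
        n -= 16;
    }
#endif
    /* Byte-wise tail (and the whole copy on targets without SSE2). */
    while (n > 0) {
        *d++ = *s++;
        --n;
    }
    return dst;
}

/* Compares the sketch against libc memcpy for every size from 0 to 100.
 * Returns the number of sizes checked. */
static int sketch_memcpy_selftest(void)
{
    unsigned char src[100], expected[100], actual[100];
    int checked = 0;

    for (size_t i = 0; i < sizeof src; ++i)
        src[i] = (unsigned char)(i * 7 + 3);

    for (size_t n = 0; n <= sizeof src; ++n) {
        memset(expected, 0xAA, sizeof expected);
        memset(actual, 0xAA, sizeof actual);
        memcpy(expected, src, n);
        sketch_memcpy(actual, src, n);
        /* Bytes past n must stay untouched, so compare the full buffers. */
        assert(memcmp(expected, actual, sizeof expected) == 0);
        ++checked;
    }
    return checked;
}
```

Calling `sketch_memcpy_selftest()` from a test program exercises both the vector loop and the byte-wise tail; whether the call is actually inlined depends on the compiler and optimization level.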

Currently it uses the implementation from Linwei (skywind3000@163.com). Look at https://www.zhihu.com/question/35172305 for discussion.

Drawbacks:

* it only uses SSE2 and does not use wider (AVX, AVX-512) vector registers when available;
* it does no CPU dispatching and does not take the actual cache size into account.
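
For context, CPU dispatching usually means selecting an implementation once at run time based on the features the processor reports. The sketch below shows one possible shape, assuming GCC or Clang on x86 (it relies on the `__builtin_cpu_supports` builtin). The names `copy_baseline`, `copy_avx`, `select_copy` and `dispatched_memcpy` are hypothetical, and both variants simply forward to libc `memcpy` as stand-ins for real SSE2 and AVX routines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Stand-ins for separately compiled variants (hypothetical names).
 * In a real build each would live in its own translation unit,
 * compiled with the matching instruction-set flags. */
static void *copy_baseline(void *d, const void *s, size_t n) { return memcpy(d, s, n); }
static void *copy_avx(void *d, const void *s, size_t n)      { return memcpy(d, s, n); }

/* Pick the widest variant the running CPU supports. */
static copy_fn select_copy(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))
        return copy_avx;
#endif
    return copy_baseline; /* also used on non-x86 targets and other compilers */
}

static copy_fn resolved_copy = NULL;

/* Resolves the variant on first use, then calls through the pointer.
 * The lazy initialization is not synchronized; a real implementation
 * would resolve the pointer before any threads start. */
static void *dispatched_memcpy(void *d, const void *s, size_t n)
{
    if (resolved_copy == NULL)
        resolved_copy = select_copy();
    return resolved_copy(d, s, n);
}
```

The indirect call through `resolved_copy` prevents inlining, so dispatching of this kind trades away part of the inlining advantage listed earlier in exchange for using wider registers where they exist.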

Also worth looking at: