When is memcpy faster than simple repeated assignment?

Assume that one wants to make a copy of an array declared as

DATA_TYPE src[N];

Is memcpy always as fast as or faster than the following code snippet, regardless of what DATA_TYPE and the number of elements of the array are?

DATA_TYPE dest[N];

for (int i=0; i<N; i++)
    dest[i] = src[i];

For a small type like char and large N we can be sure that memcpy is faster (unless the compiler replaces the loop with a call to memcpy). But what if the type is larger, like double, and/or the number of array elements is small?

This question came to my mind when copying many arrays of doubles each with 3 elements.

I didn't find an answer to my question in the answer to the other question mentioned by wohlstad in the comments. The accepted answer in that question essentially says "leave it for the compiler to decide." That's not the sort of answer I'm looking for. The fact that a compiler can optimize memory copying by choosing one alternative is not an answer. Why and when is one alternative faster? Maybe compilers know the answer, but developers, including compiler developers, don't know!

CodePudding user response:

Since memcpy is a library function, how efficient it actually is depends entirely on the library implementation, so no definitive answer is possible.

That said, any provided standard library is likely to be highly optimised and may even use hardware-specific features such as DMA transfer. The performance of your loop, by contrast, will vary with the optimisation settings, so it is likely to be much worse in unoptimised debug builds.

Another consideration is that the performance of memcpy() will be independent of data type and generally deterministic, whereas your loop performance is likely to vary depending on DATA_TYPE, or even the value of N.

Generally, I would expect memcpy() to be optimal and faster or as fast as an assignment loop, and certainly more consistent and deterministic, being independent of specific compiler settings, and even the compiler used.

In the end, the only way to tell is to measure it for your specific platform, toolchain, library and build options, and also for various data types. Since you would have to measure every usage combination to know which is faster, I suggest that doing so is generally a waste of time and of academic interest only. Use the library, not only for performance and consistency but also for clarity and maintainability.