The following C11 program extracts the bit representation of a float into a uint32_t in two different ways.
#include <stdint.h>
#include <string.h>  /* for memcpy */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");
uint32_t f2i_char(float f) {
    uint32_t x;
    char const *src = (char const *)&f;
    char *dst = (char *)&x;
    /* copy the four bytes of f into x, one at a time */
    *dst++ = *src++;
    *dst++ = *src++;
    *dst++ = *src++;
    *dst++ = *src++;
    return x;
}
uint32_t f2i_memcpy(float f) {
    uint32_t x;
    memcpy(&x, &f, sizeof(x));
    return x;
}
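To confirm that the two versions compute the same bits, and differ only in the generated code, a quick host-side check can compare them. This is a sketch: f2i_bytes is a hypothetical loop form of f2i_char, and the expected constants assume IEEE 754 binary32 floats, which is the case on common hosts and on the Cortex-M4.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical loop form of f2i_char: copy the object representation
   of f into x one byte at a time, lowest address first. */
uint32_t f2i_bytes(float f) {
    uint32_t x;
    unsigned char const *src = (unsigned char const *)&f;
    unsigned char *dst = (unsigned char *)&x;
    for (size_t i = 0; i < sizeof x; i++)
        dst[i] = src[i];
    return x;
}

/* Same as f2i_memcpy in the question. */
uint32_t f2i_memcpy(float f) {
    uint32_t x;
    memcpy(&x, &f, sizeof x);
    return x;
}
```

For example, both return 0x3F800000 for 1.0f. Because both copy the bytes in the same order, they agree regardless of the target's endianness.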
The output assembly, compiled with armgcc 10.2.1 (none eabi), is very different, even with the -Os or -O3 optimizations applied.
I'm compiling with:
-mcpu=cortex-m4 -std=c11 -mfpu=fpv4-sp-d16 -mfloat-abi=hard
f2i_char:
sub sp, sp, #16
vstr.32 s0, [sp, #4]
ldr r3, [sp, #4]
strb r3, [sp, #12]
ubfx r2, r3, #8, #8
strb r2, [sp, #13]
ubfx r2, r3, #16, #8
ubfx r3, r3, #24, #8
strb r2, [sp, #14]
strb r3, [sp, #15]
ldr r0, [sp, #12]
add sp, sp, #16
bx lr
f2i_memcpy:
sub sp, sp, #8
vstr.32 s0, [sp, #4]
ldr r0, [sp, #4]
add sp, sp, #8
bx lr
Why isn't gcc generating the same assembly for both functions?
Answer:
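Avoid copying the data manually; use memcpy. GCC knows this function very well and will not call it at all if it is not needed. Pointer punning can also break strict-aliasing rules: accessing f through a char pointer, as f2i_char does, is allowed, but casting &f to uint32_t * and dereferencing it is not.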
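Besides memcpy, C11 (unlike C++) also allows reading a union member other than the one last stored, which reinterprets the stored bytes as the new type. A minimal sketch follows; f2i_union is a hypothetical name, and whether GCC emits the same code for it as for the memcpy version with your flags is worth checking rather than assuming.

```c
#include <stdint.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: store f through one union member and read the
   bytes back through the other. Valid in C, not in C++.
   Avoid the strict-aliasing violation:  return *(uint32_t *)&f;  */
uint32_t f2i_union(float f) {
    union {
        float f;
        uint32_t u;
    } pun = { .f = f };
    return pun.u;
}
```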
With the soft-float linkage (-mfloat-abi=soft or softfp), the memcpy version emits no code at all: the float argument is passed in r0, the same register the uint32_t result is returned in, so no action is needed.
https://godbolt.org/z/q8v39d737
#include <stdint.h>
#include <string.h>  /* for memcpy */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");
uint32_t f2i_char(float f) {
    uint32_t x;
    char const *src = (char const *)&f;
    char *dst = (char *)&x;
    *dst++ = *src++;
    *dst++ = *src++;
    *dst++ = *src++;
    *dst++ = *src++;
    return x;
}
uint32_t f2i1(float f) {
    uint32_t x;
    memcpy(&x, &f, sizeof(x));
    return x;
}
f2i_char:
sub sp, sp, #8
ubfx r1, r0, #8, #8
ubfx r2, r0, #16, #8
ubfx r3, r0, #24, #8
strb r0, [sp, #4]
strb r1, [sp, #5]
strb r2, [sp, #6]
strb r3, [sp, #7]
ldr r0, [sp, #4]
add sp, sp, #8
bx lr
f2i1:
bx lr
EDIT:
You use -mfloat-abi=hard, which makes the calling convention pass float arguments and results in FPU registers, even when no arithmetic is done on them. I usually use softfp, which generates hardware floating-point instructions but uses the software floating-point linkage (floats are passed in integer registers).
https://gcc.godbolt.org/z/z39qnvY1c
"The output assembly, compiled with armgcc 10.2.1 (none eabi), is very different, even with the -Os or -O3 optimizations applied."
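You copy byte by byte, and the compiler has to follow your code. When you use memcpy, the compiler understands your intention and does not copy byte by byte. The additional floating-point instructions are needed because you use the hard-float ABI: the float argument arrives in the FPU register s0, but the uint32_t result must be returned in r0, and GCC moves the value between them through memory (vstr.32 s0, [sp, #4] followed by ldr r0, [sp, #4]). With the soft-float linkage, both the float argument and the integer result use r0, so nothing needs to be moved.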
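The same memcpy idiom works in both directions. The sketch below uses hypothetical helper names, float_to_bits and bits_to_float; a fixed-size memcpy like this is normally expanded inline by an optimizing GCC rather than compiled into a library call.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: reinterpret the bits of a float as uint32_t. */
static inline uint32_t float_to_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical helper: reinterpret a uint32_t bit pattern as a float. */
static inline float bits_to_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

Assuming IEEE 754 binary32, float_to_bits(1.0f) is 0x3F800000 and bits_to_float(0xC0000000) is -2.0f. A round trip through both helpers returns the original pattern for ordinary (non-NaN) values.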
Answer:
Why is GCC emitting larger output with -Os than -O3 for this function on Cortex-M4?
Why not? Each option enables or disables specific internal optimizations. There can be, and will be, compiler decisions that make -O3 produce smaller code than -Os.
Is there anything specific about the C11 standard or the Armv7E-M that's inhibiting gcc from emitting the smaller assembly at -Os?
No.
Is this gcc missing an optimization opportunity?
Yes, you could say that. But it may be deliberate: the optimization that would produce such code could be expensive in compilation time and CPU, so it is not enabled. That's all.