I am trying to do a SIMD multiplication with inline assembly on AArch64. However, the result is always zero or (in other cases) contains values I cannot make sense of.
#include <stdio.h>
int main(void)
{
    double x[2] = {2.0, 3.0};
    double y[2] = {0.0, 0.0};
    asm volatile (
        "fmul %[y].2d, %[x].2d, %[x].2d\n"
        : /* outputs */
          [y] "=&w" (y)
        : /* inputs */
          [x] "w" (x)
        : /* clobbers */
          "cc"
    );
    printf("result = (%f, %f)\n",
           y[0], y[1]);
    return 0;
}
Compiled with
aarch64-linux-gnu-gcc -mcpu=cortex-a73 -march='armv8-a'
I always get the output
result = (0.000000, 0.000000)
but I would expect (4.0, 9.0). Please help!
Answer:
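As Jester said, you have to pass a value to the asm statement, not a pointer to the data in question: an array operand such as x decays to a pointer, so the asm statement never sees the two doubles themselves. A suitable type for such a value is float64x2_t (a 128-bit vector of two doubles) from arm_neon.h. So proceed as follows: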
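#include <stdio.h>
#include <arm_neon.h>
int main(void)
{
    /* float64x2_t needs 16-byte alignment; a plain double array only guarantees 8. */
    _Alignas(16) double x[2] = {2.0, 3.0};
    _Alignas(16) double y[2] = {0.0, 0.0};
    asm volatile (
        "fmul %[y].2d, %[x].2d, %[x].2d\n"
        : /* outputs: dereference the cast pointer so the operand is the vector value */
          [y] "=&w" (*(float64x2_t *)y)
        : /* inputs */
          [x] "w" (*(float64x2_t *)x)
        /* no clobbers needed: fmul does not modify the condition flags */
    );
    printf("result = (%f, %f)\n",
           y[0], y[1]);
    return 0;
}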
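If you would rather not cast a double * to float64x2_t * at all, a minimal sketch is to load the pair into a float64x2_t local with vld1q_f64, pass that value to the asm statement, and store the result with vst1q_f64. The function name square_pair_asm is hypothetical, and the scalar #else branch is an assumption added only so the snippet also builds on non-AArch64 hosts.

```c
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[i] = in[i] * in[i] for i = 0, 1. */
void square_pair_asm(const double in[2], double out[2])
{
#if defined(__aarch64__)
    float64x2_t x = vld1q_f64(in);   /* load both doubles into one vector value */
    float64x2_t y;
    __asm__ (
        "fmul %[y].2d, %[x].2d, %[x].2d\n"
        : [y] "=w" (y)     /* output: a vector value, not a pointer */
        : [x] "w" (x)      /* input: a vector value, not a pointer */
    );
    vst1q_f64(out, y);               /* store both lanes back to memory */
#else
    /* Scalar fallback for non-AArch64 hosts (assumption, not part of the answer). */
    out[0] = in[0] * in[0];
    out[1] = in[1] * in[1];
#endif
}
```

Because vld1q_f64 and vst1q_f64 only require the alignment of a single double, the arrays do not need _Alignas(16) in this variant.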
Note that once you include the intrinsics header, you might as well use the intrinsics directly:
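int bar(void)
{
    _Alignas(16) double x[2] = {2.0, 3.0};
    _Alignas(16) double y[2] = {0.0, 0.0};
    /* Explicit casts: double * does not implicitly convert to float64x2_t *. */
    float64x2_t *xx = (float64x2_t *)x, *yy = (float64x2_t *)y;
    *yy = vmulq_f64(*xx, *xx);
    printf("result = (%f, %f)\n",
           y[0], y[1]);
    return 0;
}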
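The same intrinsics extend to arrays longer than two elements. A minimal sketch, assuming an element-wise square of n doubles is what you eventually need: process two lanes per iteration with vmulq_f64, then handle any odd trailing element with scalar code. The function name square_array is hypothetical; vld1q_f64, vmulq_f64 and vst1q_f64 are the standard ACLE intrinsics from arm_neon.h, and the scalar-only path on other targets is an assumption added for portability.

```c
#include <stddef.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dst[i] = src[i] * src[i] for 0 <= i < n. */
void square_array(const double *src, double *dst, size_t n)
{
    size_t i = 0;
#if defined(__aarch64__)
    /* Two doubles fit in one 128-bit NEON register. */
    for (; i + 2 <= n; i += 2) {
        float64x2_t v = vld1q_f64(src + i);  /* needs only 8-byte (element) alignment */
        vst1q_f64(dst + i, vmulq_f64(v, v));
    }
#endif
    /* Scalar tail, or the whole array on non-AArch64 hosts. */
    for (; i < n; i++)
        dst[i] = src[i] * src[i];
}
```

Loading through vld1q_f64 instead of dereferencing a float64x2_t * avoids the alignment requirement of the pointer cast entirely.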