aarch64-gcc SIMD inline asm, result always 0

I am trying to do a SIMD multiplication with inline assembly. However, the result is always zero or, in other cases, contains values I cannot make sense of.

#include <stdio.h>

int main(void)
{
        double x[2] = {2.0, 3.0};
        double y[2] = {0.0, 0.0};

        asm volatile (
              "fmul %[y].2d, %[x].2d, %[x].2d\n"
        : /* outputs */
          [y] "=&w" (y)
        : /* inputs */
          [x] "w" (x)
        : /* clobbers */
          "cc"
        );

        printf("result = (%f, %f)\n",
               y[0], y[1]);

        return 0;
}

Compiled with

aarch64-linux-gnu-gcc -mcpu=cortex-a73 -march='armv8-a'

I always get the output

result = (0.000000, 0.000000)

but I would expect (4.0, 9.0). Please help!

CodePudding user response：

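As Jester said, you have to pass a value to the asm statement, not a pointer to the datum in question. In your code, x is an array, and an array name used as an operand decays to a pointer, so the "w" constraint receives an address rather than the two doubles. The correct type for the value is float64x2_t from arm_neon.h. So proceed as follows: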

#include <stdio.h>
#include <arm_neon.h>

int main(void)
{
        double x[2] = {2.0, 3.0};
        double y[2] = {0.0, 0.0};

        asm volatile (
              "fmul %[y].2d, %[x].2d, %[x].2d\n"
        : /* outputs */
          [y] "=&w" (*(float64x2_t *)y)
        : /* inputs */
          [x] "w" (*(float64x2_t *)x)
        : /* clobbers */
          "cc"
        );

        printf("result = (%f, %f)\n",
               y[0], y[1]);

        return 0;
}
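
A variant of the same idea, as a minimal sketch rather than a definitive version: instead of casting the double arrays to float64x2_t pointers, it loads and stores the vector with the vld1q_f64 and vst1q_f64 intrinsics, and it drops the "cc" clobber because fmul does not modify the condition flags. The function name square_pair is hypothetical, and the scalar branch is an assumption added only so the sketch also builds on non-AArch64 targets.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: writes in[0]*in[0] and in[1]*in[1] to out. */
void square_pair(const double in[2], double out[2])
{
        assert(in != NULL && out != NULL);
#if defined(__aarch64__)
        float64x2_t xv = vld1q_f64(in);   /* load both lanes by value */
        float64x2_t yv;

        /* The operands are vector values, not pointers. */
        __asm__ ("fmul %[y].2d, %[x].2d, %[x].2d"
                 : [y] "=w" (yv)
                 : [x] "w" (xv));

        vst1q_f64(out, yv);               /* store both lanes back */
#else
        /* Scalar fallback (assumption) for targets without AArch64 SIMD. */
        out[0] = in[0] * in[0];
        out[1] = in[1] * in[1];
#endif
}
```

Calling square_pair(x, y) with the x and y arrays from the question should leave 4.0 and 9.0 in y.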

Note that when you include the intrinsics header, you might as well just use intrinsics directly:

int bar(void)
{
        double x[2] = {2.0, 3.0};
        double y[2] = {0.0, 0.0};
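        /* explicit casts: double * does not convert implicitly to float64x2_t * */
        float64x2_t *xx = (float64x2_t *)x, *yy = (float64x2_t *)y;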

        *yy = vmulq_f64(*xx, *xx);

        printf("result = (%f, %f)\n",
               y[0], y[1]);

        return 0;
}
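
If the arrays are longer than two elements, the intrinsics approach extends naturally to a loop. The following is a rough sketch under stated assumptions: square_array is a hypothetical name, the function processes two doubles per iteration with vld1q_f64, vmulq_f64 and vst1q_f64 on AArch64, and a plain scalar loop handles any leftover element (and everything on other targets).

```c
#include <assert.h>
#include <stddef.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dst[i] = src[i] * src[i] for every i in [0, n). */
void square_array(const double *src, double *dst, size_t n)
{
        size_t i = 0;

        assert(n == 0 || (src != NULL && dst != NULL));
#if defined(__aarch64__)
        /* Vector part: two doubles per iteration. */
        for (; i + 2 <= n; i += 2) {
                float64x2_t v = vld1q_f64(src + i);
                vst1q_f64(dst + i, vmulq_f64(v, v));
        }
#endif
        /* Scalar tail: the odd leftover element, or all elements elsewhere. */
        for (; i < n; i++)
                dst[i] = src[i] * src[i];
}
```

With optimization enabled the compiler may well vectorize the plain scalar loop on its own, so it is worth comparing the generated code before committing to the hand-written version.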