How is the lvalue problem solved for SIMD inline asm with memory output operands in a 2D array?

I am trying to write a function that will fill my float matrix with zeros using ymm registers.

It did not take long to come up with this function:

void fillMatrixByZeros(float matrix[N][N]){
    for (int k = 0; k < N; k += 8){
        for (int i = 0; i < N; ++i){
            asm volatile (
                "vxorps %%ymm0, %%ymm0, %%ymm0;"
                "vmovups %%ymm0, (%0)"
                : "=m"(matrix[i] + k)
                : 
                : "%ymm0", "memory"
            );
        }
    }
}

I tried to compile my whole code and I got this error:

prog.cpp: In function ‘void fillMatrixByZeros(float (*)[16])’:
prog.cpp:35:8: error: lvalue required in asm statement
   35 |       );
      |        ^
prog.cpp:35:8: error: invalid lvalue in asm output 0

I concluded that matrix[i] + k is an rvalue or something similar, so it can't be used there.

After googling, I came up with two solutions:

First:

void fillMatrixByZeros(float matrix[N][N]){
    for (int k = 0; k < N; k += 8){
        for (int i = 0; i < N; ++i){
            asm volatile (
                "vxorps %%ymm0, %%ymm0, %%ymm0;"
                "vmovups %%ymm0, (%0)"
                : 
                : "r"(matrix[i] + k)
                : "%ymm0", "memory"
            );
        }
    }
}

Second:

void fillMatrixByZeros(float matrix[N][N]){
    long long int matrixPointer;
    for (int k = 0; k < N; k += 8){
        for (int i = 0; i < N; ++i){
            asm volatile (
                "vxorps %%ymm0, %%ymm0, %%ymm0;"
                "vmovups %%ymm0, (%0)"
                : "=r"(matrixPointer)
                : "0"(matrix[i] + k)
                : "%ymm0", "memory"
            );
        }
    }
}

These functions work correctly, and I want to know why.

Why are there no lvalue problems in the first function? And what is going on in the second function?

Answer:

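You cannot assign to matrix[i] + k: it is a computed address (a temporary pointer value), so it is not an lvalue. The m constraint expects the object in memory itself, not its address. So to fix this, supply the object you want to assign to instead of its address, and use %0 without parentheses in the template, because the compiler substitutes a complete memory operand: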

void fillMatrixByZeros(float matrix[N][N]){
    for (int k = 0; k < N; k += 8){
        for (int i = 0; i < N; ++i){
            asm volatile (
                "vxorps %%ymm0, %%ymm0, %%ymm0;"
                "vmovups %%ymm0, %0"
                : "=m"(matrix[i][k])
                : 
                : "%ymm0", "memory"
            );
        }
    }
}

This is the correct way to access objects in memory in an inline assembly statement.
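
To see the difference between the two expressions without any assembly, here is a minimal compile-time sketch. The function name inspect is hypothetical, and N = 16 is assumed from the float (*)[16] in the error message. It relies on decltype((expr)) yielding T& for an lvalue and plain T for a prvalue:

```cpp
#include <cassert>
#include <type_traits>

constexpr int N = 16;  // assumed: matches float (*)[16] in the error message

// Hypothetical helper, for illustration only.
void inspect(float matrix[N][N], int i, int k) {
    // matrix[i] + k is pointer arithmetic: a temporary float* (a prvalue),
    // so it cannot be an asm output operand.
    static_assert(std::is_same<decltype((matrix[i] + k)), float*>::value,
                  "address expression is a prvalue of type float*");

    // matrix[i][k] names the float object itself: an lvalue.
    static_assert(std::is_same<decltype((matrix[i][k])), float&>::value,
                  "element expression is an lvalue of type float");

    // Dereferencing the address gives the same kind of lvalue.
    static_assert(std::is_same<decltype((*(matrix[i] + k))), float&>::value,
                  "dereferenced address is an lvalue of type float");

    // Both spellings refer to the same object in memory.
    assert(&matrix[i][k] == matrix[i] + k);
}
```

So "=m"(matrix[i][k]) and "=m"(*(matrix[i] + k)) should both be accepted as outputs, while "=m"(matrix[i] + k) is not.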

The solutions that pass the address through an r constraint and then dereference it explicitly work, too: an input operand may be any expression, so no lvalue is required. But they are likely less efficient, because they prevent the compiler from choosing some other addressing mode, like a SIB addressing mode. Instead, it has to first materialise the address in a register.

Your last example is a bit silly. It uses a matching constraint ("0") to tie the input to the register of the output operand, which essentially performs matrixPointer = matrix[i] + k before passing that to the inline assembly statement. The output operand, matrixPointer, is an ordinary variable and therefore an lvalue, which is why the compiler accepts it. This is a pretty roundabout way to do it and not at all needed.

That said, for further efficiency you should hoist the clearing of ymm0 out of the loop. Something like this perhaps?

#include <immintrin.h>

#define N 1000

void fillMatrixByZeros(float matrix[N][N]){
    for (int k = 0; k < N; k += 8){
        for (int i = 0; i < N; ++i){
            asm volatile (
                "vmovups %1, %0"
                : "=m"(matrix[i][k])
                : "x"(_mm256_setzero_ps())
                : "memory"
            );
        }
    }
}

Note that just calling memset is likely to perform a lot better than hand-rolled inline assembly.
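
A minimal sketch of that memset variant, assuming IEEE 754 floats (where all-zero bytes represent 0.0f) and the same N as above. The name fillMatrixByZerosMemset is made up for illustration. Inside the function, matrix is really a pointer, so sizeof(matrix) would give the pointer size; the byte count is spelled out instead:

```cpp
#include <cstring>

constexpr int N = 1000;  // same size as in the example above

// Hypothetical name, for illustration only.
void fillMatrixByZerosMemset(float matrix[N][N]) {
    // The parameter decays to float (*)[N], so compute the size explicitly.
    // Assumes all-zero bytes represent 0.0f (true for IEEE 754 floats).
    std::memset(matrix, 0, sizeof(float) * N * N);
}
```

The library routine is free to pick the widest stores the CPU supports, which is hard to match with a fixed hand-written loop.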