I need to move a 128-bit value from the address [rsi - 0x80] into the dst variable below using the lddqu instruction, but I get the error "operand type mismatch for `lddqu'". I know there are previous questions on Stack Overflow that deal with smaller operand sizes, but what suffix should I use with the instruction to get the value at that address into the variable?
__int128 dst = 0, src = 0;
asm volatile ("lddqu -0x80(%%rsi), %0\n\t"
: "=r" (dst)
: "r" (src));
Just to give an overview of the entire problem: this is only one instruction that is part of a larger graph algorithm that finds the shortest path between two vertices. The src variable is redundant and can be removed if it adds ambiguity. I am designing a hardware prefetcher (in a processor simulator) to predict future memory addresses based on the currently accessed addresses. Once I can get an address into a variable like dst, I have a technique that automatically predicts the future address and triggers the memory request for that address.
A larger version of the pattern is a sequence of loads and stores, and looks like this:
lddqu xmm0,[rsi-0x80]
movdqu XMMWORD PTR [rdi-0x80],xmm0
lddqu xmm0,[rsi-0x70]
movdqu XMMWORD PTR [rdi-0x70],xmm0
lddqu xmm0,[rsi-0x60]
movdqu XMMWORD PTR [rdi-0x60],xmm0
lddqu xmm0,[rsi-0x50]
movdqu XMMWORD PTR [rdi-0x50],xmm0
Now, I am working on how to get the inline asm working with Intel syntax.
CodePudding user response:
lddqu can only load into a vector register, not a general-purpose register. Use "=x" in place of "=r" for dst's constraint.
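A minimal sketch of that constraint change, assuming GCC or Clang on x86-64 with the default AT&T syntax. The helper names load128_lddqu and load_u128 are hypothetical, and the sketch uses an __m128i output plus an "m" input operand (instead of a hard-coded -0x80(%rsi)) so the compiler picks the addressing mode itself:

```c
#include <emmintrin.h>  /* __m128i */
#include <string.h>     /* memcpy */

/* Hypothetical helper: load 16 possibly unaligned bytes from p using lddqu. */
static inline __m128i load128_lddqu(const void *p)
{
    __m128i v;
    /* "=x" forces the output into an XMM register, which lddqu requires.
       "m" describes the 16 bytes being read, so the compiler chooses the
       addressing mode and knows which memory the asm depends on. */
    __asm__ ("lddqu %1, %0"
             : "=x" (v)
             : "m" (*(const char (*)[16]) p));
    return v;
}

/* Hypothetical helper: the same load, handed back as an unsigned __int128.
   The compiler has to move the value out of the XMM register to do this. */
static inline unsigned __int128 load_u128(const void *p)
{
    __m128i v = load128_lddqu(p);
    unsigned __int128 out;
    memcpy(&out, &v, sizeof out);
    return out;
}
```

With a base pointer held in a C variable, the load from the question would be written as something like load_u128((const char *)base - 0x80), where base is whatever pointer is expected to be in rsi.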
Also, your source operand looks suspicious, since you're ignoring src and just loading from an arbitrary offset of a register whose contents you know nothing about.
Look at the compiler-generated asm around your asm statement (for example on https://godbolt.org/, especially with -O2 optimization enabled) to see how the compiler gets __int128 dst back into memory or integer registers after you force it to be in an XMM register.
Using inline asm like this is probably even worse for efficiency than using SSE intrinsics like _mm_loadu_si128. See also https://gcc.gnu.org/wiki/DontUseInlineAsm
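For comparison, here is a rough sketch of the four load/store pairs from the question written with intrinsics instead of inline asm. The function name copy_block and its parameters are hypothetical; src_end and dst_end stand in for the values in rsi and rdi. Note that _mm_loadu_si128 normally compiles to movdqu rather than lddqu; if lddqu specifically is wanted, _mm_lddqu_si128 from <pmmintrin.h> exists but needs SSE3 enabled at compile time.

```c
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_storeu_si128 */

/* Hypothetical helper: copy the 16-byte chunks at offsets -0x80, -0x70,
   -0x60 and -0x50 relative to the two pointers, like the asm sequence. */
static void copy_block(unsigned char *dst_end, const unsigned char *src_end)
{
    for (int off = -0x80; off <= -0x50; off += 0x10) {
        /* Unaligned 128-bit load from the source... */
        __m128i v = _mm_loadu_si128((const __m128i *)(src_end + off));
        /* ...followed by an unaligned 128-bit store to the destination. */
        _mm_storeu_si128((__m128i *)(dst_end + off), v);
    }
}
```

Because the compiler sees ordinary loads and stores here, it is free to unroll, schedule, and pick registers, which the inline asm version prevents.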