I've been studying inline assembly for almost a month now, and as another practice problem I'm trying to add two 512-bit unsigned integers (uint512) using inline assembly. At first I got what I wanted: the code below compiles and gives the correct result. Until now, though, I had only compiled my program with optimization enabled (-O2 or -O3).
When I remove -O1, -O2 or -O3 from my compilation flags, the build fails with error: 'asm' operand has impossible constraints. What could be the reason for this?
I'm assuming that because optimization is disabled, the program is trying to use more registers than my CPU has. Still, I'm only using 8 registers (or maybe not anymore?). Are my inputs being stored in another 8 registers instead?
// uint512 += uint512
void uint512_add(unsigned long *sum, unsigned long *add) {
    asm volatile(
        "add %[adn0], %[sum0]\n\t"
        "adc %[adn1], %[sum1]\n\t"
        "adc %[adn2], %[sum2]\n\t"
        "adc %[adn3], %[sum3]\n\t"
        "adc %[adn4], %[sum4]\n\t"
        "adc %[adn5], %[sum5]\n\t"
        "adc %[adn6], %[sum6]\n\t"
        "adc %[adn7], %[sum7]"
        :
        [sum0]"+r"(sum[0]), [sum1]"+r"(sum[1]),
        [sum2]"+r"(sum[2]), [sum3]"+r"(sum[3]),
        [sum4]"+r"(sum[4]), [sum5]"+r"(sum[5]),
        [sum6]"+r"(sum[6]), [sum7]"+r"(sum[7])
        :
        [adn0]"m"(add[0]), [adn1]"m"(add[1]),
        [adn2]"m"(add[2]), [adn3]"m"(add[3]),
        [adn4]"m"(add[4]), [adn5]"m"(add[5]),
        [adn6]"m"(add[6]), [adn7]"m"(add[7])
        : "memory", "cc"
    );
}
CodePudding user response:
If you remove some of the operands, e.g. for just a 256-bit add, you'll notice that with optimization disabled, GCC wants to put a pointer directly to each memory operand in a separate register, instead of inventing addressing modes for each of them relative to the same base. So it runs out of registers: eight in/out registers plus eight address registers is 16, and x86-64 only has 15 general-purpose registers besides the stack pointer. (See the middle part of the answer to "Strange 'asm' operand has impossible constraints error" for compiler output that demos this.)
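One way around the register pressure, sketched here under assumptions rather than taken from the question: pass the addend as a single pointer in one register and address the limbs at fixed offsets from it. The name uint512_add_one_ptr and the array-typed dummy "m" input are illustrative only, and the sketch assumes x86-64, AT&T syntax and a 64-bit unsigned long.

```c
#include <assert.h> /* static_assert (C11) */

static_assert(sizeof(unsigned long) == 8, "sketch assumes 64-bit limbs");

/* Hypothetical variant: sum += add, 8 limbs each, least significant first. */
void uint512_add_one_ptr(unsigned long *sum, const unsigned long *add) {
    __asm__(
        "add   (%[p]), %[s0]\n\t"
        "adc  8(%[p]), %[s1]\n\t"
        "adc 16(%[p]), %[s2]\n\t"
        "adc 24(%[p]), %[s3]\n\t"
        "adc 32(%[p]), %[s4]\n\t"
        "adc 40(%[p]), %[s5]\n\t"
        "adc 48(%[p]), %[s6]\n\t"
        "adc 56(%[p]), %[s7]"
        /* Early-clobber ("&") keeps the sums out of the register holding p,
           which is still read after the first sums have been written. */
        : [s0] "+&r"(sum[0]), [s1] "+&r"(sum[1]),
          [s2] "+&r"(sum[2]), [s3] "+&r"(sum[3]),
          [s4] "+&r"(sum[4]), [s5] "+&r"(sum[5]),
          [s6] "+&r"(sum[6]), [s7] "+&r"(sum[7])
        : [p] "r"(add),
          /* Dummy input: tells the compiler all 64 bytes at add are read. */
          "m"(*(const unsigned long (*)[8])add)
        : "cc");
}
```

With only one pointer register (plus whatever the compiler uses to address the dummy operand), the asm needs about ten registers instead of sixteen, so it should still build with optimization disabled.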
You might want __attribute__((optimize("-O3"))) on this function or something, so it doesn't stop the rest of your program from compiling.
Also, this doesn't need a "memory" clobber; you don't write any memory, and you only read via "m" operands. It also doesn't need to be volatile: it has no side effects besides writing the "+r" in/output regs. Technically all of those except sum7 should be "+&r" early-clobber operands, since you write them before reading all of the input and in-out operands, but there's basically no plausible way for GCC to overlap registers between pointers and integer in/out operands here.
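To make the early-clobber point concrete, here is a minimal 128-bit sketch. The function name uint128_add is made up for illustration, and x86-64 with a 64-bit unsigned long is assumed. It drops volatile and the "memory" clobber, and marks the low limb "+&r" because it is written before the second memory operand is read; the last limb can stay a plain "+r".

```c
#include <assert.h> /* static_assert (C11) */

static_assert(sizeof(unsigned long) == 8, "sketch assumes 64-bit limbs");

/* Hypothetical 2-limb version: sum += add, least significant limb first. */
void uint128_add(unsigned long *sum, const unsigned long *add) {
    __asm__(                    /* no volatile: the outputs are the only effect */
        "add %[a0], %[s0]\n\t"  /* writes s0 while a1 is still unread */
        "adc %[a1], %[s1]"      /* last write, nothing is read after it */
        : [s0] "+&r"(sum[0]),   /* early-clobber: kept apart from input regs */
          [s1] "+r"(sum[1])
        : [a0] "m"(add[0]), [a1] "m"(add[1])
        : "cc");                /* flags change; no "memory" clobber needed */
}
```

The same pattern would extend to eight limbs, with every sum operand except the last one marked "+&r".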
You could also let the compiler choose with "mre" instead of forcing memory source operands even if the source operand was a compile-time constant or in a register. But if that makes it generate worse asm for your actual use-case (e.g. a separate mov load into regs instead of a memory source for adc), then maybe just "me". (The "e" constraint is like "i" but only allows constants that fit in a 32-bit signed integer, i.e. safe for use as an immediate with 64-bit operand-size for instructions other than mov.)
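As a small illustration of the looser constraint (again only a sketch: uint128_add_u64 is a hypothetical helper, assuming x86-64 and a 64-bit unsigned long), "mre" lets the compiler hand the addend over as memory, a register, or a 32-bit sign-extended immediate, whichever it prefers:

```c
#include <assert.h> /* static_assert (C11) */

static_assert(sizeof(unsigned long) == 8, "sketch assumes 64-bit limbs");

/* Hypothetical helper: sum (2 limbs, least significant first) += v. */
void uint128_add_u64(unsigned long *sum, unsigned long v) {
    __asm__(
        "add %[v], %[s0]\n\t"  /* v may print as memory, a register, or $imm32 */
        "adc $0, %[s1]"        /* propagate the carry into the high limb */
        : [s0] "+r"(sum[0]), [s1] "+r"(sum[1])
        : [v] "mre"(v)         /* swap in "me" to rule out the register form */
        : "cc");
}
```

No early-clobber is needed in this sketch, because v is only read by the first instruction, before any output has been written.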
BTW, with clang (but not GCC) you don't need inline asm at all: use typedef unsigned _ExtInt(512) uint512; - see "256-bit arithmetic in Clang (extended integers)".