GNU inline asm: same register for different output operands allowed?-CodePudding

I've written a small function with C-code and a short inline assembly statement.
Inside the inline assembly statement I need 2 "temporary" registers to load and compare some memory values.
To allow the compiler to choose "optimal temporary registers" I would like to avoid hard-coding those temp registers (and putting them into the clobber list). Instead I decided to create 2 local variables in the surrounding C-function just for this purpose. I used "=r" to add these local variables to the output operands specification of the inline asm statement and then used them for my load/compare purposes.
These local variables are not used elsewhere in the C-function and (maybe because of this fact) the compiler decided to assign the same register to the two related output operands which makes my code unusable (comparison is always true).

Is the compiler allowed to use overlapping registers for different output operands or is this a compiler bug (I tend to rate this as a bug)?
I only found information regarding early clobbers which prevent overlapping of register for inputs and outputs... but no statement for just output operands.

A workaround is to initialize my temporary variables and to use " r" instead of "=r" for them in the output operand specification. But in this case the compiler emits initialization instructions which I would like to avoid.
Is there any clean way to let the compiler choose optimal registers that do not overlap each other just for "internal inline assembly usage"?

Thank you very much!

P.S.: I code for some "exotic" target using a "non-GNU" compiler that supports "GNU inline assembly".
P.P.S.: I also don't understand in the example below why the compiler doesn't generate code for "int eq=0;" (e.g. 'mov d2, 0'). Maybe I totally misunderstood the "=" constraint modifier?

Totally useless and stupid example below just to illustrate (focus on) the problem:

int foo(const int *s1, const int *s2)
{
    int eq = 0;
#ifdef WORKAROUND
    int t1=0, t2=1;
#else
    int t1, t2;
#endif

    __asm__ volatile(
        "ld.w  %[t1], [%[s1]]   \n\t"
        "ld.w  %[t2], [%[s2]]   \n\t"
        "jne   %[t1], %[t2], 1f \n\t"
        "mov   %[eq], 1         \n\t" 
        "1:"
        : [eq] "=d" (eq),
          [s1] " a" (s1), [s2] " a" (s2),
#ifdef WORKAROUND
          [t1] " d" (t1), [t2] " d" (t2)
#else
          [t1] "=d" (t1), [t2] "=d" (t2)
#endif
    );

    return eq;
}

In the created asm the compiler used register 'd8' for both operands 't1' and 't2':

foo:
    ; 'mov d2, 0' is missing
    ld.w  d8, [a4]  ; 'd8' allocated for 't1'
    ld.w  d8, [a5]  ; 'd8' allocated for 't2' too!
    jne   d8, d8, 1f 
    mov   d2, 1         
1:
    ret16

Compiling w/ '-DWORKAROUND':

foo:
    ; 'mov d2, 0' is missing
    mov16 d9,1
    mov16 d8,0

    ld.w  d9, [a5]   
    jne   d8, d9, 1f 
    mov   d2, 1         
1:
    ret16

EABI for this machine:

return register (non-pointer/pointer): d2, a2
non-pointer args: d4..d7
pointer args: a4..a7

CodePudding user response：

I think this is a bug in your compiler.

If it says it supports "GNU inline assembly" then one would expect it to follow GCC, whose manual is the closest thing there is to a formal specification. Now the GCC manual doesn't seem to explicitly say "output operands will not share registers with each other", but as o11c mentions, they do suggest using output operands for scratch registers, and that wouldn't work if they could share registers.

A workaround that might be more efficient than yours would be to follow your inline asm with a second dummy asm statement that "uses" both the outputs. Hopefully this will convince the compiler that they are potentially different values and therefore need separate registers:

    int t1, t2;
    __asm__ volatile(" ... code ..."
          : [t1] "=d" (t1), [t2] "=d" (t2) : ...);
    __asm__ volatile("" // no code
          : : "r" (t1), "r" (t2));

With luck this will avoid any extra code being generated for unnecessary initialization, etc.

Another possibility would be to hardcode specific scratch registers and declare them as clobbered. It leaves less flexibility for the register allocator, but depending on the surrounding code and how smart the compiler is, it may not make a lot of difference.