Why do 32-bit immediates with IMUL give warnings in NASM and errors in GCC?

Time:10-03

I am currently investigating some strange behaviour with the imul instruction, because the official Intel manual seems to differ slightly from reality.

The first thing I noticed is that the Intel manual does not consider this example to be a correct instruction:

imul rax, 2

yet both GCC/GAS (with .intel_syntax noprefix) and NASM accept this instruction without problem. Using objdump -d showed me this:

48 6b c0 02             imul   $0x2,%rax,%rax

meaning it gets translated into a different instruction that is in fact documented in the manual.

I already find this weird and would like to know why this form even exists. The only places I could find it documented were the NASM instruction set and, oddly enough, the Description of the imul instruction in the Intel Manual. The latter reads:

  • Two-operand form — With this form the destination operand (the first operand) is multiplied by the source operand (second operand). The destination operand is a general purpose register and the source operand is an immediate value, a general-purpose register, or a memory location. The intermediate product (twice the size of the input operand) is truncated and stored in the destination operand location.

That is inconsistent with the Opcode table of that same instruction.

The NASM instruction set also mentions imul reg64, sbytedword and imul reg64, imm, and I do not understand what either of them means. imm would imply that 64-bit immediates could be used as well, would it not? And the meaning of sbytedword is unclear to me.

Now to the 32-bit immediates: The NASM instruction set mentions imul reg64, imm32 while both the Intel Manual and the NASM set mention imul r64, r/m64, imm32. However, normally when an immediate of a lower bitcount than the destination operand is used, the Intel Manual specifically mentions sign-extension in the description column of the Opcode table. In this case, it is not mentioned, so I wondered what would happen if I happened to use a negative 32-bit immediate (in other words, requiring all 32 bits).

This is the assembly code I tested this with:

        global  imm_test

        section .text

imm_test:
        mov     rax, rdi
        imul    rax, 0xFFDFFFFF
        ret

Then I called the imm_test function from C:

#include <stdio.h>

int imm_test(int n);

int main() {
    printf("%d\n", imm_test(1));
    return 0;
}

If that 32-bit immediate were sign-extended, I would expect the printed value to be -2097153, and when assembling with NASM and compiling and linking with GCC, that is exactly what is printed.
And yet NASM gives me this warning:

test.asm:7: warning: signed dword immediate exceeds bounds [-w number-overflow]
test.asm:7: warning: dword data exceeds bounds [-w number-overflow]

However, looking at the disassembly again, the instruction is encoded exactly the way I would expect it to be:

48 69 c0 ff ff df ff    imul   $0xffffffffffdfffff,%rax,%rax

It's a 32-bit immediate sign-extended to 64-bit.
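The behaviour observed above can be modelled in a few lines of C. This is only a sketch of the arithmetic, not of the hardware: the function name imul64_imm32 is made up for illustration, and the unsigned-to-signed casts assume the usual two's-complement conversion that GCC and Clang perform.

```c
#include <stdint.h>

/* Hypothetical model of `imul r64, r/m64, imm32`:
 * the 32-bit immediate is sign-extended to 64 bits, then the
 * product is truncated to its low 64 bits. */
int64_t imul64_imm32(int64_t src, uint32_t imm32_bits)
{
    /* Reinterpret the raw 32 bits as signed, then widen (sign-extend).
     * Assumes two's-complement conversion, as GCC and Clang do. */
    int64_t imm = (int64_t)(int32_t)imm32_bits;

    /* Multiply as unsigned so wraparound is well-defined in C. */
    return (int64_t)((uint64_t)src * (uint64_t)imm);
}
```

Under those assumptions, imul64_imm32(1, 0xFFDFFFFFu) evaluates to -2097153, which matches what the test program printed.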

When I change the syntax of the assembly code to GAS's .intel_syntax noprefix like so:

        .intel_syntax noprefix
        .global  imm_test

        .text

imm_test:
        mov     rax, rdi
        imul    rax, 0xFFDFFFFF
        ret

and try to assemble this with the GNU assembler, I don't just get a warning, I get an error:

test.S: Assembler messages:
test.S:8: Error: operand type mismatch for `imul'

Changing the imul instruction to the properly documented imul rax, rax, 0xFFDFFFFF version does not change anything.

So I'm wondering, why is the documentation for imul so inconsistent, and why are 32-bit immediates officially supported (and also work correctly), yet they give errors or warnings?

CodePudding user response:

Asm source uses values, not immediate bit-pattern encodings

imul r64, r/m64, sign_extended_imm32 or imm8 are the only forms available with 64-bit operand size (see Footnote 1 below and Intel's manual, https://www.felixcloutier.com/x86/imul), so 0x0000_0000_FFDF_FFFF is not encodable.

But that's what 0xFFDF_FFFF means; like with any place-value way of writing numbers, unwritten places to the left are assumed to be 0.

NASM warns about truncating, GAS simply errors with a not very helpful message, but in both cases the only problem is the numeric value of the constant. With .intel_syntax noprefix in GAS, imul rax, rax, 0x7FDFFFFF assembles just fine. A signed-positive 32-bit number is not a problem. (High bit = 0.)

mov eax, 0xFFDF_FFFF is encodable because the operand-size is 32-bit, so the source operand is a raw 32-bit value that doesn't implicitly get sign-extended to 64-bit.

As part of executing a mov to EAX, the upper 32 bits of RAX get zeroed. You could look at it as the constant getting zero-extended to 64-bit, but that extension happens as part of a 32-bit instruction writing a 32-bit register on x86-64. add eax, 0xFFDF_FFFF is a clearer case: it's doing a 32-bit add, truncating the result to 32-bit, and writing it to EAX. Implicit zero-extension into RAX happens during that register write after the add, not while reading the inputs. It's only with mov that copies a value unchanged that there's room to look at it a different way.
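As a rough illustration of the add eax, imm32 case, the following C sketch models the order of operations described above. The function name add_eax_imm32 is hypothetical; it only mirrors the arithmetic, not flags or anything else the real instruction does.

```c
#include <stdint.h>

/* Hypothetical model of `add eax, imm32` on x86-64. */
uint64_t add_eax_imm32(uint64_t rax, uint32_t imm32)
{
    uint32_t eax = (uint32_t)rax;   /* only the low 32 bits are read     */
    uint32_t sum = eax + imm32;     /* 32-bit add, wraps modulo 2^32     */
    return (uint64_t)sum;           /* writing EAX zero-extends into RAX */
}
```

With rax = 0xFFFFFFFF00000001 and imm32 = 0xFFDFFFFF, this yields 0x00000000FFE00000: the old upper half of RAX never takes part, and the zeroing happens on the register write.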

Either way, assemblers understand the full value you wrote, and will tell you if it's not possible to encode that value as an operand of whatever operand-size. Remember, asm source uses values, not bit-patterns for the machine code. This is part of why you're using an assembler. If you meant 0xFFFF_FFFF_FFDF_FFFF, you should write that.
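The check an assembler has to make can be sketched like this in C. The helper name fits_simm32 is an assumption for illustration, not something taken from NASM or GAS; it simply asks whether the value you wrote survives a round trip through a sign-extended 32-bit field.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: can `value` be represented as a 32-bit
 * immediate that the CPU sign-extends to 64 bits? */
bool fits_simm32(int64_t value)
{
    return value >= INT32_MIN && value <= INT32_MAX;
}
```

Here fits_simm32(0x7FDFFFFF) and fits_simm32(-0x200001) are true, while fits_simm32(0xFFDFFFFF) is false, because the value written is 4292870143 rather than -2097153.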


imul rax,2 being a "separate form"?

NASM (and most other assemblers including GAS) accept imul x, imm as short-hand for imul x, x, imm. Same for AVX instructions like vpand xmm0, xmm0, xmm1.

It just saves you from having to repeat the same register twice as both the destination and first source when you don't want to take advantage of the non-destructive separate destination. There isn't a different machine encoding for that form, only asm-level syntax, which is why you don't find it in Intel's manuals, and why disassembly shows the real form the assembler picked.


Footnote 1: You mentioned the NASM appendix B which shows:

IMUL             reg64,reg64,imm8         X64 
IMUL             reg64,reg64,sbytedword   X64,ND 
IMUL             reg64,reg64,imm32        X64 
IMUL             reg64,reg64,imm          X64,ND 

I don't know what the point of the ND entries is, but mov is the only instruction in x86-64 that can take a 64-bit immediate. The imm8 and imm32 forms are a complete enumeration of your options, and sbytedword (a signed byte or dword) covers the same two cases. The plain unqualified imm is just confusing and wrong.

(NASM documents the reg64,reg64,imm8 form separately from reg64,imm8, but that's just NASM letting the middle operand implicitly be the same as the first operand. The machine encoding is still imul r64, r/m64, immediate with two different opcodes, one for an 8-bit and one for a 32-bit immediate. These are the same opcodes that do 32-bit and 16-bit operand-size with no prefix or different prefixes.)
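To make the two opcodes concrete, here is a small C sketch that picks between them for the specific case imul rax, rax, imm. The byte values are the ones visible in the disassembly in the question (48 6b c0 ib and 48 69 c0 id); the function name encode_imul_rax_imm and its interface are invented for this example, and it does not reflect how any real assembler is structured.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoder for `imul rax, rax, imm` only.
 * Returns the number of bytes written to `out` (4 or 7),
 * or 0 if the value cannot be encoded as imm8 or imm32. */
size_t encode_imul_rax_imm(int64_t value, uint8_t out[7])
{
    if (value < INT32_MIN || value > INT32_MAX)
        return 0;                    /* no 64-bit immediate form exists */

    out[0] = 0x48;                   /* REX.W: 64-bit operand size      */
    out[2] = 0xC0;                   /* ModRM: reg = rax, r/m = rax     */

    if (value >= INT8_MIN && value <= INT8_MAX) {
        out[1] = 0x6B;               /* opcode with sign-extended imm8  */
        out[3] = (uint8_t)value;
        return 4;
    }

    out[1] = 0x69;                   /* opcode with sign-extended imm32 */
    uint32_t bits = (uint32_t)value; /* low 32 bits, little-endian next */
    for (int i = 0; i < 4; i++)
        out[3 + i] = (uint8_t)(bits >> (8 * i));
    return 7;
}
```

Passing 2 gives the 4-byte sequence 48 6b c0 02, passing -0x200001 gives 48 69 c0 ff ff df ff, and passing 0xFFDFFFFF returns 0 because that value needs more than a sign-extended 32 bits.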

NASM's Appendix B has been wrong before, e.g. about which CPU version each form of each instruction was new in. This fork of NASM 2.05's appendix corrected those mistakes, and is a useful reference for that. (It still includes more text descriptions that later NASM versions removed when the instruction list got longer.)

But really I only ever refer to ecm's fork of the NASM appendix when I want to check on imul r,r/m,imm being new in 186, or something like that. If I want to know the current state of the x86 ISA, i.e. what forms of an instruction are available, I check Intel's manuals. (Or actually the HTML scraped from Intel's vol.2 PDF, on https://www.felixcloutier.com/x86/). Intel sometimes has mistakes, but not about something that important / fundamental to the point of the manual.

CodePudding user response:

Why are 32-bit immediates officially supported (and also work correctly), yet they give errors or warnings?

I assume that a 64-bit GCC compiler internally works with signed 64-bit integers.

The version I am using prints an error message because the 64-bit value 0xFFDFFFFF is not in the range -0x80000000... 0x7FFFFFFF and therefore cannot be converted to a 32-bit signed value!

Note: "32-bit constants are supported" does not mean that the assembler will automatically truncate constants to 32 bits if more than 32 bits are required! And you need at least 33 bits to store the value 0xFFDFFFFF as signed number!
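The "at least 33 bits" claim can be checked with a short C sketch. The function name min_signed_bits is made up for this illustration; it simply searches for the smallest two's-complement width whose range contains the value.

```c
#include <stdint.h>

/* Hypothetical helper: smallest two's-complement width (in bits)
 * that can hold `v`, capped at 64. */
int min_signed_bits(int64_t v)
{
    int bits = 1;
    while (bits < 64) {
        /* A `bits`-wide signed field covers [-2^(bits-1), 2^(bits-1) - 1]. */
        int64_t lo = -((int64_t)1 << (bits - 1));
        int64_t hi = ((int64_t)1 << (bits - 1)) - 1;
        if (v >= lo && v <= hi)
            break;
        bits++;
    }
    return bits;
}
```

For the value 0xFFDFFFFF (4292870143) this returns 33, whereas 0x7FDFFFFF needs 32 bits and -0x200001 needs only 23, so the latter two fit in a 32-bit signed immediate and the first does not.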

Writing the constant as 0xFFFFFFFFFFDFFFFF or as -0x200001 worked well.

Maybe the developers of NASM want a number like 0xFFFFFFFE to be interpreted as -2 for compatibility with older programs where -2 can be written as 0xFFFFFFFE;

... and the developers of GNU AS assume that a developer typing 0xFFFFFFFE in a 64-bit program really means 4294967294 and not -2.

... this means that GAS assumes that imul rax, rax, 0xFFDFFFFF shall have the same result as imul rax, rbx after a mov rbx, 0xFFDFFFFF.

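As a consequence, GNU AS prints an error message because imul rax, rax, immediate cannot multiply rax by a value such as 4294967294 (0xFFFFFFFE) or 4292870143 (0xFFDFFFFF), neither of which fits in a sign-extended 32-bit immediate.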

So I'm wondering, why is the documentation for imul so inconsistent ...

This has nothing to do with documentation:

GCC (and the GNU assembler) use a different syntax from Intel's official syntax. NASM is much closer to the official syntax.

... and I think that this is documented somewhere.
