What is limiting the size of variables / registers in assembly? (32-bit Linux)

EDIT: okay, here's the code:

global _start


section .text
_start:
    ; presetting variables
    mov [rev_ans], byte 0
    mov [multiplier], byte 1
    
    ; square of num
    mov eax,31
    mov ebx,eax
    mul ebx
    mov [left_num],eax
    
    ; know the digits count of the answer
    mov [digit_count_left_num],eax
    call digit_num
    
    ; convert into ascii digits and store one by one reversed
    call reverse_ans
    
    ; print
    mov eax, [rev_ans]
    mov [left_num], eax
    call print_ans

    ; exit
    mov eax,1
    mov ebx,0
    int 80h
    
    
    
reverse_ans:
    ; restoring eax data
    mov eax,[left_num]
    mov edx,0
    mov ebx,10
    div ebx
    
    ; moving left number (undisplayed yet) into left_num
    mov [left_num],eax
    
    ; building reversed answer integer
    mov eax,[multiplier]
    mov ecx,edx
    mov edx,0
    mul ecx
    add [rev_ans],eax
    
    ; incrementing the multiplier
    mov eax,[multiplier]
    mov ebx,10
    mov edx,0
    div ebx
    mov [multiplier],eax    
    
    ; repeating process for other digits
    cmp [left_num], byte 0
    jg reverse_ans
    
    ret
    
    
    
    
print_ans:
    ; restoring eax data
    mov eax,[left_num]
    mov edx,0
    mov ebx,10
    div ebx
    mov [ans],edx
    
    ; converting into ascii digit
    add [ans], byte 0x30
    
    ; moving the number which is left (undisplayed yet) into left_num
    mov [left_num],eax
    
    ; print
    mov eax,4
    mov ebx,1
    mov ecx,ans
    mov edx,1
    int 80h
    
    ; repeating process for other digits
    cmp [left_num], byte 0
    jg print_ans
    
    ret


digit_num:
    ; increment multiplier
    mov eax, [multiplier]
    mov ebx,10
    mov edx,0
    mul ebx
    mov [multiplier],eax
    
    ; check
    mov eax,[digit_count_left_num]
    mov ebx,[multiplier]
    mov edx,0
    div ebx
    ; updating the number which was left after division
    mov [digit_count_left_num],eax
    
    cmp eax, byte 0
    jg digit_num
    ret




section .bss
    ans resb 256
    digit resb 256
    left_num resb 256
    digit_count_left_num resb 256
    rev_ans resb 256
    multiplier resb 256

My program takes a hardcoded decimal number, squares it, and prints the answer on the screen. Everything works just fine if I give it a number whose square is less than 1000. For example, I give it 2 and it says 4; I give it 31 and it says 961. When I give it 32, it just prints 1 (where the answer should be 1024). When I input 65, the answer should be 4225, but it appears as 0.
I also noticed some strange behavior: when I input 20 (expecting 400), I get only 4. Giving 10 also results in just 1 instead of 100, and giving 30 outputs 9.

A couple of things to mention:

When moving data between variables, I often used general-purpose registers (which are 32 bits wide) to hold those variables' data temporarily. I thought that this could be an issue, but then I remembered that the maximum integer these registers can hold is over 4 billion! And my numbers only go just over a thousand!
All my variables in this program were reserved and not pre-initiated. For each variable I reserved 256 bytes in memory.

Answer:

All my variables in this program were reserved and not pre-initiated.

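You said it: the only setup you do for the multiplier variable is mov [multiplier], byte 1. That writes just 1 byte, but your program then uses mov eax, [multiplier], which loads 4 bytes. Linux zero-fills .bss when the program starts, so here the high 3 bytes happen to be zero, but relying on that while writing 1 byte and reading 4 is fragile; the write size and the read size should match.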
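To make the size mismatch concrete, here is a small C sketch (C only because it is easy to run; the function names are made up for illustration). It simulates a 4-byte variable whose upper bytes hold leftover non-zero data, a case the .bss zero-fill happens to spare you here, writes only its first byte, and then reads all 4 bytes back, the way mov [multiplier], byte 1 followed by mov eax, [multiplier] does. The 0xFFFFFF01 result assumes a little-endian machine such as x86.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for `multiplier`: a 4-byte slot whose upper
   bytes still hold non-zero leftovers. Only the first byte is written,
   then all 4 bytes are read back. */
uint32_t load_after_byte_write(void)
{
    unsigned char slot[4];
    memset(slot, 0xFF, sizeof slot);   /* pretend leftover data, NOT zero */

    slot[0] = 1;                       /* like: mov [multiplier], byte 1 */

    uint32_t eax;
    memcpy(&eax, slot, sizeof eax);    /* like: mov eax, [multiplier]    */
    return eax;                        /* 0xFFFFFF01 on little-endian    */
}

/* Same slot, but the write is as wide as the later read. */
uint32_t load_after_dword_write(void)
{
    unsigned char slot[4];
    memset(slot, 0xFF, sizeof slot);

    uint32_t one = 1;
    memcpy(slot, &one, sizeof one);    /* like: mov dword [multiplier], 1 */

    uint32_t eax;
    memcpy(&eax, slot, sizeof eax);
    return eax;                        /* 1, whatever the byte order */
}
```

Matching the write width to the read width (the second function) is the whole fix; the byte order no longer matters.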

There's also no reason to reserve 256 bytes per variable! All of your calculations involve processing dword values, so logically you would define your variables as dwords.

You seem to have devised an extremely convoluted way of printing a number, involving lots of multiplications and divisions!
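For the digit_num procedure, your comment says "know the digits count of the answer", but I don't see you actually counting digits. All you do is establish an elevated value for the multiplier variable. Because each pass divides the already-reduced quotient by an ever-larger multiplier, that value is only right for some inputs: 961 gets 100 and 1024 gets 1000, as intended, but any 5- or 6-digit number also ends up with 1000.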
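For comparison, actually counting digits only needs repeated division by 10 of the running quotient, always by the same divisor. The sketch below is in C with hypothetical function names; leading_power_of_ten is my guess at the value multiplier is meant to end up holding (100 for 961, 1000 for 1024).

```c
#include <assert.h>
#include <stdint.h>

/* Count the decimal digits of an unsigned 32-bit value by dividing
   by 10 until one digit is left (the same division `div ebx` does
   with EBX = 10). Zero counts as one digit, "0". */
unsigned count_digits(uint32_t n)
{
    unsigned count = 1;
    while (n >= 10) {
        n /= 10;
        count++;
    }
    return count;
}

/* Largest power of ten not exceeding n (1 for n = 0..9): the place
   value of the leading digit. Never overflows for uint32_t, since the
   result is at most 1000000000. */
uint32_t leading_power_of_ten(uint32_t n)
{
    uint32_t p = 1;
    while (n >= 10) {
        n /= 10;
        p *= 10;
    }
    return p;
}
```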

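Instead of first reversing the number and then printing the reversed version (reversing into an integer also loses trailing zeros: 400 reversed is 004, which is just 4, and that is why 20 prints as 4), you should store the digits temporarily in a buffer, starting from the rear of the buffer, and then print the buffer in one go. An unsigned dword value has at most 10 decimal digits (4294967295), so a 10-byte buffer will do: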

global _start

section .text
_start:
    ; square of number
    mov  eax, 31                ; {2, 10, 20, 30, 31, 32, 65}
    imul eax, eax
    mov  [number], eax

    ; convert into ASCII digits and print
    call print_answer

    ; exit
    xor  ebx, ebx
    mov  eax, 1
    int  80h
; ---------------------------
print_answer:
    mov  ecx, answer + 10    ; Address above the 10-byte buffer
    mov  ebx, 10             ; CONST
    mov  eax, [number]
.more:
    xor  edx, edx            ; Dividing EDX:EAX by 10
    div  ebx
    dec  ecx                 ; Go to next lower address
    add  edx, '0'            ; Convert remainder (EDX) to ASCII
    mov  [ecx], dl           ;  and store at next lower address
    test eax, eax            ; Test quotient (EAX)
    jnz  .more               ; More to do

    ; print
    mov  edx, answer + 10    ; Address above the 10-byte buffer
    sub  edx, ecx            ;  minus address of the 1st digit gives NumberOfDigits
    mov  ebx, 1
    mov  eax, 4
    int  80h
    ret
; ---------------------------
section .bss
    number resd 1
    answer resb 10
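For readers more at home in C, here is a rough equivalent of print_answer under the same assumptions (unsigned 32-bit input, 10-byte buffer, no terminating zero). The function name to_decimal is hypothetical; the comments point at the matching asm lines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill buf from its END backwards with the decimal digits of n and
   return a pointer to the first digit; *len receives the digit count.
   A uint32_t has at most 10 digits, so 10 bytes always suffice.
   No '\0' is written, just like the asm version. */
const char *to_decimal(uint32_t n, char buf[10], size_t *len)
{
    char *end = buf + 10;             /* like: mov ecx, answer + 10     */
    char *p = end;
    do {
        *--p = (char)('0' + n % 10);  /* dec ecx; remainder -> ASCII    */
        n /= 10;                      /* quotient feeds the next pass   */
    } while (n != 0);                 /* like: test eax, eax / jnz .more */
    *len = (size_t)(end - p);         /* like: sub edx, ecx             */
    return p;
}
```

The returned pointer and length can then be handed to something like fwrite(p, 1, len, stdout), which plays the role of the single sys_write call. Note that trailing zeros survive (400 comes out as "400"), because nothing is ever reversed into an integer.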

Answer:

I'm thinking there's a misunderstanding here with how variables in assembly work.

section .bss
    ans resb 256
    digit resb 256
    left_num resb 256
    digit_count_left_num resb 256
    rev_ans resb 256
    multiplier resb 256

You might think that this is the equivalent of the following C code:

char ans[256];
char digit[256];
char left_num[256];
char digit_count_left_num[256];
char rev_ans[256];
char multiplier[256];

And it is... sort of. Unlike C, assembly has no type safety whatsoever. Any time you access memory in assembly you're typecasting it to the register size you specify in your instruction. So taking your question literally, "what is limiting the size of variables/registers in assembly", your registers "limit" the size of your variables. (Air quotes because you can use multiple registers to make up for the lack of hardware support for 64-bit integers on 32-bit hardware.)
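As a sketch of the parenthetical point about combining registers: a 64-bit value can live in two 32-bit halves (EDX:EAX is the pair mul and div already use), and addition then takes two steps with a carry in between, which is what x86's add/adc instruction pair does. The pair32 type and add64 function are hypothetical names, written in C using only 32-bit arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value held in two 32-bit "registers", like EDX:EAX. */
typedef struct {
    uint32_t hi;   /* e.g. EDX */
    uint32_t lo;   /* e.g. EAX */
} pair32;

/* 64-bit addition from 32-bit pieces: add the low halves (wrapping
   modulo 2^32), detect whether they wrapped, then add the high halves
   plus that carry. Overflow past 64 bits is simply discarded. */
pair32 add64(pair32 a, pair32 b)
{
    pair32 r;
    r.lo = a.lo + b.lo;                      /* like: add eax, ... */
    uint32_t carry = (r.lo < a.lo) ? 1 : 0;  /* wrapped => carry   */
    r.hi = a.hi + b.hi + carry;              /* like: adc edx, ... */
    return r;
}
```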

The important takeaway here is that the CPU has no clue what type your data is intended to be. C and other such languages are just very good at creating the illusion that the CPU is aware of what your data structures actually are. The labels ans, digit, etc. are just for your convenience. They represent a memory address in the .bss section of your program. For example, if your .bss section happened to be located at address 0x40000000, the following assembly code

    mov [rev_ans], byte 0
    mov [multiplier], byte 1

gets translated by the assembler into

    mov [0x40000400], byte 0
    mov [0x40000500], byte 1

Now if you were to do a hexdump of your RAM at this point, this is what you would see:

0x40000500: 01 (This is the byte 1 you wrote to multiplier)
0x40000501: 00 (Linux wrote a 0 here for you but you didn't initialize this value)
0x40000502: 00 (Linux wrote a 0 here for you but you didn't initialize this value)
0x40000503: 00 (Linux wrote a 0 here for you but you didn't initialize this value)
etc.
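The offsets in this example are not arbitrary: each resb 256 pushes the next label 256 (0x100) bytes further, so rev_ans lands at +0x400 and multiplier at +0x500. A hypothetical C struct mirroring the .bss section shows the same offsets. This assumes no padding between char arrays, which holds on mainstream compilers, and 0x40000000 above is only an example base address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C mirror of the .bss section: six 256-byte arrays laid
   out back to back, in the same order as the resb lines. */
struct bss_layout {
    char ans[256];                  /* +0x000 */
    char digit[256];                /* +0x100 */
    char left_num[256];             /* +0x200 */
    char digit_count_left_num[256]; /* +0x300 */
    char rev_ans[256];              /* +0x400 */
    char multiplier[256];           /* +0x500 */
};

/* A label is just "section base + offset"; checked at compile time. */
static_assert(offsetof(struct bss_layout, rev_ans) == 0x400, "rev_ans at +0x400");
static_assert(offsetof(struct bss_layout, multiplier) == 0x500, "multiplier at +0x500");
```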

From the way you defined it, I'm guessing you intend the data to be an array of 8-bit bytes. But the CPU has no knowledge of this at runtime! By using mov eax,[multiplier], you're telling the CPU that your intended data type is 32-bit. Here's what's happening (I wish I could change the text color to illustrate this better):

0x40000500: 01 (This is loaded into the lowest 8 bits of EAX, aka AL)
0x40000501: 00 (This is loaded into lower middle 8 bits of EAX, aka AH)
0x40000502: 00 (This is loaded into the upper middle 8 bits of EAX)
0x40000503: 00 (This is loaded into the highest 8 bits of EAX)

If you had values other than zero in those memory slots, you can imagine that you'd get a weird number in eax.
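The byte-to-register mapping above is little-endian ordering. A short C sketch (the function name load_le32 is hypothetical) spells it out with shifts, so it gives the same answer on any host, and lets you plug in non-zero bytes to see the kind of weird number you would get.

```c
#include <assert.h>
#include <stdint.h>

/* What `mov eax, [addr]` does on x86: the byte at the lowest address
   becomes bits 0-7 (AL), the next one bits 8-15 (AH), then bits 16-23
   and 24-31. Written with shifts so the result does not depend on the
   byte order of the machine running this C code. */
uint32_t load_le32(const unsigned char bytes[4])
{
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}
```

With the bytes 01 00 00 00 from the dump, this yields 1; with 01 2A 00 00 it yields 0x2A01 (10753) instead.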

Long story short, if your goal is to read 8 bits at a time, use mov al, [memory] or mov ah, [memory] rather than mov eax, [memory].
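One place where a size mismatch seems to bite in the posted code, as far as I can tell: in NASM, cmp [left_num], byte 0 gives the memory operand no size of its own, so the byte keyword makes it a 1-byte compare of only the lowest byte of left_num, and jg then treats that byte as signed. For 4225, the value left after one division is 422 (0x1A6); its low byte 0xA6 is -90 as a signed byte, so the loop stops even though 422 > 0, which would fit the output of 0. Comparing the full value, e.g. cmp dword [left_num], 0, avoids this. The hypothetical C helpers below just compute what that signed low byte is.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the lowest byte of n read as a signed 8-bit
   value, i.e. what a signed condition like jg sees after a 1-byte
   compare with 0. Uses arithmetic instead of an (int8_t) cast, so the
   result does not rely on how the compiler converts out-of-range
   values to a signed type. */
int low_byte_signed(uint32_t n)
{
    int b = (int)(n & 0xFFu);       /* 0..255 */
    return b >= 128 ? b - 256 : b;  /* two's-complement reinterpretation */
}

/* 1 if jg would be taken after comparing only the low byte with 0. */
int byte_test_says_more(uint32_t left_num)
{
    return low_byte_signed(left_num) > 0;
}
```

For example, 102 (0x66) still passes the byte test, while 422, 420 and 500 all fail it despite being positive.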