Here is a program that reads decimal ASCII digits one at a time and converts them into an integer. The result is stored in the edi register:
global _start
%macro kernel 4
mov eax, %1
mov ebx, %2
mov ecx, %3
mov edx, %4
int 80h
%endmacro
section .bss
symbol resb 1
section .text
_start:
mov esi, 10
xor edi, edi
.loop:
kernel 3, 0, symbol, 1 ; load 1 char from STDIN into symbol
test eax, eax ; nothing loaded - EOF
jz .quit
xor ebx, ebx
mov bl, [symbol]
sub bl, '0'
cmp bl, 9
jg .quit ; not a number
mov eax, edi ; previously accumulated number
mul esi ; eax *= 10
lea edi, [eax + ebx]
jmp .loop
.quit:
mov eax, 1
mov ebx, edi
int 80h
I compile it:
$ nasm -g -f elf32 st3-18a.asm
$ ld -g -m elf_i386 st3-18a.o -o st3-18a
$ ./st3-18a
2[Enter]
Ctrl-d
When I run this code in gdb step by step everything is correct, and the result stored in edi at the end is 2. But when I run without a debugger, and echo the last program return value:
$ ./st3-18a
2[Enter]
Ctrl-d
$ echo $?
238
Why does it output 0xEE? What is wrong?
Answer:
Your range check is buggy: it uses a signed compare (jg) instead of an unsigned one (ja), so you only detect non-digit characters when c - '0' is in 10..127, not when it wraps around (i.e. becomes negative as a signed byte). That misses almost half of the byte values you should be excluding, including control codes like newline at the low end of the ASCII range.
- double condition checking in assembly
- Difference between JA and JG in assembly
- NASM Assembly convert input to integer? shows a loop that does the check correctly.
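To make the unsigned check concrete, here is a minimal sketch of the question's loop with the range check changed to ja. It is not the code from the linked answers. Apart from the compare, I also swapped in movzx and imul in place of the original xor/mov and mul, and dropped the macro to keep it short; those are my choices, not something the question requires.

```nasm
global _start

section .bss
symbol  resb 1

section .text
_start:
    xor   edi, edi              ; running total
.loop:
    mov   eax, 3                ; sys_read
    xor   ebx, ebx              ; fd 0 = stdin
    mov   ecx, symbol
    mov   edx, 1
    int   80h
    cmp   eax, 1
    jne   .quit                 ; 0 = EOF, negative = error
    movzx ebx, byte [symbol]    ; zero-extend the character
    sub   ebx, '0'              ; anything below '0' wraps to a huge unsigned value
    cmp   ebx, 9
    ja    .quit                 ; unsigned compare: only 0..9 gets through
    imul  edi, edi, 10          ; edi = edi * 10
    add   edi, ebx              ;     + digit
    jmp   .loop
.quit:
    mov   eax, 1                ; sys_exit
    mov   ebx, edi              ; only the low 8 bits reach the shell
    int   80h
```

Built the same way as in the question (nasm -f elf32, ld -m elf_i386), typing 2 and pressing Enter should exit with status 2, because the newline now fails the range check and ends the loop.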
So why does GDB make it work?
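Your program only uses read with a size of 1, leaving a newline unread after you press return. That's ASCII 0xa = '\n', so your next read(0, buf, 1) gets it. Then 0x0a - '0' leaves 0xDA in bl, which is negative as a signed byte, so jg doesn't quit, and the zero-extended 218 in ebx gets added: 2*10 + 218 = 238 = 0xEE.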
Unless GDB gets it first: GDB takes over the terminal to read more commands after your read(0, buf, 1), so GDB gets the leftover terminal input and discards the newline before single-stepping to the next read system call. Or it might just be getting discarded when GDB switches the terminal from cooked to raw mode so it can read single keystrokes without waiting for a line to be "submitted" by the kernel's canonical-mode line editing with the EOL (newline) or EOF (ctrl-d) control characters.
That's because your program is sharing a terminal with GDB, rather than having GDB attach to your program while it's already running in another terminal tab / window, i.e. on a different Unix TTY, e.g. with gdb -p $(pidof st3-18a).
You can also do that with strace, or just run strace ./st3-18a since strace doesn't have interactive input.
It's common to read into a decent-sized buffer and ignore later characters in toy programs that use "cooked" TTY input. That will break if you redirect input from a file so multiple lines are ready at once, so if you want something robust you can use fgets from libc.
As long as you realize that the I/O is simplistic and not robust, though, do whatever floats your boat when playing around with asm, even if that means making assumptions about lines and tty handling and that the user pressed enter instead of control-d.
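For illustration, a rough sketch of that read-into-a-buffer approach could look like the following. The name buf and the 64-byte size are my own assumptions, and it has exactly the limitations described above: it makes one read call, parses leading digits, and ignores whatever else arrived.

```nasm
global _start

section .bss
buf     resb 64                 ; assumed size, plenty for one short line

section .text
_start:
    mov   eax, 3                ; sys_read
    xor   ebx, ebx              ; fd 0 = stdin
    mov   ecx, buf
    mov   edx, 64
    int   80h                   ; eax = bytes read, 0 on EOF, negative on error
    xor   edi, edi              ; running total
    test  eax, eax
    jle   .quit                 ; nothing to parse
    mov   esi, buf              ; read pointer
    lea   ecx, [buf + eax]      ; one past the last byte read
.next:
    movzx edx, byte [esi]
    sub   edx, '0'
    cmp   edx, 9
    ja    .quit                 ; newline or any other non-digit ends the number
    imul  edi, edi, 10          ; edi = edi * 10
    add   edi, edx              ;     + digit
    inc   esi
    cmp   esi, ecx
    jb    .next                 ; stop at the end of what read returned
.quit:
    mov   eax, 1                ; sys_exit
    mov   ebx, edi              ; only the low 8 bits reach the shell
    int   80h
```

With cooked TTY input, the single read returns the whole line including the newline, so nothing is left over for a later read to trip on.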
Play around with cat running in a terminal, typing a partial line and hitting control-D. You can strace -p $(pidof cat) from another terminal to see its system calls.
See also How do I ignore line breaks in input using NASM Assembly?