Program that reads 1 byte at a time works in the debugger, but breaks without it

Here is a program that reads ASCII decimal digits from stdin one at a time and converts them into an integer. The result is stored in the edi register:

global _start
%macro kernel 4
    mov eax, %1
    mov ebx, %2
    mov ecx, %3
    mov edx, %4
    int 80h
%endmacro
section .bss
    symbol resb 1
section .text
_start:
    mov esi, 10
    xor edi, edi
.loop:
    kernel 3, 0, symbol, 1 ; load 1 char from STDIN into symbol
    test eax, eax          ; nothing loaded - EOF
    jz .quit
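    xor ebx, ebx           ; zero all of ebx so it can be added to eax below
    mov bl, [symbol]       ; bl = the character just read
    sub bl, '0'            ; ASCII digit -> value 0..9
    cmp bl, 9
    jg .quit               ; not a number
    mov eax, edi           ; previously accumulated number
    mul esi                ; eax *= 10
    lea edi, [eax + ebx]   ; edi = edi * 10 + digit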
    jmp .loop

.quit:
    mov eax, 1
    mov ebx, edi
    int 80h

I build and run it:

$ nasm -g -f elf32 st3-18a.asm
$ ld -g -m elf_i386 st3-18a.o -o st3-18a
$ ./st3-18a
2[Enter]
Ctrl-d

When I run this code step by step in gdb, everything is correct, and edi holds 2 at the end. But when I run it without a debugger and echo the program's exit status:

$ ./st3-18a
2[Enter]
Ctrl-d
$ echo $?
238

Why does it exit with 0xEE? What is wrong?

Answer:

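Your range check is buggy: it uses a signed compare (jg) instead of an unsigned one (ja), so it only detects non-digit characters when c - '0' is in the range 10..127. It misses every case where the result is negative when viewed as a signed byte (0x80..0xFF), which is 128 of the 246 non-digit byte values you should be excluding. That includes every character below '0', where the subtraction wraps around, such as control codes like newline at the low end of the ASCII range. For example, newline (0x0a) gives 0x0a - 0x30 = 0xDA, which is -38 as a signed byte, so jg doesn't jump and the loop treats it as a digit worth 218.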

Related Q&As:
- double condition checking in assembly
- Difference between JA and JG in assembly
- NASM Assembly convert input to integer? (shows a loop that does the check correctly)
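To make the difference between the two compares concrete, here is a minimal C sketch that mirrors sub bl, '0' / cmp bl, 9 followed by either jg or ja. The function names are hypothetical, and the signed view of the byte is computed explicitly (assuming two's complement, as on x86) so the C itself stays portable. Counting over all 256 byte values shows the signed check rejects only 118 of them, while the unsigned check rejects all 246 non-digits.

```c
#include <stdbool.h>

/* Mirrors `sub bl, '0'`: 8-bit subtraction that wraps modulo 256. */
static unsigned char digit_value(unsigned char c)
{
    return (unsigned char)(c - '0');
}

/* Mirrors `cmp bl, 9` / `jg .quit`: the byte is compared as signed (-128..127). */
bool rejects_signed(unsigned char c)
{
    unsigned char d = digit_value(c);
    int as_signed = d < 128 ? d : d - 256;   /* two's-complement view of d */
    return as_signed > 9;
}

/* Mirrors `cmp bl, 9` / `ja .quit`: the byte is compared as unsigned (0..255). */
bool rejects_unsigned(unsigned char c)
{
    return digit_value(c) > 9;
}

/* Count how many of the 256 possible byte values a check rejects. */
int count_rejected(bool (*check)(unsigned char))
{
    int n = 0;
    for (int c = 0; c < 256; c++)
        if (check((unsigned char)c))
            n++;
    return n;
}
```

In the asm, the corresponding fix is simply replacing jg with ja.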

So why does GDB make it work?

Your program only calls read with a size of 1, so after it reads the '2', the newline from pressing Return is still waiting in the terminal's input buffer. That's ASCII 0xa = '\n', so your next read(0, buf, 1) gets it.

Unless GDB gets it first: after your read(0, buf, 1) returns, GDB takes over the terminal to read more commands, so GDB gets the leftover terminal input and consumes the newline before you single-step to the next read system call. Or the newline might simply be discarded when GDB switches the terminal from cooked to raw mode, so that it can read single keystrokes without waiting for input to be "submitted" by the kernel's canonical-mode line editing with the EOL (newline) or EOF (Ctrl-D) control characters.

That happens because your program shares a terminal with GDB. You can avoid it by running your program in one terminal tab or window (i.e. on a different Unix TTY) and attaching GDB to it from another, e.g. with gdb -p $(pidof st3-18a).

You can do the same with strace -p, or just run strace ./st3-18a directly, since strace doesn't read interactive input.
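Putting the two halves of the explanation together, here is a hypothetical C model of the asm loop (the name accumulate and the byte-array stand-in for read are assumptions for illustration, not part of the original program). Each array element plays the role of one read(0, &symbol, 1) result, and running off the end plays the role of read returning 0 after Ctrl-D. With '2' followed by the leftover '\n', the signed check computes 2 * 10 + 218 = 238, i.e. 0xEE, while the unsigned check stops at the newline and leaves 2.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the asm loop: in[i] is the byte returned by the i-th 1-byte read,
   and reaching len means read returned 0 (EOF). Returns the final edi. */
uint32_t accumulate(const unsigned char *in, size_t len, bool unsigned_check)
{
    uint32_t edi = 0;                                    /* xor edi, edi */
    for (size_t i = 0; i < len; i++) {
        unsigned char bl = (unsigned char)(in[i] - '0'); /* sub bl, '0' */
        int bl_signed = bl < 128 ? bl : bl - 256;        /* signed view of bl */
        bool quit = unsigned_check ? bl > 9              /* ja .quit */
                                   : bl_signed > 9;      /* jg .quit */
        if (quit)
            break;
        edi = edi * 10 + bl;   /* mul esi ; lea edi, [eax + ebx] (low 32 bits) */
    }
    return edi;
}
```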

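In toy programs that use "cooked" TTY input, it's common to read into a decent-sized buffer with a single read call and ignore anything after the first line. In canonical (cooked) mode a single read returns at most one line, so with a big enough buffer you get the whole line, including its newline. That breaks if you redirect input from a file, where multiple lines are ready at once, so if you want something robust you can use fgets from libc.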
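A sketch of the more robust approach, assuming one number per line: read a whole line with fgets (which keeps the '\n'), then parse the leading digits with the unsigned range check. parse_decimal and read_number are hypothetical helper names, and overflow is not detected.

```c
#include <stdbool.h>
#include <stdio.h>

/* Parse the leading decimal digits of s, stopping at the first non-digit
   (including the '\n' fgets keeps, or the terminating '\0'). The cast to
   unsigned char makes this the C equivalent of `cmp bl, 9` / `ja`.
   Overflow is not handled: the result silently wraps. */
unsigned parse_decimal(const char *s)
{
    unsigned value = 0;
    for (;; s++) {
        unsigned char d = (unsigned char)(*s - '0');
        if (d > 9)
            break;
        value = value * 10 + d;
    }
    return value;
}

/* Read one line from `in` and parse it. Returns false on EOF or error.
   On a line longer than the buffer, the rest is left for the next call,
   which is an accepted limitation of this sketch. */
bool read_number(FILE *in, unsigned *out)
{
    char line[64];
    if (fgets(line, sizeof line, in) == NULL)
        return false;
    *out = parse_decimal(line);
    return true;
}
```

A caller could loop with while (read_number(stdin, &n)) to handle one number per line. Because stdio buffers the input and fgets splits it at newlines, this keeps working when several lines arrive in one read from a file or pipe.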

As long as you realize that the I/O is simplistic and not robust, though, do whatever floats your boat when playing around with asm, even if that means making assumptions about lines and TTY handling, and that the user pressed Enter rather than Ctrl-D.

Play around with cat running in a terminal: type a partial line and press Ctrl-D. You can run strace -p $(pidof cat) from another terminal to see its system calls.
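If you would rather see the read results from inside a program than through strace, here is a hypothetical cat-like loop, copy_reporting, that logs every read() return value. It is a sketch assuming a POSIX system (read, write and ssize_t from <unistd.h>). In a terminal, a partial line followed by Ctrl-D should show read returning just the bytes typed so far, and Ctrl-D on an empty line should show read returning 0, which the loop treats as EOF.

```c
#include <stdio.h>
#include <unistd.h>

/* Copy in_fd to out_fd like cat, logging each read() return value to
   `trace`. Returns the total number of bytes copied, or -1 on error. */
long copy_reporting(int in_fd, int out_fd, FILE *trace)
{
    char buf[4096];
    long total = 0;
    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        fprintf(trace, "read returned %zd\n", n);
        if (n < 0)
            return -1;                 /* read error */
        if (n == 0)
            return total;              /* EOF, e.g. Ctrl-D on an empty line */
        for (ssize_t off = 0; off < n; ) {   /* handle short writes */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
        total += n;
    }
}
```

For example, a main that calls copy_reporting(0, 1, stderr) behaves like cat, with the read log going to stderr so it doesn't mix with the copied data.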