How can I detect an EOF in assembly, nasm?

Time:12-03

I am trying to detect an EOF character, or just any character at all, but it doesn't work, and I don't get an error either.

section .data
    file db "text.txt", 0

section .bss
    char resb 1

section .text
    global _start

_start:
    mov rax, 2
    mov rdi, file
    syscall
    mov rbx, rax

    mov rdi, rbx
    mov rax, 0
    mov rsi, char
    mov rdx, 1
    syscall

    mov rcx, char
    cmp rcx, -1
    je _endOfFile

    call _end

_endOfFile:
    print 1, file, 0
    ret

_end:
    mov rax, 3
    mov rdi, rbx
    syscall

    mov rax, 60
    mov rdi, 0
    syscall

I expected it to print the name of the file, but it doesn't do anything. When I remove the cmp and make the jump unconditional, it prints the name fine. I also tried comparing against other characters, and that didn't work either. I am really new to assembly, so I have no clue what to do.

CodePudding user response:

Okay, a few layers of problems here.

Most fundamental is that there is no "EOF character". Unlike ISO C's getc(), which returns the special value EOF, the Unix read system call doesn't signal end-of-file through the data it reads back; it signals it by returning 0 as its return value. So you need to check the value in rax after the read syscall. If it is zero, then you have reached end-of-file. If it is 1, then you successfully read one byte into the memory location char. If it is negative (on Linux, in the range -4095 to -1), then an error occurred, and the negation of this value is an errno code.
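As a rough illustration of checking the return value, here is a minimal sketch of a read loop that copies text.txt to stdout one byte at a time and stops when read returns 0. It assumes x86-64 Linux and a static, non-PIE build (for example nasm -f elf64 followed by ld). The local labels, the use of r12 to hold the exit status, and the exit codes are illustrative choices, not part of the original code.

```nasm
; Sketch: print text.txt one byte at a time, stopping when read returns 0.
section .data
    file db "text.txt", 0

section .bss
    char resb 1

section .text
    global _start

_start:
    mov eax, 2              ; sys_open
    mov rdi, file           ; pointer to the path
    xor esi, esi            ; flags = O_RDONLY (0)
    syscall
    test rax, rax
    js .open_failed         ; negative rax is -errno
    mov rbx, rax            ; save the file descriptor

.read_loop:
    xor eax, eax            ; sys_read
    mov rdi, rbx            ; file descriptor
    mov rsi, char           ; address of the 1-byte buffer
    mov edx, 1              ; read at most one byte
    syscall
    test rax, rax
    jz .eof                 ; 0 bytes read: end of file
    js .read_failed         ; negative: error

    mov eax, 1              ; sys_write
    mov edi, 1              ; stdout
    mov rsi, char           ; the byte we just read
    mov edx, 1
    syscall
    jmp .read_loop

.eof:
    xor r12d, r12d          ; exit status 0 (illustrative)
    jmp .close

.read_failed:
    mov r12d, 1             ; exit status 1 (illustrative)

.close:
    mov eax, 3              ; sys_close
    mov rdi, rbx
    syscall
    mov eax, 60             ; sys_exit
    mov edi, r12d
    syscall

.open_failed:
    mov eax, 60             ; sys_exit
    mov edi, 1
    syscall
```

Note that the loop never inspects the contents of char to decide when to stop; only the value in rax matters for that.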

The comparison code also has a few bugs. First of all, mov rcx, char doesn't load the character from char; it loads the address of char, which naturally does not equal -1. If you look, this has exactly the same form as the mov rsi, char you used to set up the system call, which likewise put the address of char into rsi.

To specify the contents of memory at location char, you use square brackets: mov rcx, [char]. However, that wouldn't be right either. On x86-64, most instructions can operate on 8, 16, 32 or 64 bit operands. When at least one operand is a register, the size of the specified register dictates the operand size. So mov rcx, [char] would load 8 bytes, of which the lowest would be the byte from char, and the other 7 would be whatever garbage happened to follow it in memory.
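To make the 8-byte load concrete, here is a small sketch in which the bytes following char are known values instead of garbage. The neighbors label and the data bytes are made up for the demonstration; the program returns the second loaded byte as its exit status, assuming x86-64 Linux and a static, non-PIE link.

```nasm
; Sketch: show that a 64-bit load from char also picks up the bytes after it.
section .data
    char      db 'A'                                ; the byte we care about
    neighbors db 'B', 'C', 'D', 'E', 'F', 'G', 'H'  ; hypothetical bytes that follow it

section .text
    global _start

_start:
    mov rcx, [char]         ; loads 8 bytes: 'A' in cl, 'B'..'H' in the upper 7 bytes
    shr rcx, 8              ; discard 'A'; the lowest byte is now 'B'
    movzx edi, cl           ; exit status = the byte that followed char
    mov eax, 60             ; sys_exit
    syscall
```

Running it and inspecting the exit status (for example with echo $? in a shell) should give 66, the ASCII code of 'B', which shows that rcx held more than just the one byte stored at char.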

To load one byte, use an 8-bit register, like cl. Then you need to likewise do the compare with only the 8-bit register, or else you're comparing against stuff that is not your character.

mov cl, [char]
cmp cl, -1
je got_ff

Though actually, in most cases, instead of mov cl, [char] it would be better to do movzx ecx, byte [char], which zeros out the upper bits of rcx. mov cl, [char] is defined as preserving those bits, which comes with a slight performance cost.
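One detail to watch with the zero-extending load: the full register now holds a value between 0 and 255, so a 32-bit compare against -1 (0xFFFFFFFF) would never match. Either keep comparing cl, or compare ecx against 0xFF. A minimal sketch, assuming x86-64 Linux and a static, non-PIE link; the test byte and the exit codes are illustrative:

```nasm
; Sketch: zero-extending byte load followed by a full-register compare.
section .data
    char db 0xFF            ; hypothetical test byte

section .text
    global _start

_start:
    movzx ecx, byte [char]  ; ecx = 0x000000FF, upper bits of rcx are zeroed
    xor edi, edi            ; exit status 0: byte was not 0xFF
    cmp ecx, 0xFF           ; compare against 0xFF, not -1
    jne .exit
    mov edi, 1              ; exit status 1: byte was 0xFF
.exit:
    mov eax, 60             ; sys_exit
    syscall
```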

But actually actually, you don't need to load the character into a register at all; cmp works fine with a memory operand.

cmp byte [char], -1
je it_was_ff

CodePudding user response:

To open and read a file in x86-64 assembly on Linux, you can use the syscall instruction to invoke the open and read system calls. Here is an example:

section .data
    file db "text.txt", 0

section .bss
    char resb 1

section .text
    global _start

_start:
    ; Open the file (read-only)
    mov rax, 2
    mov rdi, file
    mov rsi, 0          ; flags = O_RDONLY
    syscall
    mov rbx, rax

    ; Read from the file
    mov rdi, rbx
    mov rax, 0
    mov rsi, char
    mov rdx, 1
    syscall

    ; Check for end of file: read returns 0 in rax at EOF
    test rax, rax
    jz _endOfFile

    ; Close the file
    mov rax, 3
    mov rdi, rbx
    syscall

    ; Exit
    mov rax, 60
    mov rdi, 0
    syscall

_endOfFile:
    ; Close the file
    mov rax, 3
    mov rdi, rbx
    syscall

    ; Exit
    mov rax, 60
    mov rdi, 0
    syscall

In this code, the file variable is defined as a NUL-terminated string containing the path to the file you want to open. The char variable is a one-byte buffer where the read data will be stored.

The _start label is the entry point of the program. In the _start routine, the open system call is invoked with the syscall instruction to open the file specified by the file variable. The returned file descriptor is saved in the rbx register.

Next, the read system call is invoked to read one byte from the file into the char buffer. The return value in rax is then checked: read returns 0 at end of file (no end-of-file marker is stored in the buffer), and in that case the program jumps to _endOfFile, which closes the file and exits.

If read did not return 0, execution falls through to the code that closes the file and exits the program.

I hope this helps! Let me know if you have any other questions.

-chatgpt
