Is it possible for a program to read itself?

Theoretical question, but let's say I have written an assembly program with a label "labelx:". I want the program to read from that memory address, only a certain number of bytes, and print them to stdout.

Would it be something like

jmp labelx

And then would I use the write syscall, making sure to read starting from the next instruction after labelx:

mov rsi,rip
mov rdi,0x01
mov rdx,?
mov rax,0x01
syscall

to then output to stdout.

However, how would I obtain the size to read? Especially if there is a label or more code after the code I want to read. Would I have to manually count the lines?

mov rdx,rip (bytes*lines)

And then syscall with the registers populated, so that write outputs the bytes at rsi to the file descriptor in rdi, which is stdout.

Is this even possible? Would I have to use the read syscall first, since the write system call requires rsi to point to an allocated memory buffer? However, I assumed .text is already allocated memory and is read-only. Would I have to copy it onto the stack, the heap, or a static buffer before calling write, if it's even possible in the first place?

I'm using NASM syntax btw, and I'm pretty new to assembly. This is just a question.

CodePudding user response:

Yes, the .text section is just bytes in memory, no different from section .rodata where you might normally put msg: db "hello", 10. x86 is a Von Neumann architecture (not Harvard), so there's no distinction between code pointers and data pointers, other than what you choose to do with them. You don't need to read or copy the bytes into another buffer first: write only needs a pointer to readable memory, and in a normal Linux executable .text is mapped readable (just not writable). Use objdump -drwC -Mintel on a linked executable to see the machine-code bytes, or GDB's x command in a running process, to see bytes anywhere.

You can get the assembler to calculate the size by putting labels at the start/end of the part you want, and using mov edx, prog_end - prog_start in the code at the point where you want that size in RDX.

See "How does $ work in NASM, exactly?" for more about subtracting two labels (in the same section) to get a size. (Where $ is an implicit label at the start of the current line, although $ isn't likely what you want here.)
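For comparison, here is a minimal sketch of the usual data case, where $ marks the end of a string so the assembler can compute its length. The names msg and msg_len are just illustrative, and the program assumes x86-64 Linux.

```nasm
default rel
global _start

section .rodata
msg:     db "hello", 10         ; 5 letters plus a newline
msg_len  equ $ - msg            ; $ is the address right after the string, so this is 6

section .text
_start:
    lea rsi, [msg]              ; buf: address of the string
    mov edi, 1                  ; fd 1 = stdout
    mov edx, msg_len            ; len: assemble-time constant
    mov eax, 1                  ; __NR_write
    syscall

    mov eax, 60                 ; __NR_exit
    xor edi, edi                ; exit status 0
    syscall
```

Subtracting two labels around a block of code works the same way; the assembler doesn't care whether the bytes in between are data or instructions.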

To get the current address into a register, you need a RIP-relative LEA, not mov, because RIP isn't a general-purpose register and there's no special form of mov that reads it.

here:
    lea rsi, [rel here]     ; with DEFAULT REL you could just use [here]
    mov edi, 1              ; stdout fileno
    mov edx, .end - here    ; assemble-time constant size calculation
    mov eax, 1              ; __NR_write
    syscall
.end:

This is fully position-independent, unlike mov esi, here, which embeds a 32-bit absolute address and so only works in a non-PIE executable. (See "How to load address of function or label into register".)
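To try the snippet above as a complete program, it needs an entry point and an exit call. This is a minimal sketch assuming x86-64 Linux; the file name self.asm and the build commands in the comments are my assumptions, so adjust them for your setup.

```nasm
; Assumed build:  nasm -felf64 self.asm && ld -o self self.o
; Assumed run:    ./self | xxd     (the output is raw machine code, not text)
default rel                     ; so [here] assembles as RIP-relative
global _start

section .text
_start:
here:
    lea rsi, [here]             ; buf: address of this very instruction
    mov edi, 1                  ; fd 1 = stdout
    mov edx, .end - here        ; len: size of the block, computed by NASM
    mov eax, 1                  ; __NR_write
    syscall
.end:
    mov eax, 60                 ; __NR_exit
    xor edi, edi                ; exit status 0
    syscall
```

The bytes written should match what objdump -drwC -Mintel shows between here and here.end in the linked executable.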

The LEA could use lea rsi, [rel $] to assemble to the same machine-code bytes, but you want a label there so you can subtract them.

I optimized your MOV instructions to use 32-bit operand-size, implicitly zero-extending into the full 64-bit RDX and RAX. (And RDI, but write(int fd, void *buf, size_t len) only looks at EDI anyway for the file descriptor).

Note that you can write any bytes of any section; there's nothing special about having a block of code write itself. In the above example, put the start/end labels anywhere. (e.g. foo: and .end:, and mov edx, foo.end - foo taking advantage of how NASM local labels work, by appending to the previous non-local label, so you can reference them from somewhere else. Or just give them both non-dot names.)
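As a sketch of that last point, here the code doing the write is separate from the block being dumped. The function foo and its body are hypothetical, chosen only to have some bytes to print; again this assumes x86-64 Linux.

```nasm
default rel
global _start

section .text
foo:                            ; hypothetical function whose bytes we dump
    xor eax, eax
    ret
.end:                           ; full name is foo.end

_start:
    lea rsi, [foo]              ; buf: start of foo's machine code
    mov edi, 1                  ; fd 1 = stdout
    mov edx, foo.end - foo      ; len: the local label referenced by its full name
    mov eax, 1                  ; __NR_write
    syscall

    mov eax, 60                 ; __NR_exit
    xor edi, edi                ; exit status 0
    syscall
```

Inside _start a bare .end would mean _start.end, which is why the full name foo.end is needed here.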