I am trying to implement my own binary loader for learning purposes, but cannot figure out the data segment.
section .data
helloworld db "hello world", 10
section .text
global _start
test: ;just for testing
ret
_start:
call test
mov rax, 1
mov rbx, 1
mov rcx, helloworld
mov rdx, 11
syscall
mov rax, 60
mov rdi, 0
syscall
This is the assembly program that I am trying to run. I assembled and linked it with nasm -f elf64 test.s -o test.o && ld test.o -o test.bin
My loader looks like this:
int main(int argc, char** argv) {
    char* bin = argv[1];
    struct ElfLib lib = read_elf(bin); //just reading the elf library into the default structures (Elf64_Ehdr, Elf64_Phdr, etc...)
    unsigned char* exec = mmap(NULL, DEFAULT_MEM_SIZ, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); //allocating the virtual memory
    memset(exec, 0, DEFAULT_MEM_SIZ);
    for (int i = 0; i < lib.elf_header.e_phnum; i++) {
        Elf64_Phdr phdr = lib.program_headers[i];
        fseek(lib.execfile, phdr.p_offset, SEEK_SET);
        switch (phdr.p_type) {
        case PT_LOAD: {
            //load the memory at the file offset into the virtual address of exec
            fread(exec + phdr.p_vaddr, sizeof(unsigned char), phdr.p_memsz, lib.execfile);
            break;
        }
        }
        int flags = PROT_NONE;
#define HASFLAG(flag) if (phdr.p_flags & flag) flags|=flag
        HASFLAG(PROT_EXEC); //execute flag on
        HASFLAG(PROT_WRITE); //write flag on
        HASFLAG(PROT_READ); //read flag on
        mprotect(exec + phdr.p_vaddr, phdr.p_memsz, flags);
    }
    void (*ex)() = (void*)(exec + lib.elf_header.e_entry);
    ex(); //call the _start function in the virtual memory
}
But when I run it, nothing gets printed.
I tried running it under GDB, and the program exits promptly after the exit syscall (mov rax, 60 and mov rdi, 0), so I know the system call part works. I think the issue is the address of helloworld in the hello world program. GDB says that it is still at address 0x402000, which is probably not the corresponding address inside the memory I mapped. Surprisingly, the test function is at 0x401000 according to objdump, but at a completely different address when running under GDB, and it does get called. Does anyone have an idea how to go about implementing this?
I'm not sure how much this will help, but I'm running x86-64 Linux on an Intel CPU.
CodePudding user response:
nasm -f elf64 test.s -o test.o && ld test.o -o test.bin
Unfortunately, I don't have NASM, but when I use the GNU assembler instead, the commands above produce a position-dependent file.
This means that phdr.p_vaddr does not specify a value that is relative to the variable exec; phdr.p_vaddr specifies an absolute address that must not be changed.
Assuming the symbol helloworld is located at the start of the data segment, the instruction mov rcx, helloworld will simply load the value phdr.p_vaddr into the register rcx, and not the value exec + phdr.p_vaddr.
However, because the address phdr.p_vaddr may already be in use, you cannot simply load your code there!
The only possibility you have if you want to load code from an already running program is so-called "position-independent code", which can be loaded at different addresses in memory...
By the way:
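64-bit x86 Linux does not take system call parameters in rbx, rcx and rdx, but in rdi, rsi and rdx. For the write call in the question, that means the file descriptor goes in rdi, the address of helloworld in rsi and the length in rdx.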
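The register assignment can be seen in a hand-written write call. A minimal sketch, assuming GCC or Clang inline assembly on x86-64 Linux; raw_write is a hypothetical helper name, and other architectures fall back to the libc syscall() wrapper:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* write(2) issued by hand so the register assignment is visible. */
static long raw_write(int fd, const void *buf, size_t len)
{
    assert(buf != NULL);
#if defined(__x86_64__)
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)               /* rax: return value   */
                      : "a"((long)SYS_write),   /* rax: syscall number */
                        "D"((long)fd),          /* rdi: 1st argument   */
                        "S"(buf),               /* rsi: 2nd argument   */
                        "d"(len)                /* rdx: 3rd argument   */
                      : "rcx", "r11", "memory"); /* clobbered by syscall */
    return ret;
#else
    return syscall(SYS_write, fd, buf, len);    /* fallback on other CPUs */
#endif
}
```

Calling raw_write(1, "hello world\n", 12) corresponds to the write in the question once its arguments are moved into rdi, rsi and rdx.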