I am building a RISC-V emulator which basically loads a whole ELF file into memory.
Up to now, I used the pre-compiled test binaries that the risc-v foundation provided which conveniently had an entry point exactly at the start of the .text
section.
For example:
> riscv32-unknown-elf-objdump ../riscv32i-emulator/tests/simple -d
../riscv32i-emulator/tests/simple: file format elf32-littleriscv
Disassembly of section .text.init:
80000000 <_start>:
80000000: 0480006f j 80000048 <reset_vector>
...
Going into this project I didn't know much about ELF files so I just assumed that every ELF's entry point is exactly the same as the start of the .text
section.
The problem arose when I compiled my own binaries, I found out that the actual entry point is not always the same as the start of the .text
section, but it might be anywhere inside it, like here:
> riscv32-unknown-elf-objdump a.out -d
a.out: file format elf32-littleriscv
Disassembly of section .text:
00010074 <register_fini>:
10074: 00000793 li a5,0
10078: 00078863 beqz a5,10088 <register_fini 0x14>
1007c: 00010537 lui a0,0x10
10080: 43850513 addi a0,a0,1080 # 10438 <__libc_fini_array>
10084: 3a00006f j 10424 <atexit>
10088: 00008067 ret
0001008c <_start>:
1008c: 00002197 auipc gp,0x2
10090: cec18193 addi gp,gp,-788 # 11d78 <__global_pointer$>
...
So, after reading more about ELF files, I found out that the actual entry point address is provided by the Entry
entry on the ELF's header:
> riscv32-unknown-elf-readelf a.out -h | grep Entry
Entry point address: 0x1008c
The problem now becomes that this address is not the actual address on the file (offset from 0) but is a virtual address, so obviously if I set the program counter of my emulator to this address, the emulator would crash.
Reading a bit more, I heard people talk about calculations regarding offsets from program headers and whatnot, but no one had a concrete answer.
My question is: what is the actual "formula" of how exactly you get the entry point address of the _start
procedure as an offset from byte 0?
Just to be clear my emulator doesn't support virtual memory and the binary is the only thing that is loaded into my emulator's memory, so I have no use for the abstraction of virtual memory. I just want every memory address as physical address on disk.
CodePudding user response:
My question is: what is the actual "formula" of how exactly you get the entry point address of the _start procedure as an offset from byte 0?
First, forget about sections. Only segments matter at runtime.
Second, use readelf -Wl
to look at segments. They tell you exactly which chunk of file ([.p_offset, .p_offset .p_filesz)
) goes into which in-memory region ([.p_vaddr, .p_vaddr .p_memsz)
).
The exact calculation of "at which offset in the file does _start
reside" is:
- Find
Elf32_Phdr
which "covers" the address contained inElf32_Ehdr.e_entry
. - Using that
phdr
, file offset of_start
is:ehdr->e_entry - phdr->p_vaddr phdr->p_offset
.
Update:
So, am I always looking for the 1st program header?
No.
Also by "covers" you mean that the 1st phdr->p_vaddr is always equal to e_entry?
No.
You are looking for a the program header (describing relationship between in-memory and on-file data) which overlaps the ehdr->e_entry
in memory. That is, you are looking for the segment for which phdr->p_vaddr <= ehdr->e_entry && ehdr->e_entry < phdr->p_vaddr phdr->p_memsz
. This segment is often the first, but that is in no way guaranteed. See also this answer.