Calculate the entry point of an ELF file as a physical address (offset from 0)-CodePudding

I am building a RISC-V emulator which basically loads a whole ELF file into memory.

Up to now, I used the pre-compiled test binaries that the risc-v foundation provided which conveniently had an entry point exactly at the start of the .text section.

For example:

> riscv32-unknown-elf-objdump ../riscv32i-emulator/tests/simple -d

../riscv32i-emulator/tests/simple:     file format elf32-littleriscv


Disassembly of section .text.init:

80000000 <_start>:
80000000:       0480006f                j       80000048 <reset_vector>
...

Going into this project I didn't know much about ELF files so I just assumed that every ELF's entry point is exactly the same as the start of the .text section.

The problem arose when I compiled my own binaries, I found out that the actual entry point is not always the same as the start of the .text section, but it might be anywhere inside it, like here:

> riscv32-unknown-elf-objdump a.out -d

a.out:     file format elf32-littleriscv


Disassembly of section .text:

00010074 <register_fini>:
   10074:       00000793                li      a5,0
   10078:       00078863                beqz    a5,10088 <register_fini 0x14>
   1007c:       00010537                lui     a0,0x10
   10080:       43850513                addi    a0,a0,1080 # 10438 <__libc_fini_array>
   10084:       3a00006f                j       10424 <atexit>
   10088:       00008067                ret

0001008c <_start>:
   1008c:       00002197                auipc   gp,0x2
   10090:       cec18193                addi    gp,gp,-788 # 11d78 <__global_pointer$>
...

So, after reading more about ELF files, I found out that the actual entry point address is provided by the Entry entry on the ELF's header:

> riscv32-unknown-elf-readelf a.out -h | grep Entry
  Entry point address:               0x1008c

The problem now becomes that this address is not the actual address on the file (offset from 0) but is a virtual address, so obviously if I set the program counter of my emulator to this address, the emulator would crash.

Reading a bit more, I heard people talk about calculations regarding offsets from program headers and whatnot, but no one had a concrete answer.

My question is: what is the actual "formula" of how exactly you get the entry point address of the _start procedure as an offset from byte 0?

Just to be clear my emulator doesn't support virtual memory and the binary is the only thing that is loaded into my emulator's memory, so I have no use for the abstraction of virtual memory. I just want every memory address as physical address on disk.

CodePudding user response：

My question is: what is the actual "formula" of how exactly you get the entry point address of the _start procedure as an offset from byte 0?

First, forget about sections. Only segments matter at runtime.

Second, use readelf -Wl to look at segments. They tell you exactly which chunk of file ([.p_offset, .p_offset .p_filesz)) goes into which in-memory region ([.p_vaddr, .p_vaddr .p_memsz)).

The exact calculation of "at which offset in the file does _start reside" is:

Find Elf32_Phdr which "covers" the address contained in Elf32_Ehdr.e_entry.
Using that phdr, file offset of _start is: ehdr->e_entry - phdr->p_vaddr phdr->p_offset.

Update:

So, am I always looking for the 1st program header?

No.

Also by "covers" you mean that the 1st phdr->p_vaddr is always equal to e_entry?

No.

You are looking for a the program header (describing relationship between in-memory and on-file data) which overlaps the ehdr->e_entry in memory. That is, you are looking for the segment for which phdr->p_vaddr <= ehdr->e_entry && ehdr->e_entry < phdr->p_vaddr phdr->p_memsz. This segment is often the first, but that is in no way guaranteed. See also this answer.