Why does subtracting the value of etext from edata not give me the correct size for my text segment

Time:09-20

I have a C source code file like this -

#include <stdio.h>

extern char etext, edata, end;

int main(int argc, char* argv[], char* envp[]) {

    printf("TEXT END: %p\n", &etext);
    printf("DATA END: %p\n", &edata);
    printf("PROG BRK: %p\n", &end);

    register long volatile sp asm("rsp");
    printf("STCK PTR: %p\n", (void *)sp);

    return 0;
}

Which gives me this output -

TEXT END: 0x5565d48661d5
DATA END: 0x5565d4869018
PROG BRK: 0x5565d4869020
STCK PTR: 0x7fff8f6f34d0

To my understanding, the text, data, and bss segments are laid out sequentially in memory with no space between them. But (&edata - &etext) is 0x2e43, while size -x a.out reports the size of the data section as 0x250. Why does this happen?

I am using gcc 12.2.0 on Linux 5.19.9 (x86_64).

CodePudding user response:

This program layout test used to work for Unix programs in the old days. It still does to some extent on selected systems, such as Linux, but etext, edata and end should be declared as arrays, because they are addresses rather than variables:

extern char etext[], edata[], end[];

Here is a modified program:

#include <stdio.h>

extern char etext[], edata[], end[];

int main(int argc, char *argv[], char *envp[]) {

    printf("main    : %p\n", (void *)main);
    printf("TEXT END: %p\n", (void *)etext);
    printf("DATA END: %p\n", (void *)edata);
    printf("PROG BRK: %p\n", (void *)end);

    void *sp = &argc;
    printf("STCK PTR: %p\n", sp);

    return 0;
}

Output on my Linux system:

main    : 0x4004b0
TEXT END: 0x40068d
DATA END: 0x600a40
PROG BRK: 0x600a48
STCK PTR: 0x7ffc52b616bc

As you can see, the text and data segments are not contiguous on this system, so subtracting one of these addresses from another does not yield a segment size. edata - etext used to evaluate to the size of the data segment, but this no longer works.
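
To make the gap concrete, here is a minimal sketch of the arithmetic on the addresses printed above. The helper names span and print_sample_spans are hypothetical, and the constants are only the sample values from that one run, so other systems will give other numbers.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: number of bytes from address lo up to address hi. */
uintptr_t span(uintptr_t lo, uintptr_t hi) {
    return hi - lo;
}

/* Prints the distances between the sample addresses from the run above. */
void print_sample_spans(void) {
    const uintptr_t etext_addr = 0x40068d; /* TEXT END */
    const uintptr_t edata_addr = 0x600a40; /* DATA END */
    const uintptr_t end_addr   = 0x600a48; /* PROG BRK */

    printf("edata - etext = 0x%jx\n", (uintmax_t)span(etext_addr, edata_addr));
    printf("end   - edata = 0x%jx\n", (uintmax_t)span(edata_addr, end_addr));
}
```

Called from main, print_sample_spans() reports 0x2003b3 (roughly 2 MiB) for edata - etext and 0x8 for end - edata. A distance of 2 MiB is far too large to be the initialized data of such a small program, so most of it is presumably the gap between the two segments.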

Also note that successive runs of the program may output different addresses on modern systems because of address space layout randomization (ASLR), a technique that makes it more difficult for attackers to exploit software flaws.

CodePudding user response:

"To my understanding, the text, data, and bss segments are laid out sequentially in memory with no space between them."

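That used to be true a long time ago. Nowadays it usually isn't, for several reasons: for example, read-only data and other sections commonly sit between the code and the writable data, and linkers align segments with different permissions to page boundaries.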
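
One way to check this on Linux is to ask the kernel which mapping an address falls in. The following is a minimal sketch, assuming a Linux system with /proc mounted; find_mapping and report are hypothetical helper names, not part of any library.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: finds the /proc/self/maps entry that contains addr.
 * On success it stores the mapping bounds and the permission string
 * (for example "r-xp") and returns 1. It returns 0 if no entry matches
 * or if the file cannot be opened. */
int find_mapping(const void *addr, unsigned long *start, unsigned long *end,
                 char perms[5]) {
    FILE *maps = fopen("/proc/self/maps", "r");
    char line[512];
    int found = 0;

    if (maps == NULL)
        return 0;
    while (!found && fgets(line, sizeof line, maps) != NULL) {
        unsigned long lo, hi;
        char p[5];

        /* Each entry starts with "start-end perms", both addresses in hex. */
        if (sscanf(line, "%lx-%lx %4s", &lo, &hi, p) != 3)
            continue;
        if ((uintptr_t)addr >= lo && (uintptr_t)addr < hi) {
            *start = lo;
            *end = hi;
            strcpy(perms, p);
            found = 1;
        }
    }
    fclose(maps);
    return found;
}

/* Prints the mapping that contains addr, or a note if none was found. */
void report(const char *label, const void *addr) {
    unsigned long start, end;
    char perms[5];

    if (find_mapping(addr, &start, &end, perms))
        printf("%-6s 0x%lx lies in %lx-%lx %s\n", label,
               (unsigned long)(uintptr_t)addr, start, end, perms);
    else
        printf("%-6s 0x%lx: no mapping found\n", label,
               (unsigned long)(uintptr_t)addr);
}
```

With the symbols declared as extern char etext[], edata[];, calling report("etext", etext) and report("edata", edata) would typically show the two addresses in different mappings with different permissions, with edata in a writable one.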

Also, back when it was true, edata - etext would give you the size of the data segment, not the text segment.
