Why is a statically-linked "hello world" program so big (over 650 KB)?

Time:04-06

Consider the following program:

#include <stdio.h>
int main(void)
{
  printf("hello world\n");
  return 0;
}

If I build it with GCC, optimizing for size and linking statically, and then strip the result to try to shrink it further:

$ gcc -Os -static hello.c -o hello
$ strip hello

I get an executable of size ~695 KB.

Why is it so big? I realize it's not just my object code, and that there are stubs and what-not, but still, that's kind of huge.

Notes:

  • OS: Devuan GNU/Linux Chimaera (~= Debian Bullseye)
  • Compiler: GCC 10.2
  • libc: glibc 2.31-13
  • Processor architecture: x86_64
  • It doesn't improve if I build with -O3 -flto.

CodePudding user response:

A partial answer: The executable's inflated size...

  • does not have anything to do with your use of printf.
  • does not have anything to do with the compiler's ability to optimize your code.
  • does not have anything to do with your inclusion of <stdio.h>.

Why? Because even if you compile an empty program:

int main(void)
{
  return 0;
}

You still get the same 695 KB executable.
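
A minimal sketch of how one might check this, assuming gcc, strip and a static libc (libc.a) are installed. The helper name static_size is hypothetical; the flags are the ones from the question:

```shell
# Hypothetical helper: print the size in bytes of a stripped, statically
# linked build of one C source file, using the flags from the question.
# Assumes gcc, strip and a static libc (libc.a) are installed.
static_size() {
    src=$1
    out=$(mktemp) || return 1
    status=0
    if gcc -Os -static "$src" -o "$out" && strip "$out"; then
        wc -c < "$out"      # byte count of the stripped executable
    else
        status=1            # build or strip failed
    fi
    rm -f "$out"
    return "$status"
}
```

Running static_size hello.c and then static_size empty.c (with the empty program saved under the assumed name empty.c) prints the two byte counts for comparison. The exact numbers will depend on the glibc and GCC versions in use.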

Thanks @SparKot for the comment indicating this direction.

CodePudding user response:

Fundamentally the issue here is that GNU libc isn't designed to be statically linked, which means, among other things, that the developers have not spent any time on reducing the size of statically-linked binaries.

I compiled your program with -static plus the option -Wl,-Map,a.map, which makes GCC pass -Map a.map to the linker. The linker then writes a file a.map (you can put any name you like after the second comma in that incantation) that explains, among other things, why each object file was included in the link. These are the first few lines of that file, edited slightly for readability:

Archive member included to satisfy reference by file (symbol)

/usr/lib/libc.a(libc-start.o)
                              /usr/lib/crt1.o (__libc_start_main)
/usr/lib/libc.a(check_fds.o)
                              /usr/lib/libc.a(libc-start.o) (__libc_check_standard_fds)
/usr/lib/libc.a(libc-tls.o)
                              /usr/lib/libc.a(libc-start.o) (__libc_setup_tls)
/usr/lib/libc.a(errno.o)
                              /usr/lib/libc.a(check_fds.o) (__libc_errno)
/usr/lib/libc.a(assert.o)
                              /usr/lib/libc.a(libc-start.o) (__assert_fail)
/usr/lib/libc.a(dcgettext.o)
                              /usr/lib/libc.a(assert.o) (__dcgettext)

What this means is that before the linker even looked at the code of your program, while it was still processing the transitive dependencies of the function that calls main, it needed to pull in the code that prints assertion failure messages. That code in turn pulls in the code for dynamically loading and printing localized (translated into the user's native language) error messages. It looks like the bulk of your 600K of binary executable is that code and its dependencies, including among other things all of malloc, all of fprintf, all of iconv, the parser for gettext "message object" files, ...
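
To get a rough idea of how many archive members were pulled in, one could count the entries in that first section of the map file. The following is only a sketch: the helper name count_archive_members is hypothetical, and it assumes the layout shown in the excerpt above (an "Archive member included" header, then one unindented line per member), which may vary between binutils versions:

```shell
# Hypothetical helper: count the archive members listed in the first
# section of a GNU ld map file (the "Archive member included ..." table).
count_archive_members() {
    awk '
        /^Archive member included/ { in_section = 1; next }
        # Unindented lines inside the section start with the member name.
        in_section && /^[^ \t]/ {
            if ($1 ~ /\.a\(.*\)$/) count++
            else in_section = 0   # next section header: stop counting
        }
        END { print count + 0 }
    ' "$1"
}
```

After building with gcc -Os -static hello.c -o hello -Wl,-Map,a.map, calling count_archive_members a.map prints the number of members listed in that section.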
