Where does GCC find printf ? My code worked without any #include

Time:04-06

I am a C beginner so I tried to hack around the stuff.

I read stdio.h and I found this line: extern int printf (const char *__restrict __format, ...);

So I wrote this code and I have no idea why it works.

code:

extern int printf (const char *__restrict __format, ...);

main()
{
    printf("Hello, World!\n");
}

output:

sh-5.1$ ./a.out 
Hello, World!
sh-5.1$ 

Where did GCC find the function printf? It also works with other compilers. I am a beginner in C and I find this very strange.

CodePudding user response:

By default, gcc links your program against the C library, libc, which implements printf:

$ ldd ./a.out
        linux-vdso.so.1 (0x00007ffd5d7d3000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fdf2d307000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fdf2d4f0000)

$ nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep ' printf' | head -1
0000000000056cf0 T printf@@GLIBC_2.2.5

If you build your program with -nolibc you have to satisfy a few symbols on your own (see Compiling without libc):

$ gcc -nolibc ./1.c 
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/10/../../../x86_64-linux-gnu/Scrt1.o: in function `_start':
(.text+0x12): undefined reference to `__libc_csu_fini'
/usr/bin/ld: (.text+0x19): undefined reference to `__libc_csu_init'
/usr/bin/ld: (.text+0x26): undefined reference to `__libc_start_main'
/usr/bin/ld: /tmp/user/1000/ccCFGFhf.o: in function `main':
1.c:(.text+0xc): undefined reference to `puts'
collect2: error: ld returned 1 exit status

CodePudding user response:

You need to understand the difference between the compile and link phases of program compilation.

In the compilation phase you describe to the compiler the various things you intend to call that may be in this file, in other files or in libraries. This is done using function declarations.

 int woodle(char*);

for example. This is what header files are full of.

If the function is in the same file then the compiler will work out how to call it while it compiles that file. But for other functions it leaves a note in the generated code that says

please wire up the woodle function here so I can call it.

Such a note is usually called an import, and there are tools you can use to look at the imports in an object file; their names depend on the platform and toolset.

The linker's job is to find those imports and resolve them. It looks at the object files passed on the command line, at the libraries named on the command line, and also at the standard libraries that the C standard says should be available to all programs.

In your printf case the linker found printf in the C standard library, which the linker includes automatically.

BTW - the linker looks for 'exports' from objects and libraries, and there are tools to look at those too. The linker's job is to match each 'import' to an 'export'.
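A minimal sketch of the import/export distinction described above, assuming a typical Linux toolchain. The woodle function reuses the declaration from this answer; its body and the shout helper are invented purely for illustration. The file includes no header that declares puts or strlen, only hand-written declarations.

```c
#include <stddef.h>  /* only for size_t; this header declares no functions */

/* Hand-written declarations: they tell the compiler how to call the
   functions but contain no code. In the object file both names become
   imports that the linker later resolves from libc. */
int puts(const char *s);
size_t strlen(const char *s);

/* Defined in this file, so the object file exports it.
   (Illustrative body: returns the length of the string.) */
int woodle(char *s)
{
    return (int)strlen(s);
}

/* Hypothetical helper: prints the string, then returns woodle's result. */
int shout(char *s)
{
    puts(s);
    return woodle(s);
}
```

If this is compiled to an object file without optimization, a symbol-listing tool such as nm would typically show woodle and shout as defined and puts and strlen as undefined, which is the "import" note the linker later resolves.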

CodePudding user response:

That your program compiles without a header present means that the compiler settings were lenient. You should still have got a warning though. The reason that your program links is that the C standard library, which contains the code of the function printf, is linked automatically. Almost every C program needs it because input and output, or more generally interaction with peripherals, which that library handles, are the usual means of generating a "side effect", an effect outside the program. The opposite is so uncommon that one must make the wish not to link with it explicit.

So why does your compiler accept a call to a function which has not been declared?

C emerged at a time when programs were much smaller and software development as an engineering discipline didn't formally exist:

Four years later [i.e., in 1978], as a still-junior faculty member, I tried to get my colleagues [...] to create an undergraduate computer-science degree. A senior mechanical engineer of forbidding mien snorted surely not: Harvard had never offered a degree in automotive science, why would we create one in computer science? I waited until I had tenure before trying again (and succeeding) in 1982. -Harry R. Lewis

That was about 10 years after Dennis Ritchie had started to develop this versatile new programming language, the successor to B. The problems involved in creating and maintaining large programs back then were simply not as pressing and not as well-understood as they are, perhaps, today.

Among the many things that help us today, at least in most compiled languages, is strong typing. Every identifier we use is declared with a static type. But the importance and benefits of that were not that obvious in the 1970s, and early C permitted mixing and matching integers and pointers at will. It's all numbers, right? And a function is just a name for a jump address, right? The user will know what to put on the stack, and the function will read it off the stack — I really don't see a problem here ;-). This attitude brought us functions like printf().

After this stage-setting we are slowly getting to the point. Because a function is just a jump address, no function declaration needed to be present in order to call one. The assumed parameters were what you presented, and the presumed return type defaulted to int, which was often correct or at least didn't hurt. And for a long time C kept this backward compatibility. I think the C99 standard forbade the use of undeclared identifiers, and the standard drafts for C11 and C21 both say:

An identifier is a primary expression, provided it has been declared as designating an object (in which case it is an lvalue) or a function (in which case it is a function designator). [91]

Footnote 91 says "Thus, an undeclared identifier is a violation of the syntax."

All compilers I tried compile it anyway (with a warning), perhaps because some ancient code that still gets compiled frequently depends on it.
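To make the "presumed return type defaulted to int" point concrete, here is a small sketch with a prototype in scope. The names half and half_of_three are made up for this illustration and do not come from the question.

```c
/* Prototype: tells the compiler the real parameter and return types
   before the first call appears. */
double half(double x);

double half_of_three(void)
{
    /* With the prototype visible, the int 3 is converted to 3.0 and the
       result is read as a double. Under the old implicit-declaration
       rule the call would have been treated as passing an int and
       returning an int, which does not match the definition below. */
    return half(3);
}

double half(double x)
{
    return x / 2.0;
}
```

Removing the prototype line is the situation the answer describes: a lenient compiler may still accept the call with a warning, but it then has to guess the types.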

CodePudding user response:

First, realize what the gcc program is. Technically, it is not a compiler, but a compiler driver. A compiler driver is responsible for driving the various other tools which perform compilation-related tasks. Some of the tools are found in PATH, whereas others are in internal compiler directories.

There are various ways to check what the driver is doing. I won't go into much detail about how I made the rest of this post, but briefly:

  • strace -f -e %process gcc is a Linux-specific way of showing all the programs executed (elsewhere in this answer, I assume Linux when specifying details but it doesn't matter)

  • gcc -v will dump out various information, but you have to learn what parts actually matter for whatever you are doing.

  • there exists a "specs" file that controls some of the argument-related stuff the driver does


Now for the actual data:

Here's the tree of processes that gcc might execute:

  • gcc, the "driver" (input various, output various. Some arguments are handled by the driver itself, but most are passed to the various subprocesses)
    • (these are repeated for every input file. If -pipe is passed, temporary files are omitted and processes are run in parallel; if --save-temps is passed, intermediate files are preserved):
      • cc1 -E -lang-asm, the "preprocessor" for assembly code (input .S, output .s - yes, case matters. Only relevant if you're trying to compile separate ASM files that need preprocessing)
      • cc1 -E, the "preprocessor" for C code (input .c; output .i. Only a separate process if -fno-integrated-cpp is passed, which is rare. Note that the cpp program in PATH is never called, even though it is provided by GCC - rather, it calls this. If -E is passed, the driver stops after this)
      • cc1, the "compiler" proper (input (usually) .c or (rarely) .i; output .s. If -S is passed, the driver stops after this; if -fsyntax-only is passed, this stage doesn't even complete)
      • (For other languages, replace cc1 with cc1plus, cc1d, cc1obj, f951, gnat1, etc. Note that the different drivers like g++, gdc, etc. only affect what extra libraries are linked by default)
      • as, the "assembler" (input .s; output .o. This is looked up in PATH; it is shipped as part of Binutils, not GCC. If -c is passed, the driver stops here)
    • collect2, the "linker" wrapper (supposedly this has something to do with constructors, and potentially calls ld twice, but in practice I've never seen it. Just think of it as forwarding all its arguments to ld, even if you have constructors normally)
      • ld, the "linker" proper (input .o or others (assumed to be libraries); output executable or shared library. Like as, this is actually part of Binutils, not GCC, so it is looked up in PATH)

The driver has a lot of logic, so it is important that you use it. Notably, you should never call as or ld yourself, since that will omit arguments that rely on the driver's sense of "exact current platform".
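As a rough illustration of the process tree above, the comments in this sketch mark which stage deals with which part of a source file. The GREETING macro and the greet function are hypothetical names chosen only for this example.

```c
/* Handled by the preprocessor (cc1 -E): by the time the compiler proper
   runs, every use of GREETING has been replaced by the string literal. */
#define GREETING "Hello, World!\n"

/* A declaration only. The definition lives in libc and is not looked up
   until ld runs at the very end. */
int printf(const char *format, ...);

/* Translated by cc1 into assembly (.s) and by as into an object file
   (.o) in which printf is still an unresolved symbol. */
int greet(void)
{
    /* printf returns the number of characters it wrote. */
    return printf(GREETING);
}
```

Stopping the driver early with -E, -S or -c, as described in the list, lets you inspect the result of each stage for a file like this.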


Now, getting to your specific question:

Ignoring irrelevant arguments and simplifying paths, the ld call ends up looking like:

ld -o foo Scrt1.o crti.o crtbeginS.o foo.o -lgcc -lgcc_s -lc -lgcc -lgcc_s crtendS.o crtn.o

The various "crt" loose object files are a mixture of parts of GLIBC and GCC, needed to support the C runtime (note that there are others as well; which are linked depends on arguments). The gcc and gcc_s libraries are needed to run code on the platform at all; they are repeated because they rely on the c library which also relies on them.

Since -lc is passed by default (regardless of language), the printf symbol can be resolved. Notably, -lm, -lrt, -lpthread and others are not passed by default, so symbols from those other parts of the C library will not be resolved unless you pass them manually.

All of this is completely independent of what headers are included.
