Unclear on linking vs compilation

Time:11-05

I am aware that many questions address the same issue, but I have been unable to find one that answers my question. I understand the big-picture difference between compilation and linking: the former translates each source file into machine code in the form of object files, and the latter "links" these object files together into an executable.

However, my confusion arises once we throw preprocessing into the mix. It is my understanding that if we import another library, then all of this library is essentially copied and pasted into our code. So even if we are working with multiple files and include one in another, aren't the contents still dumped into the other? If this is the case (which I am sure is incorrect; I must have a misunderstanding somewhere), why is linking even necessary? Aren't we already working with one giant aggregated source file?

CodePudding user response:

Preprocessing happens before compilation. The preprocessor takes one or more source files (the file being compiled plus every header it includes) and outputs another source file, which is then compiled. The preprocessor is a text-to-text transformer; it has nothing to do with linking.

It is conceptually possible to dump everything into one source file using the preprocessor, and then compile it directly into an executable, skipping the stages of producing object files and linking them together. However, this would be extremely inconvenient in practice. Imagine a program of 100,000,000 lines of code (counting all of the standard library, all the platform libraries and all the third-party libraries). You need to change one line. Would you be willing to compile all 100,000,000 lines again? And when you make an error in that one line, would you do it again (and again and again and again)? With separate compilation, only the source file you changed is recompiled, and the result is relinked with the object files that already exist.

Some libraries are distributed entirely as header files. They do not need any binary files and are compiled along with your program every time it is compiled. But not all libraries are like that. Some are too big to be compiled every time. Some are not written in C or C++ (they require bits of assembly language, for example, or perhaps Fortran). Some cannot be distributed as source because their vendors are unwilling to release it for copyright reasons. In all these cases, the solution is to compile the libraries to object files, and then distribute these object files together with headers that contain just the interfaces (declarations without definitions) of the functions and variables they expose.

The <iostream> header that you mention is a mixed bag. In most implementations it contains both function definitions (templates and small inline functions), which are compiled every time your program is compiled, and declarations of external functions, whose definitions are compiled by the vendor and distributed as a precompiled library.

CodePudding user response:

It is my understanding that if we import another library then all of this library is essentially copy and pasted into our code.

This is not generally correct.

In C and C++, #include directives are mostly used to import only header files, not the whole library. Header files are mostly used to declare functions, and sometimes objects, without defining them. Put simply, the declarations in a header file describe the functions but do not define them.

For example, this is a declaration:

double square(double x);

That says “square is a function that takes a double argument and returns a double value,” but it does not contain the code for square and does not tell us what square does. (Also, the parameter name, x, can be omitted in a declaration.) A declaration just tells us what we need to call the function: We have to put a double value in the appropriate place to pass an argument, then we call the routine, and we expect a double value to be returned. The compiler cannot generate assembly code for the function from this declaration.

A definition contains the code for a function:

double square(double x)
{
    return x*x;
}

In most libraries, the definitions are in source files that are compiled separately. The results of those compilations may be made available in various forms of “library” files, such as .a, .so, .dylib, or other files. The library also provides header files that contain only the declarations and not the definitions.

Programs that use the library use #include to include the header files. That gives the compiler the information it needs to call the library routines, but it does not import the library itself. The definitions remain separate, compiled into the library files mentioned above. To make a complete program, the program's own object modules must be linked with those library files.

CodePudding user response:

Compiling transforms each translation unit (a source file after preprocessing) into an object file: a format that holds the machine code for that single unit, along with additional information, such as a symbol table and relocation entries, needed to connect multiple units together.

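Linking combines the object files of all the modules, together with any libraries they use, into a single executable. Along the way, the linker resolves each reference to a function or variable defined in another module to that definition, so that the program can actually run.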
