Assembly of a C program: gcc generates things I can't find documented anywhere. Do I need those sections?

This is a simple program in C.

char a;
void main(){};

It caused this assembly to be generated, starting with:

.text
.globl  a
.bss
.type   a, @object
.size   a, 1

I would like to know how to interpret the above.

I see .text. I believe the . is just a prefix symbol and text means the start of the code section. Then I see .globl, so I believe the variables or functions that come right after it will be global. Or do I need to write the section name, i.e. .text, right before every variable and function? That is my question.

Then I see .bss. I believe that after .bss all uninitialized variables and functions are declared.

Finally I see something that corresponds to the global variable char a from my C program:

.type   a, @object

So .type tells what kind of symbol it is; I assume a is of object type, as indicated by the @ and object in .type a, @object.

Next is the size, which is 1 byte for a char, hence this line:

.size   a, 1

So I assume that if I had a global int a; then that would be

.size a,4

A char is 1 byte; an int is 4 bytes.

Moving on, I have

a:

So the first few lines become the following. Call this code 1:

# my comment 1
# my comment 2
    .text
    .globl  a
    .bss
    .type   a, @object
    .size   a, 1
a:

So the question is why a: is at the bottom

What if I write it like this instead? Call this code 2:

a:
    .text
    .globl  a
    .bss
    .type   a, @object
    .size   a, 1

So I like to know is code 1 and code 2 same? That is, does it matter whether a: appears last (code 1) or first (code 2)?

So from the above, my a is in .text, .globl, and .bss, its .type is @object, and its size is 1 byte. That is a lot of code to define just one char variable. Is my understanding correct, or should I doubt it?

Moving on, now it is the turn of main, which is global and in the .text section. I see:

.zero   1
.text
.globl  main
.type   main, @function

main:

I really don't want to care about the .zero 1 line, but if I am wrong not to care, please tell me what it is for. So again have my gcc place main in .zero (some section???) and the .text section, plus .globl, and the type is @function. So now I know the type comes after the comma, as in .type main, @function and .type a, @object.

Then I ran into something I could not make sense of at all: searching Google for .LFB0: returned zero results.

is .LFB0: a some section of program that my x86-64 processor can run

And .cfi_startproc has to do with .eh_frame; I read that .eh_frame is a section that lives in the loaded part of the program. So I like to know if I am coding in assembly can I ignore .cfi_startproc line. But what is the point of it? Does it mean that everything after it is loaded into memory or registers and belongs to .eh_frame?

main:
.LFB0:
    .cfi_startproc
    endbr64 
    pushq   %rbp    #
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp  #,
    .cfi_def_cfa_register 6

So if I am writing a simple assembly program similar to the C program above, do I need everything from .LFB0: through movq %rsp, %rbp and .cfi_def_cfa_register 6? If not needed then I can assume my program will become

    .text
    .globl  a
    .bss
    .type   a, @object
    .size   a, 1
a:
    .zero   1
    .text
    .globl  main
    .type   main, @function
main:
    .cfi_startproc
    pushq   %rbp
    movq    %rsp, %rbp
    nop
    popq    %rbp
    ret
    .cfi_endproc

So my full program becomes the above. Can anyone please tell me how to compile this with nasm? I believe I have to save it with .s or .S extension which one s small or large S? I am coding on Ubuntu.

This is the full gcc-generated code:

        .file   "test.c"
    # GNU C17 (Ubuntu 11.2.0-7ubuntu2) version 11.2.0 (x86_64-linux-gnu)
    #   compiled by GNU C version 11.2.0, GMP version 6.2.1, MPFR version 4.1.0, MPC version 1.2.0, isl version isl-0.24-GMP

    # GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
    # options passed: -mtune=generic -march=x86-64 -fasynchronous-unwind-tables -fstack-protector-strong -fstack-clash-protection -fcf-protection
        .text
        .globl  a
        .bss
        .type   a, @object
        .size   a, 1
    a:
        .zero   1
        .text
        .globl  main
        .type   main, @function
    main:
    .LFB0:
        .cfi_startproc
        endbr64 
        pushq   %rbp    #
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp  #,
        .cfi_def_cfa_register 6
    # test.c:2: void main(){};
        nop 
        popq    %rbp    #
        .cfi_def_cfa 7, 8
        ret 
        .cfi_endproc
    .LFE0:
        .size   main, .-main
        .ident  "GCC: (Ubuntu 11.2.0-7ubuntu2) 11.2.0"
        .section    .note.GNU-stack,"",@progbits
        .section    .note.gnu.property,"a"
        .align 8
        .long   1f - 0f
        .long   4f - 1f
        .long   5
    0:
        .string "GNU"
    1:
        .align 8
        .long   0xc0000002
        .long   3f - 2f
    2:
        .long   0x3
    3:
        .align 8
    4:

CodePudding user response：

.text is a directive that tells the assembler to start a program code section (the “text” section of the program, a read-only executable section containing mostly instructions to be executed). It is here because GCC without optimization always puts a .text at the top of the file, even if it's about to switch to another section (like .bss in this case) and then back to .text when it's ready to emit some bytes into that section (in your case, a definition for main). GCC does still parse the whole compilation unit before emitting any asm, though; it's not just compiling one global variable / function at a time as it goes along.

.globl a is a directive that tells the assembler that a is a “global” symbol, so its definition should be listed as an external symbol for the linker to link with.
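In C terms, .globl corresponds to external linkage. The following is a minimal sketch, assuming GCC on x86-64 Linux as in the question; the names counter_visible, counter_hidden, and sum_counters are made up for illustration. Compiling it with gcc -S should show a .globl line for each non-static name and none for the static one.

```c
/* External linkage: GCC is expected to emit ".globl counter_visible". */
int counter_visible = 3;

/* Internal linkage (static): no .globl, so other object files
   cannot link against this name. */
static int counter_hidden = 4;

/* A non-static function has external linkage too, so it also
   gets a .globl directive, just like main in the question. */
int sum_counters(void)
{
    return counter_visible + counter_hidden;
}
```

So .globl does not affect "whatever comes next"; it applies only to the one symbol named on that line.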

.bss is a directive that tells the assembler to start the “block started by symbol” section (which will contain data that is initialized to zero or, on some systems, mostly older, is not initialized).
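To see which C objects end up in .bss, here is a small sketch, again assuming GCC on x86-64 Linux. The names zeroed_table, answer, and bss_objects_are_zero are hypothetical. Objects with static storage and no initializer are guaranteed by C to start as zero, which is why the compiler can typically place them in .bss and store no bytes for them in the file.

```c
#include <stddef.h>

/* No initializer: zero-initialized, typically placed in .bss. */
char a;
int zeroed_table[4];

/* Nonzero initializer: the value must be stored in the file,
   so this typically goes in .data rather than .bss. */
int answer = 42;

/* Returns 1 if the objects without initializers really are zero. */
int bss_objects_are_zero(void)
{
    if (a != 0)
        return 0;
    for (size_t i = 0; i < sizeof zeroed_table / sizeof zeroed_table[0]; i++)
        if (zeroed_table[i] != 0)
            return 0;
    return 1;
}
```

Note that only data goes in .bss; functions such as main always go in a code section.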

.type a, @object and .size a, 1 are directives that describe the type and size of an object named a. The assembler adds this information to the symbol table or other information in the object file it outputs. It is useful for debuggers to know about the types of objects.
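The number in .size is simply the object's size in bytes, which is what sizeof reports in C. A rough sketch follows; format_size_directive is a hypothetical helper written for this illustration and is not part of any toolchain. sizeof(char) is 1 by definition, while sizeof(int) is 4 on x86-64 Linux but is not fixed by the C standard.

```c
#include <stdio.h>

/* Hypothetical helper: writes the ".size" line one would expect
   for an object with the given name and size in bytes.
   Returns whatever snprintf returns. */
int format_size_directive(char *out, size_t out_len,
                          const char *name, size_t bytes)
{
    return snprintf(out, out_len, ".size\t%s, %zu", name, bytes);
}
```

Calling it with "a" and sizeof(char) produces .size a, 1; with sizeof(int) on x86-64 Linux it would produce .size a, 4, which matches the guess in the question.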

a: is a label. It acts to define the symbol. As the assembler reads assembly, it counts bytes in the section it is currently generating. Each data declaration or instruction takes up some bytes, and the assembler counts those. When it sees a label, it associates the label with the current count. (The GNU assembler calls this count the location counter; it is sometimes loosely called the program counter even when it is counting data bytes.) When the assembler writes information about a to the symbol table, it will include the number of bytes it is from the beginning of the section. When the program is linked and loaded into memory, this offset is used to calculate the address where the object a will be in memory.
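The bookkeeping described above can be modeled in a few lines of C. This is only a toy sketch of the idea, not how the GNU assembler is implemented; every name in it (toy_asm, toy_switch, toy_emit, toy_label, toy_find) is invented for illustration.

```c
#include <stddef.h>
#include <string.h>

enum { MAX_SECTIONS = 4, MAX_LABELS = 8 };

/* Each section has its own byte counter. */
struct toy_section { const char *name; unsigned long counter; };
/* A label remembers the section and offset where it was defined. */
struct toy_label   { const char *name; int section; unsigned long offset; };

struct toy_asm {
    struct toy_section sections[MAX_SECTIONS];
    struct toy_label   labels[MAX_LABELS];
    int nsections, nlabels, current;
};

/* Start with empty state; assembly begins in .text by default. */
void toy_init(struct toy_asm *as)
{
    memset(as, 0, sizeof *as);
    as->sections[0].name = ".text";
    as->nsections = 1;
    as->current = 0;
}

/* Models ".text" or ".bss": switch sections, keeping each counter. */
int toy_switch(struct toy_asm *as, const char *name)
{
    for (int i = 0; i < as->nsections; i++) {
        if (strcmp(as->sections[i].name, name) == 0) {
            as->current = i;
            return 0;
        }
    }
    if (as->nsections == MAX_SECTIONS)
        return -1;
    as->sections[as->nsections].name = name;
    as->sections[as->nsections].counter = 0;
    as->current = as->nsections++;
    return 0;
}

/* Models ".zero n" or an instruction: advance the current counter. */
void toy_emit(struct toy_asm *as, unsigned long nbytes)
{
    as->sections[as->current].counter += nbytes;
}

/* Models "name:": bind the name to the current section and offset. */
int toy_label(struct toy_asm *as, const char *name)
{
    if (as->nlabels == MAX_LABELS)
        return -1;
    struct toy_label *l = &as->labels[as->nlabels++];
    l->name = name;
    l->section = as->current;
    l->offset = as->sections[as->current].counter;
    return 0;
}

/* Look up a label; returns NULL if it was never defined. */
const struct toy_label *toy_find(const struct toy_asm *as, const char *name)
{
    for (int i = 0; i < as->nlabels; i++)
        if (strcmp(as->labels[i].name, name) == 0)
            return &as->labels[i];
    return NULL;
}
```

Replaying code 1 from the question as toy_switch(".text"), toy_switch(".bss"), toy_label("a"), toy_emit(1) binds a to .bss at offset 0. Replaying code 2, where toy_label("a") comes before any switch, binds a to .text instead, because that is the section in effect at that moment.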

So the question is why a: is at the bottom

a: must be after .bss because a will be put into the section the assembler is currently working on, so that needs to be set to the desired section before declaring the label. The location of a relative to the other directives might be flexible, so that reordering them would have no consequence.

so I like to know is code 1 and code 2 same?

No, a: must appear after .bss so that it is put into the correct section.

.zero 1 says to emit 1 zero byte in the current section. Like (almost?) all directives GCC uses, it's well documented in the GNU assembler manual: https://sourceware.org/binutils/docs/as/Zero.html

so again have my gcc place main in .zero

No, .text starts (or switches back to) the code section, so main will be in the code section.

is .LFB0: a some section of program that my x86-64 processor can run

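Anything ending with a colon is a label. .LFB0 is a local label (on ELF targets the .L prefix keeps it out of the object file's symbol table) that GCC emits to mark the beginning of the function; the matching .LFE0 marks its end. It is just a position in the code, not a section and not something the processor executes, and nothing in this output refers to it, so you do not need it in hand-written assembly.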

so I like to know if I am coding in assembly can I ignore .cfi_startproc line.

When writing assembly for simple functions without exception handling and related features, you can ignore .cfi_startproc and other call-frame information directives that generate metadata that goes in the .eh_frame section. (Which is not executed, it's just there as data in the file for exception handlers and debuggers to read.)

… if not needed then I can assume my program will become…

If you are omitting some of the .cfi… directives, I would omit all of them, unless you look into what they do and determine which ones can be omitted selectively.

I believe I have to save it with .s or .S extension which one s small or large S?

With GCC and Clang, assembly files ending in .S are processed by the “preprocessor” before assembly, and assembly files ending in .s are not. This is the preprocessor familiar from C, with #define, #if, and other directives. Other tools may not do this. If you are not using preprocessor features, it generally does not matter whether you use .s or .S.