Why do I need ".global main" to make this program compile?

Time:11-14

I have this assembly code for ARM.

.text
.global main


fib:
    
    push { r4}
    MOV r1, #1 
    MOV r2, #0  
    MOV r3, #1
    
    loop:
    CMP r3, r0
    BGE exit
    mov r4,r1
    ADD r1, r1,r2
    mov r2, r4
   
    add r3, r3, #1
    B loop
    
    exit:
    pop {r4}
    mov r0, r1
    MOV PC,  lr


main:
  
    mov r0, #13
    push {lr}
    BL fib
    pop {lr}
    mov r1, r0
    ldr r0, =output_string
    push {lr}
    bl printf
    pop {lr}
    MOV PC, lr




@ The 'data' section contains static data for our program
.data
output_string:
    .asciz "%d\n"
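
For readers more comfortable with C, here is a rough C equivalent of the fib routine above. It is only a sketch: it assumes the usual ARM calling convention, in which r0 carries both the first argument and the return value, and the variable names are chosen here for illustration.

```c
#include <stdio.h>

/* Mirrors the fib: label. Register roles are noted next to each variable. */
int fib(int n)
{
    int current = 1;    /* r1 */
    int previous = 0;   /* r2 */
    int i;              /* r3 */

    for (i = 1; i < n; i++) {          /* CMP r3, r0 / BGE exit (signed compare) */
        int saved = current;           /* mov r4, r1 */
        current = current + previous;  /* ADD r1, r1, r2 */
        previous = saved;              /* mov r2, r4 */
    }
    return current;                    /* mov r0, r1 */
}

/* Mirrors what main does with the result: print it with "%d\n". */
void print_fib13(void)
{
    printf("%d\n", fib(13));
}
```

With this loop, fib(13) evaluates to 233, which is the value the assembly passes to printf.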

But I am wondering, why do I need the ".global main" for it to compile? I read in an answer here that it makes the symbol visible to the linker because other object files will use it. But don't I only have one object file here?

Does it also tell the tools that the program should start there? Is that why it doesn't work without it?

CodePudding user response:

Regarding "But don't I only have one object file here?": you still need to provide an entry point. The .text directive places the lines that follow in the code section (normally read-only), as opposed to the .data section, which holds statically defined variables.

The .text section should contain an entry point. In your case, the startup code that GCC links in expects to find "main". In other cases, for example when you invoke the assembler and linker directly rather than going through GCC, _start is the entry point, and you would have used

.global _start

_start:

CodePudding user response:

Short answer at the bottom...

You use the word compile instead of assemble and that connects the dots as to what you are doing. You are also using main without any bootstrap which also implies you want someone else to supply the entry point and bootstrap.

main:
    b .

arm-none-eabi-as so.s -o so.o
arm-none-eabi-objdump -d so.o

so.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <main>:
   0:   eafffffe    b   0 <main>

That is how you assemble it into an object. The gcc program itself is not a compiler; it is a driver program that calls other programs: a C preprocessor (which expands includes and defines, for example), a C compiler, an assembler, and a linker, unless you tell it not to.

So

arm-none-eabi-gcc so.s -o so
/storage4/gnu/arm/bin/../lib/gcc/arm-none-eabi/12.2.0/../../../../arm-none-eabi/bin/ld: cannot find crt0.o: No such file or directory
/storage4/gnu/arm/bin/../lib/gcc/arm-none-eabi/12.2.0/../../../../arm-none-eabi/bin/ld: cannot find -lc: No such file or directory
collect2: error: ld returned 1 exit status

In my case which tells even more of the story.

This is your story I assume.

arm-linux-gnueabi-gcc so.s -o so
/usr/lib/gcc-cross/arm-linux-gnueabi/9/../../../../arm-linux-gnueabi/bin/ld: /usr/lib/gcc-cross/arm-linux-gnueabi/9/../../../../arm-linux-gnueabi/lib/../lib/crt1.o: in function `_start':
(.text+0x34): undefined reference to `main'
collect2: error: ld returned 1 exit status

You are basically trying to "compile" a C program but are using some assembly and the name main() which is special for C.

Most folks do not build their own gnu toolchain; they simply download one. Even if they did, at least for a native (not cross) compiler the build would try to detect the host and build for it, including operating-system-specific items and perhaps a C library as well.

The C library (printf, memcpy, etc) tends to, unfortunately, load stuff up on the bootstrap and in particular the linker script. So you have this intimate marriage between linker script and bootstrap (that marriage generally has to be there for a proper C bootstrap) and the C library (not so much).

Someone, way back when, for whatever reason (do not ask the "why" question, assume there is no answer), apparently used the label start: and that then became _start. At least for the gnu-associated C library bootstrap, that has stuck. One of the things you also get for free, or as baggage (eye of the beholder), is a default linker script: if you do not specify one, there is (at least one) default.

That default has the line

ENTRY(_start)

Which you can grep for to find these linker scripts.

And then the bootstrap, often called crt0.S in at least the gnu associated world, perhaps others since there are too many for any one person to know.

That _start eventually results in a call to main(). How many steps it takes to get there depends on the toolchain, the library, or a combination of the two.
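
As a rough illustration of that hand-off, the sketch below models a bootstrap in plain C. The names entry_fn, demo_main and sketch_bootstrap are invented for this example. A real bootstrap is usually written in assembly, is entered at _start, and passes the return value of main() to exit() rather than returning it.

```c
#include <stddef.h>

/* Hypothetical typedef for the entry function the bootstrap will call. */
typedef int (*entry_fn)(int argc, char **argv);

/* Stand-in for a program's main(); the name is illustrative only. */
int demo_main(int argc, char **argv)
{
    (void)argv;
    return argc + 41;
}

/*
 * Sketch of the job a bootstrap performs. Before this call a real one
 * would set the stack pointer, prepare .bss and .data, and initialize
 * the C library. None of that is modeled here.
 */
int sketch_bootstrap(entry_fn entry, int argc, char **argv)
{
    int status = entry(argc, argv);
    return status;  /* a real bootstrap would hand this to exit() */
}
```

The point of the sketch is only that the bootstrap needs the address of main, and in a real build that address is resolved by the linker from a global symbol.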

I can link the above, into a boring program.

arm-none-eabi-ld so.o -o so.elf
arm-none-eabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000008000
arm-none-eabi-objdump -d so.elf

so.elf:     file format elf32-littlearm


Disassembly of section .text:

00008000 <main>:
    8000:   eafffffe    b   8000 <main>

It used the default linker script and did not find the _start entry point (global label) in any of the objects so it gave a warning that it would just use the start of .text.

We can continue the journey with an equally useless bootstrap, but one that will connect the dots.

_start:
    bl main
    b .

arm-none-eabi-as bs.s -o bs.o
arm-none-eabi-ld bs.o so.o -o so.elf
arm-none-eabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000008000
arm-none-eabi-ld: bs.o: in function `_start':
(.text+0x0): undefined reference to `main'

So we need to do two things

bs.s

.globl _start
_start:
    bl main
    b .

so.s

.globl main
main:
    b .


arm-none-eabi-as bs.s -o bs.o
arm-none-eabi-as so.s -o so.o
arm-none-eabi-ld bs.o so.o -o so.elf
arm-none-eabi-objdump -d so.elf

so.elf:     file format elf32-littlearm


Disassembly of section .text:

00008000 <_start>:
    8000:   eb000000    bl  8008 <main>
    8004:   eafffffe    b   8004 <_start+0x4>

00008008 <main>:
    8008:   eafffffe    b   8008 <main>
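
The branch encodings in these listings can be checked by hand. The sketch below decodes the target of an ARM-state B or BL instruction. The function name arm_branch_target is made up for this example, and it deliberately ignores BLX and the Thumb encodings.

```c
#include <stdint.h>

/*
 * Target of an ARM-state B/BL: the low 24 bits are a signed word offset,
 * relative to the instruction address plus 8 (the pipeline-visible PC).
 */
uint32_t arm_branch_target(uint32_t insn_addr, uint32_t insn)
{
    uint32_t imm24 = insn & 0x00FFFFFFu;
    int32_t offset = (int32_t)imm24;

    if (imm24 & 0x00800000u) {
        offset -= 0x01000000;   /* sign-extend from 24 bits */
    }
    offset *= 4;                /* word offset to byte offset */

    return insn_addr + 8u + (uint32_t)offset;
}
```

For example, eb000000 at 0x8000 decodes to 0x8008 (the call to main), and eafffffe at 0x8008 decodes back to 0x8008, which is the "b ." loop.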

And now the tools are happy and it is building a program based on a linker script we are not controlling but we have controlled all of the code.

If we go back to this and let gcc call all the programs with defaults.

arm-linux-gnueabi-gcc so.s -o so.elf
arm-linux-gnueabi-objdump -d so.elf

so.elf:     file format elf32-littlearm


Disassembly of section .init:

00010294 <_init>:
   10294:   e92d4008    push    {r3, lr}
   10298:   eb00001d    bl  10314 <call_weak_fn>
   1029c:   e8bd8008    pop {r3, pc}

There is a ton of stuff in there, which you can see for yourself; your little main and your other function are buried in all of this.

If you use readelf

  Entry point address:               0x102d8

Which the operating system needs to know.

000102d8 <_start>:
   102d8:   e3a0b000    mov fp, #0
   102dc:   e3a0e000    mov lr, #0
   102e0:   e49d1004    pop {r1}        ; (ldr r1, [sp], #4)
   102e4:   e1a0200d    mov r2, sp
   102e8:   e52d2004    push    {r2}        ; (str r2, [sp, #-4]!)
   102ec:   e52d0004    push    {r0}        ; (str r0, [sp, #-4]!)
   102f0:   e59fc010    ldr ip, [pc, #16]   ; 10308 <_start+0x30>
   102f4:   e52dc004    push    {ip}        ; (str ip, [sp, #-4]!)
   102f8:   e59f000c    ldr r0, [pc, #12]   ; 1030c <_start+0x34>
   102fc:   e59f300c    ldr r3, [pc, #12]   ; 10310 <_start+0x38>
   10300:   ebffffeb    bl  102b4 <__libc_start_main@plt>
   10304:   ebfffff0    bl  102cc <abort@plt>
   10308:   0001042c    .word   0x0001042c
   1030c:   000103c8    .word   0x000103c8

and there is _start as expected for a system/target/toolchain like this.

main is here (just where it landed from all the stuff being linked)

000103c8 <main>:
   103c8:   eafffffe    b   103c8 <main>

And for whatever reason.

000102d8 <_start>:
   102d8:   e3a0b000    mov fp, #0
   102dc:   e3a0e000    mov lr, #0
   102e0:   e49d1004    pop {r1}        ; (ldr r1, [sp], #4)
   102e4:   e1a0200d    mov r2, sp
   102e8:   e52d2004    push    {r2}        ; (str r2, [sp, #-4]!)
   102ec:   e52d0004    push    {r0}        ; (str r0, [sp, #-4]!)
   102f0:   e59fc010    ldr ip, [pc, #16]   ; 10308 <_start+0x30>
   102f4:   e52dc004    push    {ip}        ; (str ip, [sp, #-4]!)
   102f8:   e59f000c    ldr r0, [pc, #12]   ; 1030c <_start+0x34>  <-------
   102fc:   e59f300c    ldr r3, [pc, #12]   ; 10310 <_start+0x38>
   10300:   ebffffeb    bl  102b4 <__libc_start_main@plt>
   10304:   ebfffff0    bl  102cc <abort@plt>
   10308:   0001042c    .word   0x0001042c
   1030c:   000103c8    .word   0x000103c8 <--------
   10310:   000103cc    .word   0x000103cc

r0 contains the address to main and I will let you unravel the rest, it is just years/decades of accumulated pain.

If we were to choose to do this though:

so.ld

MEMORY
{
    one : ORIGIN = 0x00001000, LENGTH = 0x1000
}
SECTIONS
{
    .text : { *(.text*) } > one
}


arm-linux-gnueabi-gcc -Wl,-Tso.ld bs.o so.o -o so.elf
/usr/lib/gcc-cross/arm-linux-gnueabi/9/../../../../arm-linux-gnueabi/bin/ld: bs.o: in function `_start':
(.text+0x0): multiple definition of `_start'; /usr/lib/gcc-cross/arm-linux-gnueabi/9/../../../../arm-linux-gnueabi/lib/../lib/crt1.o:(.text+0x0): first defined here
/usr/lib/gcc-cross/arm-linux-gnueabi/9/../../../../arm-linux-gnueabi/bin/ld: error: no memory region specified for loadable section `.note.gnu.build-id'
collect2: error: ld returned 1 exit status

Because gcc wants to suck in the default bootstrap it knows about. And thus Peter's comment.

arm-linux-gnueabi-gcc -nostartfiles -nostdlib -Wl,-Tso.ld,--build-id=none bs.o so.o -o so.elf
arm-linux-objdump -d so.elf

so.elf:     file format elf32-littlearm


Disassembly of section .text:

00001000 <_start>:
    1000:   eb000000    bl  1008 <main>
    1004:   eafffffe    b   1004 <_start+0x4>

00001008 <main>:
    1008:   eafffffe    b   1008 <main>

And now we have used a compiler that has a strong association with a target to use as an assembler and linker. What if I

ENTRY(hello)
MEMORY
{
    one : ORIGIN = 0x00001000, LENGTH = 0x1000
}
SECTIONS
{
    .text : { *(.text*) } > one
}

/usr/lib/gcc-cross/arm-linux-gnueabi/9/../../../../arm-linux-gnueabi/bin/ld: warning: cannot find entry symbol hello; defaulting to 0000000000001000

It works in this case, but to fix that I need to connect all the dots.

bs.s

.globl hello
hello:
    bl main
    b .

And it is fixed again likewise:

.globl hello
hello:
    bl world
    b .

broken

(.text+0x0): undefined reference to `world'

so.s

.globl world
world:
    b .

fixed.

You have created two labels (they are not "functions"; be careful, as this can get you into other trouble, see below). One of them is main, which implies you want to tap into the C tools with some assembly. You also push lr to allow for nested calls, but nowhere in your code do you have a bootstrap (at a minimum it sets the stack pointer and calls main(), but it also prepares .bss and .data as needed by the target, plus the enormous mountain of stuff behind printf). So you want all the C library overhead, and we assume (from your incomplete question) that you expect to run this on an operating system.

So you want main to be called from the bootstrap, which means you have to make it global so that the caller, which lives in another object file, can see it.
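
C has the same distinction, which may make the role of .global easier to see. In this sketch (the function names are illustrative), the static function becomes a local symbol, like a label without .global, while the non-static one becomes a global symbol that other object files can reference.

```c
/* Internal linkage: a local symbol, like a label without .global. */
static int add_one(int x)
{
    return x + 1;
}

/* External linkage: a global symbol, like a label with .global. */
int doubled_successor(int x)
{
    return add_one(x) * 2;
}
```

When this is built without optimization, running nm on the object file typically lists add_one with a lowercase t (local) and doubled_successor with an uppercase T (global). A main without .global is in the same position as add_one: the bootstrap in another object cannot reach it.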

Another problem you will run into: your assembly language can be assembled as 32-bit arm or as thumb, depending on the command line.

ahh arm not thumb:

MOV PC, lr

But for example

unsigned int more_fun ( unsigned int );
unsigned int fun ( unsigned int x )
{
    return(more_fun(x)+1);
}

armv5t or later but not armv7

Disassembly of section .text:

00000000 <fun>:
   0:   e92d4010    push    {r4, lr}
   4:   ebfffffe    bl  0 <more_fun>
   8:   e2800001    add r0, r0, #1
   c:   e8bd8010    pop {r4, pc}

starting with the armv7s (well and ending with them as well since armv8-a is 64 bit with a possible armv7 mode)

arm-linux-gnueabi-gcc -march=armv7 -O2 -c fun.c -o fun.o
arm-linux-gnueabi-objdump -d fun.o

fun.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <fun>:
   0:   b508        push    {r3, lr}
   2:   f7ff fffe   bl  0 <more_fun>
   6:   3001        adds    r0, #1
   8:   bd08        pop {r3, pc}
   a:   bf00        nop

so.s

.globl world
world:
    bl fun
    b .

.global more_fun
more_fun:
    bx lr

this will crash

Disassembly of section .text:

00001000 <hello>:
    1000:   eb000000    bl  1008 <world>
    1004:   eafffffe    b   1004 <hello+0x4>

00001008 <world>:
    1008:   fa000001    blx 1014 <fun>
    100c:   eafffffe    b   100c <world+0x4>

00001010 <more_fun>:
    1010:   e12fff1e    bx  lr

00001014 <fun>:
    1014:   b508        push    {r3, lr}
    1016:   f7ff fffb   bl  1010 <more_fun>
    101a:   3001        adds    r0, #1
    101c:   bd08        pop {r3, pc}
    101e:   bf00        nop

it is bouncing between arm and thumb without the proper trampoline (veneer). It is not mixing arm and thumb modes well.

so.s

.globl world
.global more_fun
.type more_fun,%function
.type world,%function

world:
    bl fun
    b .

more_fun:
    bx lr

It does not need a trampoline as blx is fine, but we can see that it did see the call to fun() correctly as in the assembly generated by gcc

    .thumb_func
    .type   fun, %function
fun:

Both solutions are used. For thumb mode you can put .thumb_func before the label (it applies to the next label the assembler sees). For arm there is no such shortcut, but as you can see, .globl/.global and .type are not position specific the way .thumb_func is. The tool knew what to do with fun, but not with more_fun until we specified its type.

To see the trampoline drop down to armv6 or older

arm-none-eabi-gcc -O2 -c  -mthumb fun.c -o fun.o
arm-linux-gnueabi-ld -Tso.ld bs.o so.o fun.o -o so.elf
arm-linux-gnueabi-objdump -d so.elf

so.elf:     file format elf32-littlearm


Disassembly of section .text:

00001000 <hello>:
    1000:   eb000000    bl  1008 <world>
    1004:   eafffffe    b   1004 <hello+0x4>

00001008 <world>:
    1008:   eb000006    bl  1028 <__fun_from_arm>
    100c:   eafffffe    b   100c <world+0x4>

00001010 <more_fun>:
    1010:   e12fff1e    bx  lr

00001014 <fun>:
    1014:   b510        push    {r4, lr}
    1016:   f000 f80d   bl  1034 <__more_fun_from_thumb>
    101a:   3001        adds    r0, #1
    101c:   bc10        pop {r4}
    101e:   bc02        pop {r1}
    1020:   4708        bx  r1
    1022:   46c0        nop         ; (mov r8, r8)
    1024:   0000        movs    r0, r0
    ...

00001028 <__fun_from_arm>:
    1028:   e59fc000    ldr ip, [pc]    ; 1030 <__fun_from_arm+0x8>
    102c:   e12fff1c    bx  ip
    1030:   00001015    .word   0x00001015

00001034 <__more_fun_from_thumb>:
    1034:   4778        bx  pc
    1036:   e7fd        b.n 1034 <__more_fun_from_thumb>
    1038:   eafffff4    b   1010 <more_fun>
    103c:   00000000    andeq   r0, r0, r0
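
Note the .word 0x00001015 in __fun_from_arm: it is the address of fun (0x1014) with bit 0 set, and bx uses that bit to select Thumb state. A small sketch of that convention follows; the helper names are invented for illustration.

```c
#include <stdint.h>

/* bx switches to Thumb state when bit 0 of the target value is set. */
int is_thumb_target(uint32_t bx_value)
{
    return (bx_value & 1u) != 0;
}

/* The instruction address itself is the value with bit 0 cleared. */
uint32_t code_address(uint32_t bx_value)
{
    return bx_value & ~(uint32_t)1;
}
```

Applied to the listing, 0x00001015 is a Thumb target at 0x1014 (fun), while 0x00001010 (more_fun) is an ARM target.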

so you might need to add

.type main, %function

to your code as well as the .globl or .global (your preference). This is not to make the toolchain happy, but to give the code a chance to actually execute.

Use bx lr instead of mov pc,lr. The days of arm-only cores (armv4 and older (?)) are long gone.

Or you can just

pop {pc}

instead of these two in this order

pop {lr}
MOV PC, lr

or

pop {lr}
bx lr

You might be able to get away with mov pc,lr today, but for a long time that was a problem. You would have to look up interworking for the architecture of the core you are using; over time more and more instructions became usable for this.


Why do I need ".global main" to make this program compile?

You are using the word "compile", implying that you are using gcc rather than an assembler (and linker) directly (you also have a main()). Gcc calls the linker with a default linker script and a default bootstrap (crt0.o), based on how that gcc was built for that system. crt0.o calls main(). You are linking your object with code that calls main, and without .global you have not provided an external label called main, so the linker does not find a main() anywhere in the build and complains.

You do not have only one object: you have crt0.o as well as the C library. printf() by itself pulls in a ton of the C library. The C library will likely be fed to the linker as a library (something.a), not as a long list of object files. Nevertheless, it is like having a ton more object files. If you had the C library as individual objects, there would be a very, very long list of object files needed to make the linker happy.

Your few lines of code are sucking in a ton of stuff as shown above. It is not remotely just your one object file.
