What's the difference between "ldr pc, _boot" and "b _boot"?

I was examining a vector table for the ARM Cortex-A9 and stumbled across two types of instructions:

B _boot

and

LDR PC, _boot

Can someone explain the difference between using B and LDR? Both should do the same thing, but apparently there must be a difference. Does it have something to do with the link register?

Thanks for your time!

CodePudding user response：

ldr reg, symbol loads data from memory at that address, into the register. Loading into PC is a memory-indirect jump.
It will only assemble and link if _boot is near enough for a PC-relative addressing mode to reach it, but that's likely if both are in the .text section.

b symbol sets PC = the address of the symbol. It's a direct relative jump.

The link register is not involved in either case because you use b, not bl or blx.

`ldr pc, _boot`

Another way to do what ldr pc, _boot does:

   ldr  r0, =_boot         @ "global variable" address into register
   ldr  r0, [r0]           @ load 4 bytes from that symbol address
   bx   r0                 @ and set PC = that load result

Assuming your _boot: label is in front of some code, rather than a .word another_symbol, this is not what you want. You'd be loading some bytes of machine code and using it as an address. (Setting PC to somewhere probably invalid.)

But if you do have _boot: .word foobar or something, then it is what you want.
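To make the two cases concrete, here is a toy Python model of what ends up in PC. It is only a sketch, not an ARM emulator: the address chosen for _boot, the memory contents and the `FOOBAR` address are assumptions made up for illustration.

```python
# Toy model: memory is a dict {address: 32-bit word}.
# BOOT and FOOBAR are hypothetical addresses chosen for this example.
BOOT = 0x10000010      # address of the _boot label
FOOBAR = 0x10000020    # address of some handler, as in "_boot: .word foobar"


def branch(target):
    """b target: PC becomes the address of the label itself."""
    return target


def ldr_pc(memory, target):
    """ldr pc, target: PC becomes the word stored at the label's address."""
    return memory[target]


# Case 1: _boot labels code (the encoding of mov sp, #0x8000).
code_at_boot = {BOOT: 0xE3A0D902}
# Case 2: _boot labels a data word that holds an address.
word_at_boot = {BOOT: FOOBAR}

print(hex(branch(BOOT)))                # 0x10000010
print(hex(ldr_pc(code_at_boot, BOOT)))  # 0xe3a0d902 (machine code used as an address)
print(hex(ldr_pc(word_at_boot, BOOT)))  # 0x10000020
```

The second line of output is the failure mode described above: the instruction bytes themselves are treated as the jump target.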

`b _boot`

The equivalent of b _boot in terms of ldr would be to load the address of _boot from memory into PC. That means you need a word in memory holding that address, instead of just the immediate relative displacement in the b encoding.

But ARM assemblers have a pseudo-instruction to do that:
ldr pc, =_boot will load that label address into PC, using a PC-relative addressing mode to load from a nearby literal pool. Or instead of loading into PC directly, you could set up for a bx.

  ldr  r0, =_boot       @ symbol address into register
  bx   r0               @ jump to that symbol

This is not exactly equivalent: it's not position-independent because it's using absolute addresses, not just a relative branch.

CodePudding user response：

Peter's answer covers the what. This is an attempt at the why: why do you see some code use b and some use ldr pc?

Starting with an abbreviated exception table

b reset
b handler_a
b handler_b
b handler_c

reset:
    mov sp,#0x8000
    b .
handler_a:
    b .
handler_b:
    b .
handler_c:
    b .

Which generates this

10000000 <_stack+0xff80000>:
10000000:   ea000002    b   10000010 <reset>
10000004:   ea000003    b   10000018 <handler_a>
10000008:   ea000003    b   1000001c <handler_b>
1000000c:   ea000003    b   10000020 <handler_c>

10000010 <reset>:
10000010:   e3a0d902    mov sp, #32768  ; 0x8000
10000014:   eafffffe    b   10000014 <reset+0x4>

10000018 <handler_a>:
10000018:   eafffffe    b   10000018 <handler_a>

1000001c <handler_b>:
1000001c:   eafffffe    b   1000001c <handler_b>

10000020 <handler_c>:
10000020:   eafffffe    b   10000020 <handler_c>

The address 0x10000010 is not encoded in the b(ranch) instruction. Instead a pc relative offset is.

10000000:   ea000002 -2 b   10000010 <reset>
10000004:   ea000003 -1 b   10000018 <handler_a>
10000008:   ea000003  0
1000000c:   ea000003  1
10000010:   e3a0d902  2 mov sp, #32768  ; 0x8000

The first instruction is encoded with a 2 as its immediate. For this instruction the immediate is in units of words. The pc reads as two instructions ahead when the instruction executes, so pc + 2 words gets you to the reset handler at 0x10000010.
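The arithmetic can be checked with a few lines of Python. This is a minimal sketch of the unconditional A32 b encoding only (condition field fixed to "always"); the function name `encode_b` is made up for this example.

```python
def encode_b(instr_addr, target):
    """Encode an unconditional A32 `b target` located at instr_addr."""
    # In ARM state the pc reads as the instruction's address plus 8.
    offset_bytes = target - (instr_addr + 8)
    if offset_bytes % 4:
        raise ValueError("target must be word aligned")
    offset_words = offset_bytes // 4
    # The immediate is a signed 24-bit count of words.
    if not -(1 << 23) <= offset_words < (1 << 23):
        raise ValueError("target is out of range for b")
    return 0xEA000000 | (offset_words & 0xFFFFFF)


print(hex(encode_b(0x10000000, 0x10000010)))  # 0xea000002  b reset
print(hex(encode_b(0x10000004, 0x10000018)))  # 0xea000003  b handler_a
print(hex(encode_b(0x10000014, 0x10000014)))  # 0xeafffffe  b .
```

The last line shows why every `b .` in the listing is 0xeafffffe: branching to yourself is an offset of -2 words from the pc.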

For ldr pc

ldr pc,reset_address
ldr pc,handler_a_address
ldr pc,handler_b_address
ldr pc,handler_c_address

reset_address    :  .word reset
handler_a_address:  .word handler_a
handler_b_address:  .word handler_b
handler_c_address:  .word handler_c

reset:
    mov sp,#0x8000
    b .
handler_a:
    b .
handler_b:
    b .
handler_c:
    b .

Which gives

10000000 <_stack+0xff80000>:
10000000:   e59ff008    ldr pc, [pc, #8]    ; 10000010 <reset_address>
10000004:   e59ff008    ldr pc, [pc, #8]    ; 10000014 <handler_a_address>
10000008:   e59ff008    ldr pc, [pc, #8]    ; 10000018 <handler_b_address>
1000000c:   e59ff008    ldr pc, [pc, #8]    ; 1000001c <handler_c_address>

10000010 <reset_address>:
10000010:   10000020    .word   0x10000020

10000014 <handler_a_address>:
10000014:   10000028    .word   0x10000028

10000018 <handler_b_address>:
10000018:   1000002c    .word   0x1000002c

1000001c <handler_c_address>:
1000001c:   10000030    .word   0x10000030

10000020 <reset>:
10000020:   e3a0d902    mov sp, #32768  ; 0x8000
10000024:   eafffffe    b   10000024 <reset+0x4>

10000028 <handler_a>:
10000028:   eafffffe    b   10000028 <handler_a>

1000002c <handler_b>:
1000002c:   eafffffe    b   1000002c <handler_b>

10000030 <handler_c>:
10000030:   eafffffe    b   10000030 <handler_c>

This instruction encodes using a byte offset

10000000:   e59ff008    ldr pc, [pc, #8]
10000004:   e59ff008    ldr pc, [pc, #8]
10000008:   e59ff008  0
1000000c:   e59ff008  4
10000010:   10000020  8 .word   0x10000020

So there is one level of indirection: the pc gets its value from the word at the address pc + offset. pc = [0x10000010]

Note that in both cases we use labels and let the tools do the work for us: computing the offsets for the branch and for the ldr pc, linking, placing the addresses of the handlers, and so on. That is something we do not really want to do ourselves if we can avoid it.
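The same check for the load form, again as a hedged Python sketch. It covers only the A32 `ldr pc, [pc, #imm]` literal form with the "always" condition; `encode_ldr_pc` is a name invented for this example.

```python
def encode_ldr_pc(instr_addr, literal_addr):
    """Encode A32 `ldr pc, [pc, #imm]` at instr_addr, loading from literal_addr."""
    # In ARM state the pc reads as the instruction's address plus 8.
    offset = literal_addr - (instr_addr + 8)
    if abs(offset) > 0xFFF:
        raise ValueError("literal is out of range (12-bit byte offset)")
    if offset >= 0:
        return 0xE59FF000 | offset   # U bit set: add the offset
    return 0xE51FF000 | -offset      # U bit clear: subtract the offset


print(hex(encode_ldr_pc(0x10000000, 0x10000010)))  # 0xe59ff008
print(hex(encode_ldr_pc(0x1000000C, 0x1000001C)))  # 0xe59ff008
```

Every entry is the same distance from its address word, which is why all four instructions in the listing have the identical encoding.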

Now take a very real situation where you boot the processor off of a flash, and you are on a pre-VTOR ARM processor (one without a relocatable vector table base; on Cortex-A cores the corresponding register is called VBAR). Some of these systems want to run an operating system, so you may want one exception table for the bootloader (possibly grossly overcomplicated, like u-boot, which is to some extent its own operating system) and another for the operating system. That means you want to have ram at 0x00000000.

ARM is not a chip vendor. They make the processor core; chip vendors make the chip and make these decisions. Some chip vendors will map the flash at, let's say, 0x10000000. The entry address is something the chip vendor controls, but it is usually 0x00000000 for normal booting, so when reset is released on the ARM core we need the contents of 0x10000000 to appear at 0x00000000. Memory is made of separate banks: flash, ram, peripherals, etc. The chip vendor controls all of this and can make the mapping fixed or programmable, for example with signals and a control register such that, on reset, fetches from 0x00000000 go to the flash that is normally mapped at 0x10000000. Maybe only a small portion of it is mirrored.

On boot/reset the fetch from 0x00000000 gets the instruction there, be it a b or an ldr pc. (For normal exception table use you have one instruction to get out of the table, which means b or ldr pc.)

So for the reset both methods will work, assuming there is enough of the flash mirrored to 0x00000000 (for a successful design, it will be).

If you use the branch method on one of these chips, then your pc after you branch to the reset handler (reset label) is 0x00000010, not 0x10000010, so you now need to get yourself to the right address. Some folks would just orr 0x10000000 into the pc as a quick hack, or would ultimately use an ldr pc of some label.

Then you would be running from flash. At some point you would want some ram and would map ram to address 0x00000000 (these chips exist; it is not an unheard-of design), and then start using it. But now you have the problem that you no longer have an exception table at the right address, 0x00000000. This is pre-VTOR, but even with VTOR you may want to think about all of this.

With the branch method you might think about creating your own instructions that branch forward by 0x10000000, if that is possible with a 0xEAxxxxxx encoding. The immediate would be (0x10000000 - 8) / 4 = 0x04000000 - 2, and that clearly will not fit in the signed 24-bit offset field. If we were trying to reach 0x8000, for example, that would be 0x2000 - 2, which would fit in the instruction. So we would want to do something like

0xe59ff008
0xe59ff008
0xe59ff008
0xe59ff008
0x10000000
0x10000004
0x10000008
0x1000000C

and write those ourselves starting at 0x00000000 so that if an exception comes in we do a load of an address and that address is the one in flash.
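The range argument and the hand-built table can both be reproduced in Python. This is a sketch under the same assumptions as the text (flash at 0x10000000, four vectors, table written at 0x00000000); the helper names are invented for the example.

```python
B_MIN_WORDS = -(1 << 23)      # b holds a signed 24-bit word offset
B_MAX_WORDS = (1 << 23) - 1


def b_offset_words(instr_addr, target):
    """Word offset a `b` at instr_addr would need to reach target."""
    return (target - (instr_addr + 8)) // 4


def b_can_reach(instr_addr, target):
    return B_MIN_WORDS <= b_offset_words(instr_addr, target) <= B_MAX_WORDS


def ram_vector_words(flash_base, entries=4):
    """Words to write at address 0: ldr pc instructions, then flash addresses."""
    # Each address word sits `entries` words after its instruction, and the
    # pc reads 8 bytes ahead, so every instruction uses the same offset.
    ldr_pc = 0xE59FF000 | (entries * 4 - 8)
    return [ldr_pc] * entries + [flash_base + 4 * i for i in range(entries)]


print(hex(b_offset_words(0, 0x10000000)), b_can_reach(0, 0x10000000))  # 0x3fffffe False
print(hex(b_offset_words(0, 0x8000)), b_can_reach(0, 0x8000))          # 0x1ffe True
for word in ram_vector_words(0x10000000):
    print(f"{word:#010x}")
```

The loop prints the same eight words as the list above: four copies of 0xe59ff008 followed by 0x10000000 through 0x1000000c.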

Now suppose we use the ldr method instead. Then when we boot we do not enter reset at address 0x00000020 but at the desired 0x10000020, so we do not have to mess with that address. If we copy the first 0x20 bytes (in this example) from 0x10000000 to 0x00000000, then our handlers are all in place. We did not have to create any instructions or addresses; the tools did all the work and we just copy the result.

Many/most processors use a vector table that holds addresses. ARM, in these processors, uses fixed addresses at which you supply an instruction. Reset is the first one, so if you were really building a binary with an entry point of 0x10000000 it certainly does not need to have a table; it only needs the entry point, and the code begins there.
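A small Python simulation of that copy, using the words from the ldr pc listing earlier. It is a toy model with memory as a dict, and it only understands the one instruction form used in the table; the helper names are made up.

```python
FLASH_BASE = 0x10000000

# First 0x20 bytes of the flash image: four ldr pc, [pc, #8] instructions
# followed by the four handler address words the linker filled in.
flash_words = [0xE59FF008] * 4 + [0x10000020, 0x10000028, 0x1000002C, 0x10000030]
memory = {FLASH_BASE + 4 * i: word for i, word in enumerate(flash_words)}


def copy_words(memory, src, dst, count):
    """Copy `count` 32-bit words from src to dst."""
    for i in range(count):
        memory[dst + 4 * i] = memory[src + 4 * i]


def run_ldr_pc(memory, instr_addr):
    """Execute `ldr pc, [pc, #imm]` at instr_addr and return the new pc."""
    instr = memory[instr_addr]
    if instr & 0xFFFFF000 != 0xE59FF000:
        raise ValueError("only ldr pc, [pc, #+imm] is modelled")
    return memory[instr_addr + 8 + (instr & 0xFFF)]


# Copy the 0x20-byte table from flash to the ram now mapped at 0x00000000.
copy_words(memory, FLASH_BASE, 0x00000000, 0x20 // 4)

for vector in (0x00, 0x04, 0x08, 0x0C):
    print(hex(vector), "->", hex(run_ldr_pc(memory, vector)))
```

Each vector at address 0 lands on a handler address in flash, without any instruction or address having been generated by hand.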

reset:
  mov sp,#0x8000
  ...

Once booted you can build the table in ram dynamically: generate the ldr pc instructions and fill in the addresses by having code that asks the linker for each address (write32(0x0004,(unsigned int)handler_a);).

Other hardware designs could have the flash at 0x00000000 and ram elsewhere. You may have the same desire to have a table with entries for more than just reset, and may wish to change one of them at runtime. The code would be linked for 0x00000000 and you cannot change the vector table because it is in flash, so you would want to do something like this (if ram is at 0x10000000)

ldr pc,a
ldr pc,b
ldr pc,c
ldr pc,d

a:  .word 0x10000000
b:  .word 0x10000004
c:  .word 0x10000008
d:  .word 0x1000000C

reset_address    :  .word reset
handler_a_address:  .word handler_a
handler_b_address:  .word handler_b
handler_c_address:  .word handler_c

reset:
    mov sp,#0x8000
    b .
handler_a:
    b .
handler_b:
    b .
handler_c:
    b .

Creating

00000000 <a-0x10>:
   0:   e59ff008    ldr pc, [pc, #8]    ; 10 <a>
   4:   e59ff008    ldr pc, [pc, #8]    ; 14 <b>
   8:   e59ff008    ldr pc, [pc, #8]    ; 18 <c>
   c:   e59ff008    ldr pc, [pc, #8]    ; 1c <d>

00000010 <a>:
  10:   10000000    .word   0x10000000

00000014 <b>:
  14:   10000004    .word   0x10000004

00000018 <c>:
  18:   10000008    .word   0x10000008

0000001c <d>:
  1c:   1000000c    .word   0x1000000c

00000020 <reset_address>:
  20:   00000030    .word   0x00000030

00000024 <handler_a_address>:
  24:   00000038    .word   0x00000038

00000028 <handler_b_address>:
  28:   0000003c    .word   0x0000003c

0000002c <handler_c_address>:
  2c:   00000040    .word   0x00000040

00000030 <reset>:
  30:   e3a0d902    mov sp, #32768  ; 0x8000
  34:   eafffffe    b   34 <reset+0x4>

00000038 <handler_a>:
  38:   eafffffe    b   38 <handler_a>

0000003c <handler_b>:
  3c:   eafffffe    b   3c <handler_b>

00000040 <handler_c>:
  40:   eafffffe    b   40 <handler_c>

and on boot copy the four words at 0x0000 to 0x10000000, and copy the four words at 0x20 to 0x10000010. Then you can change the words at 0x10000010 if you want to change the handlers at runtime.

With VTOR, if you do not care to change the vectors while you are running and the flash is at 0x10000000, then you can use ldr pc for reset to get the pc pointed at the flash and not the mirror at 0x00000000, and then on boot program VTOR to point at 0x10000000. But if you want to change the addresses dynamically then the table needs to be in ram, and you circle back to the days before VTOR.
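The double hop is easier to follow when traced. Below is a toy Python model of this layout (flash at 0x00000000, ram at 0x10000000), using the words from the listing above; `NEW_HANDLER_A` is a hypothetical address standing in for a handler installed at runtime.

```python
RAM_BASE = 0x10000000

# Flash image at 0x0: four ldr pc, the four ram addresses (a..d),
# then the four handler address words produced by the linker.
flash_words = ([0xE59FF008] * 4
               + [0x10000000, 0x10000004, 0x10000008, 0x1000000C]
               + [0x00000030, 0x00000038, 0x0000003C, 0x00000040])
memory = {4 * i: word for i, word in enumerate(flash_words)}


def copy_words(memory, src, dst, count):
    """Copy `count` 32-bit words from src to dst."""
    for i in range(count):
        memory[dst + 4 * i] = memory[src + 4 * i]


def run_ldr_pc(memory, instr_addr):
    """Execute `ldr pc, [pc, #imm]` at instr_addr and return the new pc."""
    instr = memory[instr_addr]
    if instr & 0xFFFFF000 != 0xE59FF000:
        raise ValueError("only ldr pc, [pc, #+imm] is modelled")
    return memory[instr_addr + 8 + (instr & 0xFFF)]


def take_exception(memory, vector):
    """Follow the flash entry into ram, then the ram entry to the handler."""
    ram_entry = run_ldr_pc(memory, vector)
    return run_ldr_pc(memory, ram_entry)


# Boot-time setup described in the text.
copy_words(memory, 0x00, RAM_BASE, 4)          # the ldr pc instructions
copy_words(memory, 0x20, RAM_BASE + 0x10, 4)   # the handler addresses

print(hex(take_exception(memory, 0x04)))       # 0x38, the linked handler_a

NEW_HANDLER_A = 0x10000100                     # hypothetical replacement handler
memory[RAM_BASE + 0x14] = NEW_HANDLER_A        # patch one word in ram
print(hex(take_exception(memory, 0x04)))       # 0x10000100
```

Only the one word in ram changes; the flash contents and the instructions copied into ram stay as the tools generated them.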

So....

At the instruction level, one sets the pc to the pc plus an offset. The other loads the pc from a word in memory located at a pc-relative offset.

At the system level, you are dealing with the exception table that most folks will put at the entry point in flash, and then deal with generating a table at address 0x00000000 if it is not already there.

And if you want to dynamically change the handlers for any reason, then your actual table (or at least the entries you modify) needs to be in ram.

ldr pc is the most flexible and gives you the most features, but you have to type a little more code. If you have none of these problems (flash starts at 0x00000000, or you are not expecting any interrupts or exceptions so you do not need handlers), then ldr pc just takes more bytes and more typing, and b(ranch) will work just fine.

The above is just one way to do each thing, you can use the tools other ways or do things manually so long as it functions...