I have a block of instructions for which I wanted to know how the pc register works. It says on Wikipedia, the pc register holds the value of the next instruction to be executed, however, looking at the disassembly graph of binary ninja for the same block of instructions it seems this is not entirely true.
Here is part of the disassembly graph by binary ninja, where in front of each memory load is written the address in memory from which the load happens.
000080ec ldr r3, [pc, #76] -> 0x813c = 0x80f0 0x4c -> pc = 80f0 ?? (shouldnt it be 80ee).
000080ee cmp r3, #0
000080f0 it eq
000080f2 ldreq r3, [pc, #68] -> 0x8138 = 0x80f4 0x44 -> pc = 80f4 (this makes sense).
000080f4 mov sp, r3
000080f6 sub.w sl, r3, #65536 (edited)
this also happens way down the code not always the pc holds the address of the next instruction to be executed.. is there something I should account for?
CodePudding user response:
The key thing you are missing is that the value of PC
in the Thumb ldr Rd, [Pc, #imm]
instruction is aligned to 4 bytes before being used. The slightly abridged pseudo code from the ARMv7 Architecture Reference Manual is:
t = UInt(Rt);
imm32 = ZeroExtend(imm8:’00’, 32);
base = Align(PC,4);
address = base imm32;
data = MemU[address,4];
R[t] = data;
So to come back to your example:
000080ec ldr r3, [pc, #76]
We know that PC
reads as the current address plus 4 bytes in Thumb mode, so PC reads as 0x80f0
. This value is already aligned to 4 bytes, so base
has the same value. To this we add 76
(the immediate is always a multiple of four with the two least significant bits not stored) getting 0x813c
.
For the second example:
000080f2 ldreq r3, [pc, #68]
This is the same instruction as the ldr
above. The disassembler adds an eq
suffix to the mnemonic as the instruction is subject to conditional execution by the preceding IT
block. This does not affect the instruction encoding in any way though.
PC
reads as 0x80f6
which is aligned to 4 bytes as 0x80f4
. To this we add 68
, obtaining 0x8138
as the address to load from.
For further information, refer to the ARM Architecture Reference Manual.
CodePudding user response:
The thumb encoding of the pc relative ldr instruction was just covered recently here on SO. when you look at the documentation on the instruction set you will as we have pointed out know that the PC from a documentation perspective, is two ahead in the early pre-thumb2 days, but now for thumb it is 4 bytes ahead of the instruction address. The pc offset is encoded in units of words so the address being used is
((instruction address 4 ) & 0xFFFFFFFC) (immed<<2)
removing all confusion about the two ahead thing.
The reality is there are multiple program counters, the days of a single program counter used to actual fetch things and do pc relative addressing are a part of history in older, simpler, architectures.
This two ahead thing is part of that past, but for compatibility reasons, has carried on from acorn to the present arm products, just like x86 and others have legacy things that no longer are what they say they are (branch shadow/defer slot in mips).
The pipe is different and one would assume for every different arm product (not architecture but product cortex-m0, cortex-m4, cortex-a7, etc) the pipe implementation and how the core keeps track of things varies. The two ahead is synthesized by some form of a program counter keeping track of the instructions in the pipe. Likewise the fetch/prefetch/branch prediction are all forms of a program counter, but not assumed to be a single program counter. r15 itself is also either real from the register file or fake or both (I would expect not in the register file, why burn those cycles for no value add).
Just like in software you could have a reg[15] array item, a pc_fetch, a pc_current_inst, pc_execution, a pc_possible_branch, a pc_branch_prediction set of variables to keep track of a simulation of a processor, the logic can too. And which one is used at what time depends on what you are doing. What we thing of as programmers as the PC as described in the operation of an instruction is an address that is "two ahead" of the address where the instruction lives. with thumb2 the two ahead no longer makes sense, so for thumb mode it is 4 bytes ahead for arm mode 8 bytes ahead of the instruction address. And then you follow the documentation to understand how that PC is used during execution of the instruction.
For BX and other instructions capable of mode switching the definition of that address which becomes the "program counter" is different, the lsbit drives the mode to switch into (and is stripped off by the branch it does not live in the program counter, there is a psr bit to take care of that). These addresses are also a sort of form of program counter as well that temporarily at least is the actual address of the instruction to branch to and not two ahead.
In a lot of early processor implementations where you had one or the idea of one program counter, you fetched, decoded and executed one instruction at a time before going on to the next (does not mean people no longer do those designs, you can make small and efficient little controllers the old fashioned way and people still do and the are in products we use). In that case the pc is used to fetch the instruction, which may be more than one byte, once the instruction is completely fetched then the program counter points to, at least for the moment, the next instruction. The execution of that instruction can now begin since fetch and decode have completed. If the program counter is used as an input to that instruction then it is pointing to the next instruction, if used as a destination in a jump or branch then it is modified and after completion the next fetch happens wherever it happens to be pointing. Many of these architectures were variable length instruction sets so, one instruction may be one, two, three... bytes long so the pc address relative to the instruction address at execution time, varied. The early arm comes from a pipeline type solution with fixed sized instructions, so if you had a single program counter, then, depending on the pipe design, if you use a textbook style one, then execution is at a fixed depth in the pipe meaning the program counter is fetching that many ahead when you execute.