Decoding ARM instruction from GDB

I want to understand/decode the ARM instructions on my aarch64 device.

I have the following code written in C:

void test_function(int a, int b, int c, int d) {
  int flag;
  char buffer[10];

  flag = 31337;
  buffer[0] = 'A';
}

int main() {
  test_function(1, 2, 3, 4);
}

Compiling with gcc -g stack_example.c and then running gdb -q ./a.out yields the following disassembly:

(gdb) disass main
Dump of assembler code for function main:
   0x00000000000016d4 <+0>:     stp     x29, x30, [sp, #-16]!
   0x00000000000016d8 <+4>:     mov     x29, sp
   0x00000000000016dc <+8>:     mov     w0, #0x1                        // #1
   0x00000000000016e0 <+12>:    mov     w1, #0x2                        // #2
   0x00000000000016e4 <+16>:    mov     w2, #0x3                        // #3
   0x00000000000016e8 <+20>:    mov     w3, #0x4                        // #4
   0x00000000000016ec <+24>:    bl      0x16a8 <test_function>
   0x00000000000016f0 <+28>:    mov     w0, wzr
   0x00000000000016f4 <+32>:    ldp     x29, x30, [sp], #16
   0x00000000000016f8 <+36>:    ret
End of assembler dump.
(gdb) disass test_function
Dump of assembler code for function test_function:
   0x00000000000016a8 <+0>:     sub     sp, sp, #0x20
   0x00000000000016ac <+4>:     str     w0, [sp, #28]
   0x00000000000016b0 <+8>:     str     w1, [sp, #24]
   0x00000000000016b4 <+12>:    str     w2, [sp, #20]
   0x00000000000016b8 <+16>:    str     w3, [sp, #16]
   0x00000000000016bc <+20>:    mov     w8, #0x7a69                     // #31337
   0x00000000000016c0 <+24>:    str     w8, [sp, #12]
   0x00000000000016c4 <+28>:    mov     w8, #0x41                       // #65
   0x00000000000016c8 <+32>:    strb    w8, [sp, #2]
   0x00000000000016cc <+36>:    add     sp, sp, #0x20
   0x00000000000016d0 <+40>:    ret
End of assembler dump.

When I now do break 10, break test_function, run and disass main, I get:

(gdb) disass main
Dump of assembler code for function main:
   0x00000055907a86d4 <+0>:     stp     x29, x30, [sp, #-16]!
   0x00000055907a86d8 <+4>:     mov     x29, sp
   0x00000055907a86dc <+8>:     mov     w0, #0x1                        // #1
   0x00000055907a86e0 <+12>:    mov     w1, #0x2                        // #2
   0x00000055907a86e4 <+16>:    mov     w2, #0x3                        // #3
   0x00000055907a86e8 <+20>:    mov     w3, #0x4                        // #4
=> 0x00000055907a86ec <+24>:    bl      0x55907a86a8 <test_function>
   0x00000055907a86f0 <+28>:    mov     w0, wzr
   0x00000055907a86f4 <+32>:    ldp     x29, x30, [sp], #16
   0x00000055907a86f8 <+36>:    ret
End of assembler dump.

Now, according to the Arm Architecture Reference Manual Armv8, for A-profile architecture, page 934, the BL instruction starts with 100101 followed by a 26-bit immediate value.

Examining the memory at the position of the program counter yields:

(gdb) x/16b 0x55907a86ec
0x55907a86ec <main+24>: 11101111        11111111        11111111        10010111        11100000        00000011        00011111        00101010
0x55907a86f4 <main+32>: 11111101        01111011        11000001        10101000        11000000        00000011        01011111        11010110

I think the instruction starts in the fourth byte, but I am not sure. I tried to reconstruct the address 0x55907a86a8, but wasn't able to. Could anyone please help?

CodePudding user response：

AArch64 instructions are encoded little-endian, so if you dump the code one byte at a time, each 4-byte word will have the bytes in reverse order. Therefore, from the output you have, you'll have to take the first 4 bytes, reverse their order (but do not reverse the bits within the bytes), and concatenate them. (You can get the debugger to do it for you by doing x/tw 0x55907a86ec instead.) This gives:

10010111111111111111111111101111
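
The byte reversal described above can be sketched in C as follows. This is only a minimal illustration: the helper name le_bytes_to_word is made up for this example and is not part of GDB or any library. It takes the four bytes in the order they appear in memory (lowest address first) and assembles them into one 32-bit instruction word.

```c
#include <stdint.h>

/* Hypothetical helper (name chosen for this example only).
 * Combine four bytes, given in memory order (lowest address first),
 * into a single 32-bit little-endian instruction word. The bytes are
 * placed in reverse order; the bits inside each byte are not touched. */
uint32_t le_bytes_to_word(const uint8_t bytes[4])
{
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}
```

Passing the first four bytes of the dump (0xef, 0xff, 0xff, 0x97 in hexadecimal) gives 0x97ffffef, which is the binary word shown above.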

Indeed, the highest 6 bits are the opcode 100101. The immediate is 11111111111111111111101111. Because the immediate is sign-extended, this is a negative number in two's complement, with value -17, or -0x11. It is multiplied by 4 (shifted left by two bits), yielding -0x44, and then added to the address of the bl instruction itself to find the branch target. And indeed 0x00000055907a86ec - 0x44 = 0x55907a86a8, which is the address the debugger showed you and the address of the first instruction of test_function.
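
The same decoding can be written as a short C sketch. The function names (is_bl, bl_offset, bl_target) are invented for this example; the only facts assumed are the ones stated above: a 6-bit opcode 100101, followed by a 26-bit signed immediate that is scaled by 4 and added to the address of the instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, named for this example only. */

/* True if the top six bits of the word are 100101 (0x25), the BL opcode. */
bool is_bl(uint32_t insn)
{
    return (insn >> 26) == 0x25u;
}

/* Extract the 26-bit immediate, sign-extend it, and multiply by 4
 * to get the branch offset in bytes. */
int64_t bl_offset(uint32_t insn)
{
    int64_t imm26 = (int64_t)(insn & 0x03ffffffu);
    if (imm26 & 0x02000000)      /* bit 25 is the sign bit */
        imm26 -= 0x04000000;     /* subtract 2^26 to sign-extend */
    return imm26 * 4;
}

/* Branch target = address of the bl instruction + byte offset. */
uint64_t bl_target(uint64_t pc, uint32_t insn)
{
    return pc + (uint64_t)bl_offset(insn);
}
```

With the word 0x97ffffef and the instruction address 0x55907a86ec, bl_offset returns -0x44 and bl_target returns 0x55907a86a8.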

Note that ASLR was applied when you began running the program, which is why the disassembly shows different addresses before and after run. If you disassemble test_function after run, you should see it start at 0x55907a86a8. Nonetheless, if you look at the pre-ASLR disassembly, you'll notice that the relative displacement between the address of the bl test_function instruction (0x16ec) and the address of test_function itself (0x16a8) is the same -0x44. (Indeed, ASLR is done in page units, so the low 12 bits are unchanged by it.)
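
As a quick sanity check of these two observations, here is a small C sketch. The function names are made up for this example, and the 0xfff mask assumes a page size of at least 4 KiB, matching the "low 12 bits" remark above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, named for this example only. */

/* Signed distance from one address to another, in bytes. */
int64_t displacement(uint64_t from, uint64_t to)
{
    return (int64_t)(to - from);
}

/* True if both addresses have the same low 12 bits, i.e. the same
 * offset within a 4 KiB page (assumed page size). */
bool same_page_offset(uint64_t a, uint64_t b)
{
    return ((a ^ b) & 0xfffu) == 0;
}
```

Both displacement(0x16ec, 0x16a8) and displacement(0x55907a86ec, 0x55907a86a8) give -0x44, and same_page_offset(0x16ec, 0x55907a86ec) is true.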