I want to understand/decode the ARM instructions on my aarch64 device.
I have the following code written in C language:
void test_function(int a, int b, int c, int d) {
    int flag;
    char buffer[10];
    flag = 31337;
    buffer[0] = 'A';
}
int main() {
    test_function(1, 2, 3, 4);
}
Compiling it with gcc -g stack_example.c and loading it with gdb -q ./a.out yields the following assembly:
(gdb) disass main
Dump of assembler code for function main:
0x00000000000016d4 <+0>: stp x29, x30, [sp, #-16]!
0x00000000000016d8 <+4>: mov x29, sp
0x00000000000016dc <+8>: mov w0, #0x1 // #1
0x00000000000016e0 <+12>: mov w1, #0x2 // #2
0x00000000000016e4 <+16>: mov w2, #0x3 // #3
0x00000000000016e8 <+20>: mov w3, #0x4 // #4
0x00000000000016ec <+24>: bl 0x16a8 <test_function>
0x00000000000016f0 <+28>: mov w0, wzr
0x00000000000016f4 <+32>: ldp x29, x30, [sp], #16
0x00000000000016f8 <+36>: ret
End of assembler dump.
(gdb) disass test_function
Dump of assembler code for function test_function:
0x00000000000016a8 <+0>: sub sp, sp, #0x20
0x00000000000016ac <+4>: str w0, [sp, #28]
0x00000000000016b0 <+8>: str w1, [sp, #24]
0x00000000000016b4 <+12>: str w2, [sp, #20]
0x00000000000016b8 <+16>: str w3, [sp, #16]
0x00000000000016bc <+20>: mov w8, #0x7a69 // #31337
0x00000000000016c0 <+24>: str w8, [sp, #12]
0x00000000000016c4 <+28>: mov w8, #0x41 // #65
0x00000000000016c8 <+32>: strb w8, [sp, #2]
0x00000000000016cc <+36>: add sp, sp, #0x20
0x00000000000016d0 <+40>: ret
End of assembler dump.
When I now do break 10, break test_function, run and disass main, I get
(gdb) disass main
Dump of assembler code for function main:
0x00000055907a86d4 <+0>: stp x29, x30, [sp, #-16]!
0x00000055907a86d8 <+4>: mov x29, sp
0x00000055907a86dc <+8>: mov w0, #0x1 // #1
0x00000055907a86e0 <+12>: mov w1, #0x2 // #2
0x00000055907a86e4 <+16>: mov w2, #0x3 // #3
0x00000055907a86e8 <+20>: mov w3, #0x4 // #4
=> 0x00000055907a86ec <+24>: bl 0x55907a86a8 <test_function>
0x00000055907a86f0 <+28>: mov w0, wzr
0x00000055907a86f4 <+32>: ldp x29, x30, [sp], #16
0x00000055907a86f8 <+36>: ret
End of assembler dump.
According to the Arm Architecture Reference Manual Armv8, for A-profile architecture (page 934), the BL instruction starts with 100101, followed by a 26-bit immediate value.
Examining the memory at the position of the program counter yields
(gdb) x/16b 0x55907a86ec
0x55907a86ec <main+24>: 11101111 11111111 11111111 10010111 11100000 00000011 00011111 00101010
0x55907a86f4 <main+32>: 11111101 01111011 11000001 10101000 11000000 00000011 01011111 11010110
I think the instruction starts in the fourth byte, but I am not sure. I tried to reconstruct the address 0x55907a86a8 from these bytes, but wasn't able to. Could anyone please help?
Answer:
AArch64 instructions are encoded little-endian, so if you dump the code one byte at a time, the bytes of each 4-byte word appear in reverse order. To decode the instruction from the output you have, take the first 4 bytes, reverse their order (but do not reverse the bits within each byte), and concatenate them. (You can get the debugger to do this for you by using x/tw 0x55907a86ec instead.) This gives:
10010111111111111111111111101111
Indeed, the highest 6 bits are the opcode 100101. The immediate is 11111111111111111111101111. This is a negative number in two's complement (recall that the immediate is sign-extended), with value -17, or -0x11. This number is multiplied by 4 (shifted left by two bits), yielding -0x44, and added to the address of the bl instruction itself to find the branch target. And indeed 0x00000055907a86ec - 0x44 = 0x55907a86a8, which is the address the debugger showed you and the address of the first instruction of test_function.
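If you would rather check the arithmetic in code than by hand, the steps above can be sketched in C roughly as follows. The helper names load_le32 and decode_bl are made up for this illustration, and the sketch handles only the BL encoding described here, not AArch64 instructions in general.

```c
#include <stdint.h>

/* Hypothetical helper: assemble a 32-bit instruction word from 4 bytes
 * in memory order (lowest address first), i.e. little-endian. */
static uint32_t load_le32(const uint8_t b[4]) {
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Hypothetical helper: if insn is a BL located at address pc, store the
 * branch target in *target and return 1; otherwise return 0. */
static int decode_bl(uint32_t insn, uint64_t pc, uint64_t *target) {
    if ((insn >> 26) != 0x25)           /* top 6 bits must be 100101 */
        return 0;
    int64_t imm26 = insn & 0x03FFFFFF;  /* low 26 bits: the immediate */
    if (imm26 & 0x02000000)             /* bit 25 set: negative value */
        imm26 -= 0x04000000;            /* sign-extend from 26 bits */
    *target = pc + (uint64_t)(imm26 * 4);  /* offset is in units of 4 bytes */
    return 1;
}
```

With the four bytes from your dump (0xEF, 0xFF, 0xFF, 0x97 in memory order), load_le32 gives 0x97FFFFEF, and decode_bl with pc = 0x55907a86ec stores 0x55907a86a8 in target.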
Note that ASLR was applied when you began running the program, which is why the disassembly shows different addresses before and after run. If you disassemble test_function after run, you should see it start at 0x55907a86a8. Nonetheless, if you look at the pre-ASLR disassembly, you'll notice that the relative displacement between the address of the bl test_function instruction (0x16ec) and the address of test_function itself (0x16a8) is the same -0x44. (Indeed, ASLR is done in page units, so the low 12 bits of each address are unchanged by it.)
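To see why the encoded instruction does not change under ASLR, here is a small sketch going in the other direction: it builds the BL word from the instruction address and the target address. The names encode_bl and page_offset are invented for this example, and the code assumes both addresses are 4-byte aligned and within the range a 26-bit immediate can reach.

```c
#include <stdint.h>

/* Hypothetical helper: encode "bl target" for an instruction at address pc.
 * Assumes pc and target are 4-byte aligned and close enough together for
 * the offset to fit in the signed 26-bit immediate. */
static uint32_t encode_bl(uint64_t pc, uint64_t target) {
    int64_t offset = (int64_t)target - (int64_t)pc;  /* byte displacement */
    uint32_t imm26 = (uint32_t)(offset / 4) & 0x03FFFFFF;
    return (0x25u << 26) | imm26;                    /* opcode 100101 */
}

/* Hypothetical helper: low 12 bits of an address, which stay the same
 * when the image is moved by a whole number of 4 KiB pages. */
static uint64_t page_offset(uint64_t addr) {
    return addr & 0xFFF;
}
```

Both encode_bl(0x16ec, 0x16a8) and encode_bl(0x55907a86ec, 0x55907a86a8) produce 0x97FFFFEF, since only the difference between the two addresses goes into the instruction, and page_offset returns 0x6ec for both addresses of the bl.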