I just started learning ARM assembly. I am currently on 32-bit Raspbian with "GNU assembler version 2.35.2 (arm-linux-gnueabihf)".
This is my simple program to load part of an ASCII string into a register:
.global _start
_start:
ldr r1,=helloworld
ldr r2,[r1]
@prepare to exit
mov r0,#0
mov r7,#1
svc 0
.data
helloworld:
.ascii "HelloWorld"
I loaded it into gdb and can see that register r2 is loaded with 0x6c6c6548 (in ASCII, "lleH"). A quick objdump shows:
Contents of section .data:
0000 48656c6c 6f576f72 6c64 HelloWorld
I have the following questions:
- What does the string look like in memory? In other words, when does endianness come into the picture? Is the string reversed when it is loaded into memory, or is it stored as is and only reversed when it is loaded into a register?
- Why is the content of register r2 for the program below, which uses .word, 0x12345678 instead of 0x78563412? Why is endianness not applied here?
Note: .word is used instead of .ascii
.global _start
_start:
ldr r1,=helloworld
ldr r2,[r1]
mov r0,#0
mov r7,#1
svc 0
.data
helloworld:
.word 0x12345678
EDIT
The memory dump for the first program shows that memory holds the string in the same order as the source code and the object file:
>>> x/32xb 0x1008c
0x1008c: 0x48 0x65 0x6c 0x6c 0x6f 0x57 0x6f 0x72
0x10094: 0x6c 0x64 0x41 0x11 0x00 0x00 0x00 0x61
This indicates that the ldr instruction converts that memory read into little-endian format, where the LSB holds the first byte in memory. Is this understanding correct? But this still does not answer why the same did not happen for a .word.
CodePudding user response:
Endianness, or byte order, is the order in which the bytes comprising a number are represented in memory.
A string is an array of bytes. Each byte of this string is subject to endianness, but for a single byte, little and big endian come out to the same thing, so the string sits in memory in the order you wrote it. The reversal you see comes from ldr reading four of those bytes as one 32-bit number: on a little-endian target the byte at the lowest address ('H', 0x48) becomes the least significant byte, which gives 0x6c6c6548.
For your second question: endianness only affects data while it is stored in memory. The assembler gives you a human-readable representation of the computer program. The token 0x12345678 represents a certain number. When transferred to memory, this number is written in the appropriate byte order. The assembler takes care of this.
You will also see the register content as 0x12345678 when watching the execution of your program in a debugger. This is because registers are not part of memory and are not divided into bytes. Each register holds a 32-bit number. The CPU transfers data between registers and memory in the configured byte order (see the SETEND instruction). Without the register being divided into bytes, there is no meaningful way to assign a byte order to it. The debugger can only show you its numeric value, and this just comes out to be the value you assigned to it in your program. Crazy how this works, eh?
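To make the explanation above concrete, here is a minimal Python sketch that models a little-endian 32-bit load. It is only a model of the behaviour described, not code from the question; the helper name load_word_le is made up for illustration, and the byte values are taken from the gdb dump in the question.

```python
# Bytes of "HelloWorld" in address order, as shown by the gdb memory dump.
memory = b"HelloWorld"


def load_word_le(mem, addr):
    """Model a 32-bit little-endian load (hypothetical helper).

    The byte at the lowest address becomes the least significant byte.
    """
    return int.from_bytes(mem[addr:addr + 4], "little")


# ldr r2,[r1] on the string: 48 65 6c 6c is read as one number.
r2_string = load_word_le(memory, 0)
print(hex(r2_string))

# .word 0x12345678: the assembler stores the number in little-endian order...
word_bytes = (0x12345678).to_bytes(4, "little")
print(word_bytes.hex())

# ...and the same little-endian load turns those bytes back into the number.
r2_word = load_word_le(word_bytes, 0)
print(hex(r2_word))
```

In this model nothing is reversed for the string and left alone for the word: the same load rule is applied both times. The word only looks untouched because the assembler already stored it least significant byte first.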
CodePudding user response:
.ascii is a string of bytes; .word is a list of 32-bit items, not 8-bit items, so the two are not directly comparable. Perhaps you wanted .byte?
.ascii "Hello"
.align
.word 0x12345678
.byte 0x12,0x34,0x56,0x78
assemble and disassemble
00000000 <.text>:
0: 6c6c6548 cfstr64vs mvdx6, [ip], #-288 ; 0xfffffee0
4: 0000006f andeq r0, r0, pc, rrx
8: 12345678 eorsne r5, r4, #120, 12 ; 0x7800000
c: 78563412 ldmdavc r6, {r1, r4, sl, ip, sp}^
link, copy to binary and dump
00000000 48 65 6c 6c 6f 00 00 00 78 56 34 12 12 34 56 78 |Hello...xV4..4Vx|
00000010
No surprises here; everything is as expected so far. The ASCII string is a string of bytes, and we see those in the order we declared them. The word is a single 32-bit item and this is a little-endian target, so for 0x12345678 the least significant byte, 0x78, goes first at the lowest address. To compare against .ascii apples to apples we need a string of bytes: 0x12 was declared first, just like 'H' was declared first, so we see it first in memory.
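As a rough cross-check of the hexdump above, the same layout can be rebuilt in Python with the struct module. This is a sketch, not the assembler's actual algorithm: the align helper is hypothetical, and padding to a 4-byte boundary with zero bytes is an assumption chosen to match the dump shown here.

```python
import struct


def align(image, boundary=4):
    """Pad with zero bytes to the next multiple of `boundary`.

    Hypothetical helper; the 4-byte boundary is assumed from the dump above.
    """
    return image + b"\x00" * (-len(image) % boundary)


image = b"Hello"                             # .ascii "Hello": bytes in declared order
image = align(image)                         # .align
image += struct.pack("<I", 0x12345678)       # .word: one 32-bit item, little-endian
image += bytes([0x12, 0x34, 0x56, 0x78])     # .byte: four 8-bit items, declared order

print(image.hex(" "))
```

The "<I" format asks struct for a little-endian unsigned 32-bit value, which is why the .word bytes come out as 78 56 34 12 while the .byte list keeps its written order.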
ldr r0,label0
ldr r1,label1
.ascii "Hello"
.align
label0:
.word 0x12345678
label1:
.byte 0x12,0x34,0x56,0x78
assemble and disassemble
00000000 <label0-0x10>:
0: e59f0008 ldr r0, [pc, #8] ; 10 <label0>
4: e59f1008 ldr r1, [pc, #8] ; 14 <label1>
8: 6c6c6548 cfstr64vs mvdx6, [ip], #-288 ; 0xfffffee0
c: 0000006f andeq r0, r0, pc, rrx
00000010 <label0>:
10: 12345678 eorsne r5, r4, #120, 12 ; 0x7800000
00000014 <label1>:
14: 78563412 ldmdavc r6, {r1, r4, sl, ip, sp}^
Again no surprise. The DISASSEMBLER has tried to turn these bytes into instructions and has shown them as words, so we see 0x12345678 and 0x78563412 respectively, and those are the values that would land in r0 and r1.
Link, copy to binary and hexdump -C
00000000 08 00 9f e5 08 10 9f e5 48 65 6c 6c 6f 00 00 00 |........Hello...|
00000010 78 56 34 12 12 34 56 78 |xV4..4Vx|
00000018
We did not change any of the data items, so their bytes in the output do not change.
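The claim about what lands in r0 and r1 can be checked against the hexdump with a small Python model. This is a sketch under stated assumptions: the helper names are invented, the image is typed in from the dump above, and the decoder only handles the simple form seen here (ldr with a positive 12-bit immediate offset from pc, where pc reads as the instruction address plus 8 in ARM state).

```python
import struct

# The 24 bytes from the hexdump above, in address order.
image = bytes.fromhex(
    "08009fe508109fe5"   # the two ldr instructions
    "48656c6c6f000000"   # "Hello" plus alignment padding
    "78563412"           # label0: .word 0x12345678
    "12345678"           # label1: .byte 0x12,0x34,0x56,0x78
)


def load_word(mem, addr):
    """Little-endian 32-bit load (hypothetical helper)."""
    return struct.unpack_from("<I", mem, addr)[0]


def literal_address(mem, insn_addr):
    """Address read by `ldr rX, [pc, #imm]` at insn_addr.

    Simplified: assumes the offset is added and fits in the low 12 bits.
    """
    insn = load_word(mem, insn_addr)
    return insn_addr + 8 + (insn & 0xFFF)


r0 = load_word(image, literal_address(image, 0))
r1 = load_word(image, literal_address(image, 4))
print(hex(r0), hex(r1))
```

The instructions themselves are also stored least significant byte first (08 00 9f e5 is 0xe59f0008), so the same load rule covers code and data alike.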