ARM endianness and byte ordering for .ascii vs .word

Time:12-26

I just started learning ARM assembly. I am currently on 32-bit Raspbian with "GNU assembler version 2.35.2 (arm-linux-gnueabihf)".

This is my simple program to load part of an ASCII string into a register:

.global _start
_start:
    ldr r1,=helloworld
    ldr r2,[r1]

    @prepare to exit
    mov r0,#0
    mov r7,#1
    svc 0

.data
helloworld:
    .ascii "HelloWorld"

I loaded it into gdb and can see that my register r2 loads 0x6c6c6548 (in ascii "lleH"). A quick objdump shows :

Contents of section .data:
 0000 48656c6c 6f576f72 6c64               HelloWorld

I have the following questions:

  1. What does the string look like in memory? In other words, when does endianness come into the picture? Is the string reversed when it is loaded into memory, or is it stored as is and reversed when it is loaded into the register?
  2. Why is the content of register r2 for the program below, which uses .word, 0x12345678 instead of 0x78563412? Why is endianness not applied here?

Note : .word used instead of .ascii

.global _start
_start:
    ldr r1,=helloworld
    ldr r2,[r1]
    mov r0,#0
    mov r7,#1
    svc 0

.data
helloworld:
    .word 0x12345678

EDIT

The memory dump for the first program shows that memory also holds the string in the same order as the source code and the object file:

>>> x/32xb 0x1008c
0x1008c:    0x48    0x65    0x6c    0x6c    0x6f    0x57    0x6f    0x72
0x10094:    0x6c    0x64    0x41    0x11    0x00    0x00    0x00    0x61

This suggests that the ldr instruction performs a little-endian read, in which the least significant byte of the register holds the first byte in memory. Is my understanding correct? Even so, this does not explain why the same thing did not happen for a .word.

CodePudding user response:

Endianness, or byte order, is the order in which the bytes comprising a number are represented in memory.

A string is an array of bytes. Each byte of this string is subject to endianness, but for a single byte, little and big endian come out to the same thing.

For your second question: endianness only affects data while it is stored in memory. The assembler gives you a human-readable representation of the computer program. The token 0x12345678 represents a certain number. When transferred to memory, this number is written in the appropriate byte order. The assembler takes care of this.

You will also see the register content as 0x12345678 when watching the execution of your program in a debugger. This is because registers are not part of memory and are not divided into bytes. Each register holds a 32-bit number. The CPU transfers data between registers and memory in the configured byte order (see the SETEND instruction). And without the register being divided into bytes, there is no meaningful way to assign a byte order to it. The debugger can only show you its numeric value. And this just comes out to be the value you assigned to it in your program. Crazy how this works, eh?
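To make the two cases from the question concrete, here is a minimal sketch in Python rather than ARM assembly. It only models the arithmetic of a little-endian 32-bit load and store; the variable names are made up for illustration.

```python
# Model of a little-endian 32-bit load/store; not ARM code, just the arithmetic.
data = b"HelloWorld"                      # bytes in memory, lowest address first

# ldr r2,[r1] reads 4 bytes; the byte at the lowest address becomes the LSB.
r2_ascii = int.from_bytes(data[:4], "little")
print(hex(r2_ascii))                      # 0x6c6c6548

# .word 0x12345678: the assembler emits the least significant byte first...
word_bytes = (0x12345678).to_bytes(4, "little")
print(word_bytes.hex(" "))                # 78 56 34 12

# ...and the same little-endian load reassembles the original number.
r2_word = int.from_bytes(word_bytes, "little")
print(hex(r2_word))                       # 0x12345678
```

In both cases the load does the same thing. The only difference is who chose the byte order in memory: for .ascii you did, byte by byte, and for .word the assembler did.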

CodePudding user response:

.ascii is a string of bytes; .word is a list of 32-bit items, not 8-bit items, so the two are not directly comparable. Perhaps you wanted .byte?

.ascii "Hello"
.align
.word 0x12345678
.byte 0x12,0x34,0x56,0x78

Assemble and disassemble:

00000000 <.text>:
   0:   6c6c6548    cfstr64vs   mvdx6, [ip], #-288  ; 0xfffffee0
   4:   0000006f    andeq   r0, r0, pc, rrx
   8:   12345678    eorsne  r5, r4, #120, 12    ; 0x7800000
   c:   78563412    ldmdavc r6, {r1, r4, sl, ip, sp}^

Link, copy to binary and dump:

00000000  48 65 6c 6c 6f 00 00 00  78 56 34 12 12 34 56 78 |Hello...xV4..4Vx|
00000010

No surprises here; everything is as expected so far. The ASCII string is a string of bytes, and we see them in the order we declared them. The word is a single 32-bit item and this is a little-endian target, so for 0x12345678 the least significant byte, 0x78, goes first at the lowest address. To compare against .ascii apples to apples we need a string of bytes: 0x12 was declared first, just as 'H' was declared first, so we see it first in memory.
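The same layout can be rebuilt by hand. This is a rough Python sketch of what the three directives emit, assuming .align pads with zero bytes to a 4-byte boundary, as the dump above shows; the name image is just for illustration.

```python
import struct

image = bytearray()
image += b"Hello"                          # .ascii "Hello": bytes in source order
image += bytes(-len(image) % 4)            # .align: assumed to zero-pad to a 4-byte boundary
image += struct.pack("<I", 0x12345678)     # .word on a little-endian target: LSB first
image += bytes([0x12, 0x34, 0x56, 0x78])   # .byte: emitted one by one, in source order

print(image.hex(" "))
# 48 65 6c 6c 6f 00 00 00 78 56 34 12 12 34 56 78
```

Only the .word line involves a byte-order decision ("<I" means little-endian unsigned 32-bit); the .ascii and .byte lines are copied through unchanged.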

ldr r0,label0
ldr r1,label1

.ascii "Hello"
.align
label0:
.word 0x12345678
label1:
.byte 0x12,0x34,0x56,0x78

Assemble and disassemble:

00000000 <label0-0x10>:
   0:   e59f0008    ldr r0, [pc, #8]    ; 10 <label0>
   4:   e59f1008    ldr r1, [pc, #8]    ; 14 <label1>
   8:   6c6c6548    cfstr64vs   mvdx6, [ip], #-288  ; 0xfffffee0
   c:   0000006f    andeq   r0, r0, pc, rrx

00000010 <label0>:
  10:   12345678    eorsne  r5, r4, #120, 12    ; 0x7800000

00000014 <label1>:
  14:   78563412    ldmdavc r6, {r1, r4, sl, ip, sp}^

Again, no surprise. The disassembler has tried to turn these bytes into instructions and has shown them as words, so we see 0x12345678 and 0x78563412 respectively, and those are the values that would land in r0 and r1.

Link and copy to binary and hexdump -C

00000000  08 00 9f e5 08 10 9f e5  48 65 6c 6c 6f 00 00 00  |........Hello...|
00000010  78 56 34 12 12 34 56 78                           |xV4..4Vx|
00000018

We did not change the data items, so their bytes in the output are the same as before.
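If you want to check this without running anything on the target, the following Python sketch replays the two loads against the bytes of the hexdump above. The helpers load_word and literal_address are hypothetical names, and the address calculation assumes ARM state (pc reads as the instruction address plus 8) and a positive offset, which holds for both encodings here.

```python
# Bytes copied from the hexdump -C output above, lowest address first.
image = bytes.fromhex(
    "08 00 9f e5 08 10 9f e5 48 65 6c 6c 6f 00 00 00"
    "78 56 34 12 12 34 56 78"
)

def load_word(addr):
    """Little-endian 32-bit read, as ldr does on this target."""
    return int.from_bytes(image[addr:addr + 4], "little")

def literal_address(addr):
    """Target of 'ldr rX, [pc, #imm]' at addr; in ARM state pc reads as addr + 8."""
    insn = load_word(addr)
    return addr + 8 + (insn & 0xFFF)   # assumes the add (U) bit is set, as it is here

for addr, reg in ((0, "r0"), (4, "r1")):
    target = literal_address(addr)
    print(f"{reg} <- [{target:#x}] = {load_word(target):#010x}")
# r0 <- [0x10] = 0x12345678
# r1 <- [0x14] = 0x78563412
```

Note that the instructions themselves are stored little-endian too: the bytes 08 00 9f e5 read back as the word 0xe59f0008 shown in the disassembly.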
