Segmentation fault at LDR X10, [X9, #0]

I am trying to write an ARM LEGv8 assembly program that calculates the average of two adjacent array elements at a given index. It runs on a Raspberry Pi with Armbian.

The pseudo code should look like this:

int average(int v[], int i){
  int average = (v[i] + v[i-1])/2;
  return average;
}

The address of the array is in X0 and the index i is in X1.

My assembly code looks like this:

.globl _start

.data
myArray: .word 0, 1, 2, 3, 35, 5

.text


average:
  LSL X9, X1, #2
  ADD X9, X9, X0
  LDR X10, [X9, #0] // I suspect the segmentation fault happens here
  LDUR X11, [X9, #-4]
  ADD X3, X10, X11  
  LSR X3, X3, #1 
  BR X30

_start:
  LDR X0, myArray
  MOV X1, #5
  BL average


  MOV X8, #0x5d
  MOV X0, X3
  SVC 0

I used these commands to build it:

as average.s -o average.o

gcc average.o -o average -nostdlib -static

When I run my program I get a segmentation fault. Why?

CodePudding user response：

I don't know arm64 assembly specifically, but I guess it is the same as ARM.

If that is the case then you do not need to pre-multiply your array index by the size of the element. Because you are using the LDR instruction it already knows that you are looking at an array of word-sized items.

This being the case, you are multiplying index 5 by size 4, and getting index 20 (not byte 20), but there are only 6 elements in the array.

CodePudding user response：

(Disclaimer: the following is based on the actual ARMv8-A instruction set. I'm not sure what changes LEGv8 may have made.)

LDR X0, myArray doesn't load X0 with the address of the label myArray. It loads a doubleword from that address (ARM calls this the "literal" form of the load instruction). So after this instruction, X0 contains 0x0000000100000000 which naturally results in an invalid pointer by the time you do LDR X10, [X9, #0].
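
To see where 0x0000000100000000 comes from, here is a rough C sketch (not LEGv8 code) of what a 64-bit load from the start of the array reads. The helper name load_doubleword is made up for this illustration, and it assumes a little-endian machine, which is what ARM64 Linux normally is.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One doubleword covers exactly two .word elements. */
static_assert(sizeof(uint64_t) == 2 * sizeof(uint32_t),
              "a doubleword spans two words");

/* Same contents as myArray in the question. */
const uint32_t myArray[] = {0, 1, 2, 3, 35, 5};

/* Hypothetical helper: reads 8 bytes starting at p, like a 64-bit
   load into an X register would. Assumes a little-endian host. */
uint64_t load_doubleword(const uint32_t *p) {
    uint64_t x;
    memcpy(&x, p, sizeof x);   /* copies p[0] and p[1] */
    return x;
}
```

On a little-endian machine, load_doubleword(myArray) puts element 0 in the low half and element 1 in the high half, so the result is the array's first two values packed together and not an address.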

You may have meant LDR X0, =myArray which will place a pointer to myArray into the literal pool, then assemble a literal load of that pointer from its address in the pool. That would work, assuming your system can handle that type of relocation. However, for modern position-independent executables used by common operating systems, the preferred method is

ADRP X0, myArray
ADD X0, X0, #:lo12:myArray

The first instruction populates the high 52 bits of X0 with those bits of the address of myArray, using an offset from PC. The second adds in the low 12 bits. See also Understanding ARM relocation (example: str x0, [tmp, #:lo12:zbi_paddr])
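
The split into a page part and a low-12-bit part can be sketched in C as plain arithmetic on the address. This only models the final register values, not the PC-relative encoding, and the names page_of, lo12_of and adrp_add are invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Value the first instruction leaves in X0: the symbol's address
   with the low 12 bits cleared (its 4 KiB page). */
uint64_t page_of(uint64_t addr) {
    return addr & ~UINT64_C(0xFFF);
}

/* Value of the #:lo12: immediate: the low 12 bits of the address. */
uint64_t lo12_of(uint64_t addr) {
    return addr & UINT64_C(0xFFF);
}

/* Models the two-instruction sequence: page first, then add lo12. */
uint64_t adrp_add(uint64_t addr) {
    uint64_t x0 = page_of(addr);
    x0 += lo12_of(addr);
    assert(x0 == addr);        /* the two parts recombine exactly */
    return x0;
}
```

For a made-up address such as 0x412345, the page part is 0x412000 and the low part is 0x345.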

A couple other bugs and remarks:

Your LDR X10, [X9, #0] and LDUR X11, [X9, #-4] are 64-bit loads, because you used an X register as the destination. But the elements of myArray were defined with .word, which is 32 bits. So the high 32 bits of each register will contain garbage (whatever follows the element in memory), and a load may even fault if it extends beyond the end of the array into an unmapped page. To be consistent with 32-bit elements, load them into W registers with LDR W10, [X9, #0] and LDUR W11, [X9, #-4], and then do your arithmetic on the W registers instead.
You are thinking of your array elements as type int, which is signed, but your code currently would not correctly handle negative values (hint: what is the L in LSR?). Think about how to fix this, or change it to unsigned.
Likewise, i is declared in C as int, but you access X1 as a 64-bit register. If you call this function from C, the ARM64 ABI allows the high bits of X1 to be garbage. You probably want to declare it as size_t or unsigned long instead. If you do keep it as a 32-bit type, most likely unsigned is what you want, and then you need to zero-extend W1 into X1 before using it.
When returning from a function, prefer RET to BR X30 as the former is better optimized for this purpose. (Though maybe LEGv8 doesn't have RET?)
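
The points about element width, signedness and index type can be sketched in C as follows. This is only an illustration under my own assumptions: lsr1 and asr1 are made-up helpers that model a one-bit logical and arithmetic shift on a 32-bit value, and average is one possible way to write the function with 32-bit elements and a size_t index, not necessarily what the assignment expects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models LSR by 1 on a W register: the top bit is filled with zero. */
uint32_t lsr1(int32_t x) {
    return (uint32_t)x >> 1;
}

/* Models ASR by 1 on a W register: the sign bit is copied down.
   Written with ~ so it avoids right-shifting a negative value,
   which is implementation-defined in C. */
int32_t asr1(int32_t x) {
    return x < 0 ? ~(~x >> 1) : x >> 1;
}

/* Average of v[i] and v[i-1], using 32-bit elements and a size_t
   index. The sum is formed in 64 bits so it cannot overflow. */
int32_t average(const int32_t v[], size_t i) {
    assert(i >= 1);            /* v[i-1] must exist */
    int64_t sum = (int64_t)v[i] + v[i - 1];
    return (int32_t)(sum / 2);
}
```

With a sum of -4, lsr1 gives 0x7FFFFFFE while asr1 gives -2, which is why the logical shift breaks negative inputs. Note that an arithmetic shift still differs from C's division for odd negative sums: asr1(-3) is -2, but -3 / 2 in C is -1, because C division rounds toward zero.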