I have tried to write an ARM-LEGv8 assembler program that calculates the average of two values in an array at a certain position. It runs on a Raspberry Pi with Armbian.
The pseudo code should look like this:
int average(int v[], int i){
int average = (v[i] + v[i-1])/2;
return average;
}
The address of the array is in X0 and the index i is in X1.
My assembly code looks like this:
.globl _start
.data
myArray: .word 0, 1, 2, 3, 35, 5
.text
average:
LSL X9, X1, #2
ADD X9, X9, X0
LDR X10, [X9, #0] // I guess the segmentation fault happens here
LDUR X11, [X9, #-4]
ADD X3, X10, X11
LSR X3, X3, #1
BR X30
_start:
LDR X0, myArray
MOV X1, #5
BL average
MOV X8, #0x5d
MOV X0, X3
SVC 0
I used these commands to build it:
as average.s -o average.o
gcc average.o -o average -nostdlib -static
When I run my program I get a segmentation fault. Why?
CodePudding user response:
I don't know arm64 assembly specifically, but I guess it is the same as ARM.
If that is the case then you do not need to pre-multiply your array index by the size of the element. Because you are using the LDR instruction it already knows that you are looking at an array of word-sized items.
This being the case, you are multiplying index 5 by size 4, and getting index 20 (not byte 20), but there are only 6 elements in the array.
CodePudding user response:
(Disclaimer: the following is based on the actual ARMv8-A instruction set. I'm not sure what changes LEGv8 may have made.)
LDR X0, myArray doesn't load X0 with the address of the label myArray. It loads a doubleword from that address (ARM calls this the "literal" form of the load instruction). So after this instruction, X0 contains 0x0000000100000000, which naturally results in an invalid pointer by the time you do LDR X10, [X9, #0].
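To see where that value comes from, here is a minimal C sketch (not part of the original program) that mimics a 64-bit load from the start of the array. The names my_array and load_doubleword are made up for illustration, and the result assumes a little-endian machine, which is the usual configuration for ARM64 Linux.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same contents as myArray in the question: six 32-bit words. */
const uint32_t my_array[6] = {0, 1, 2, 3, 35, 5};

/* Hypothetical helper: read 8 bytes starting at p, the way a 64-bit load
   such as LDR X0, myArray would. memcpy sidesteps alignment and aliasing
   problems that a pointer cast could cause. */
uint64_t load_doubleword(const void *p)
{
    uint64_t value;
    assert(p != NULL);
    memcpy(&value, p, sizeof value);
    return value;
}
```

On a little-endian system, load_doubleword(my_array) gives 0x0000000100000000: element 0 lands in the low 32 bits and element 1 in the high 32 bits. That is data from the array, not the address of the array.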
You may have meant LDR X0, =myArray, which will place a pointer to myArray into the literal pool, then assemble a literal load of that pointer from its address in the pool. That would work, assuming your system can handle that type of relocation. However, for modern position-independent executables used by common operating systems, the preferred method is
ADRP X0, myArray
ADD X0, X0, #:lo12:myArray
The first instruction populates the high 52 bits of X0 with those bits of the address of myArray, using an offset from PC. The second adds in the low 12 bits. See also Understanding ARM relocation (example: str x0, [tmp, #:lo12:zbi_paddr])
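The split into a 4 KiB page address plus a low 12-bit offset can be worked through in C. This is only a model of the arithmetic under the assumption of the ARMv8-A page-relative instruction (ADRP) followed by ADD; the function names and any addresses you feed in are hypothetical, and a real linker computes the page delta for you.

```c
#include <assert.h>
#include <stdint.h>

/* Distance between the page of the target and the page of the current
   instruction, counted in 4 KiB pages. This is roughly what ends up in
   the immediate of the page-relative instruction. */
int64_t page_delta_for(uint64_t pc, uint64_t target)
{
    return (int64_t)(target >> 12) - (int64_t)(pc >> 12);
}

/* Model of the first instruction: clear the low 12 bits of PC, then add
   the page delta scaled by the page size. Unsigned wraparound makes a
   negative delta work as a subtraction. */
uint64_t adrp_result(uint64_t pc, int64_t page_delta)
{
    return (pc & ~UINT64_C(0xFFF)) + (uint64_t)page_delta * 4096;
}

/* What #:lo12:symbol resolves to: the low 12 bits of the address. */
uint64_t lo12(uint64_t target)
{
    return target & 0xFFF;
}

/* Both instructions together reconstruct the full address. */
uint64_t materialize_address(uint64_t pc, uint64_t target)
{
    uint64_t page = adrp_result(pc, page_delta_for(pc, target));
    assert((page & 0xFFF) == 0); /* the first step leaves the low 12 bits clear */
    return page + lo12(target);
}
```

For example, with a made-up PC of 0x400124 and a made-up target of 0x411abc, the first step yields 0x411000 and adding the low 12 bits (0xabc) gives 0x411abc again.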
A couple other bugs and remarks:
Your LDR X10, [X9, #0] and LDUR X11, [X9, #-4] are 64-bit loads, because you used an X register as the destination. But the elements of myArray were defined with .word, 32 bits. So the high 32 bits of each register will contain garbage, or they may crash if the loads extend beyond the end of the array into an unmapped page. To be consistent with 32-bit elements, load them into W registers, LDR W10, [X9, #0] and LDUR W11, [X9, #-4], and then do your arithmetic on the W registers instead.
You are thinking of your array elements as type int, which is signed, but your code currently would not correctly handle negative values (hint: what is the L in LSR?). Think about how to fix this, or change it to unsigned.
Likewise, i is declared in C as int, but you access X1 as a 64-bit register. If you call this function from C, the ARM64 ABI allows the high bits of X1 to be garbage. You probably want to declare it as size_t or unsigned long instead. If you do keep it as a 32-bit type, most likely unsigned is what you want, and then you need to zero-extend W1 into X1 before using it.
When returning from a function, prefer RET to BR X30, as the former is better optimized for this purpose. (Though maybe LEGv8 doesn't have RET?)
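The points about element width, signedness and index type can be illustrated with a small C reference. This is a sketch under my own assumptions, not the asker's code: average_with_lsr is a hypothetical name that models what ADD followed by LSR on W registers computes, and average uses size_t for the index as suggested above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference version: 32-bit signed elements, and an index type that is
   already 64 bits wide on ARM64 Linux, so no extension is needed. */
int32_t average(const int32_t v[], size_t i)
{
    assert(i >= 1); /* v[i - 1] must exist */
    /* Widen before adding so the sum cannot overflow 32 bits. */
    int64_t sum = (int64_t)v[i] + (int64_t)v[i - 1];
    return (int32_t)(sum / 2); /* signed division, truncates toward zero */
}

/* Model of ADD + LSR on 32-bit registers: the sum is treated as an
   unsigned bit pattern and a zero is shifted into the top bit. */
uint32_t average_with_lsr(const int32_t v[], size_t i)
{
    assert(i >= 1);
    uint32_t sum = (uint32_t)v[i] + (uint32_t)v[i - 1];
    return sum >> 1;
}
```

With the question's data and i = 5, both functions return 20. With v = {-3, -5} and i = 1, average returns -4 while average_with_lsr returns 2147483644, which is why a signed average needs an arithmetic shift (ASR) or a signed divide. Note that ASR rounds toward negative infinity while C's / truncates toward zero, so the two differ for odd negative sums.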