I am writing an assembly routine that adds two 2D matrices. The code is written in MASM syntax.
; --------------------
; matrixAdd.asm
; --------------------
; Adds two matrices
; extern "C" int matrixAdd(int *matrixOnePtr, int *matrixTwoPtr, long long rows, long long cols)
.code
matrixAdd proc frame
; function prologue
push r12 ; push r12 register for use
.pushreg r12
push r13 ; push r13 register for use
.pushreg r13
push r14 ; push r14 register for use
.pushreg r14
.endprolog
; validate number of rows and columns
cmp r8, 0 ; compare rows
jle InvalidArg ; jump to InvalidArg if rows <= 0
cmp r9, 0 ; compare cols
jle InvalidArg ; jump to InvalidArg if cols <= 0
xor r10, r10 ; set r10 (increment variable) to 0
LoopOne:
xor r11, r11 ; set r11 (inner increment variable) to 0
LoopTwo:
mov r12, rcx ; move first pointer to r12
mov r13, rdx ; move second pointer to r13
; calculate offset
mov r14, r10 ; move r10 (outer increment variable) to r14
imul r14, r9 ; multiply r14 by number of columns
add r14, r11 ; add inner increment variable
imul r14, 4 ; multiply offset by 4
add r12, r14 ; add offset to first pointer
add r13, r14 ; add offset to second pointer
mov r14, [r13] ; move value of second pointer to r14
add [r12], r14 ; add two matrices
inc r11 ; increment inner increment variable
cmp r11, r9 ; compare inner increment variable to columns
jle LoopTwo ; jump if r11 <= cols
; LoopOne; check if outer increment variable is <= rows
inc r10 ; increment outer increment variable
cmp r10, r8 ; compare r10 to rows
jle LoopOne ; jump if r10 <= rows
pop r12 ; restore r12 register
pop r13 ; restore r13 register
pop r14 ; restore r14 register
xor eax, eax ; set successful return code
ret ; return 0
; Rows or columns is invalid
InvalidArg:
pop r12 ; restore r12 register
pop r13 ; restore r13 register
pop r14 ; restore r14 register
mov eax, 1 ; move 1 into eax register
ret ; return 1
matrixAdd endp
end
The main function is as follows:
#include <stdio.h>
extern int matrixAdd(int *matrixOnePtr, int *matrixTwoPtr, long long rows, long long cols);
int main(int argc, char **argv) {
int a1[3][3] = {
{1, 2, 3},
{1, 2, 3},
{1, 2, 3}
};
int a2[3][3] = {
{1, 2, 3},
{1, 2, 3},
{1, 2, 3}
};
matrixAdd(a1, a2, 3, 3);
for (int i = 0; i < 3; i) {
for (int j = 0; j < 3; j) {
printf("%d ", a1[i][j]);
}
printf("\n");
}
}
The expected output should be
2 4 6
2 4 6
2 4 6
But the result is
2 6 9
4 8 9
4 8 9
I have been trying to debug the code but it is not working. My computer is Windows 10 64-bit. Each of the matrices are 3x3. I am new to x86 assembly programming.
CodePudding user response:
- Your (assembly language) algorithm is fundamentally inferior.
Apparently, you were merely translating two nested loops (as it is customary and correct in HLLs).
You are forgetting the fact, though, that
int[6][7]
is, with respect to memory, equivalent to anint[42]
. Considering you want to perform matrix addition, a series ofrows * cols
sums is more efficient. - Also, it will simplify address calculating.
For (mere) address calculation, by the way, you usually use the
lea
instruction. Don’t worry, as a novice it can be an intimidatingly complex instruction, for me it was too. - You are loading the “first/second pointer to r12/r13” within the loop.
This is wrong (or at least unnecessary since you aren’t overwriting
rcx
/rdx
). You should do this exactly once, i. e. outside the loop(s). - As Jester already commented:
Because of zero-based indexing of arrays the abort conditions should not include
=
. For instance, in the case of anint[3]
you want to iterate over the set{0, 1, 2}
. - Also as Jester already commented, and this is the main problem:
mov r14, [r13]
andadd [r12], r14
will retrieve and operate on a 64‑bit quantity. The size of the destination register in amov
is important.mov r14d, [r13]
will retrieve a “double word”, i. e. a 32‑bit quantity. Likewise,add dword ptr [r12], r14d
will add two 32‑bit values. Note that the memory address (on a 64‑bit system) always needs to be specified as a full 64‑bit register. Thedword ptr
prefix will inform MASM about the memory operand size so it’s assembled differentlty. - The result in
a1[0][0]
is actually correct, because the x86 architecture is a little Endian system. - Your
pop
instructions at the (two possible) end(s) of the routine must be in the reversepush
order. Your code is actually swapping the contents ofr12
andr14
in the call site context. Ideally you would merge both function epilogues, but tackle one problem at a time. - Lastly, your function is returning a (non-constant)
int
value, but your C code is not checking on that. Declaring your functionvoid
is probably the easiest solution.