Assembly routine to add matrices not giving expected output-CodePudding

I am writing an assembly routine that adds two 2D matrices. The code is written in MASM syntax.

; --------------------
; matrixAdd.asm
; --------------------

; Adds two matrices

; extern "C" int matrixAdd(int *matrixOnePtr, int *matrixTwoPtr, long long rows, long long cols)

    .code
matrixAdd proc frame
    ; function prologue
    push r12                                                    ; push r12 register for use
    .pushreg r12
    push r13                                                    ; push r13 register for use
    .pushreg r13
    push r14                                                    ; push r14 register for use
    .pushreg r14
    .endprolog
    
    ; validate number of rows and columns
    cmp r8, 0                                                   ; compare rows 
    jle InvalidArg                                              ; jump to InvalidArg if rows <= 0
                                        
    cmp r9, 0                                                   ; compare cols
    jle InvalidArg                                              ; jump to InvalidArg if cols <= 0

    xor r10, r10                                                ; set r10 (increment variable) to 0

LoopOne:
    xor r11, r11                                                ; set r11 (inner increment variable) to 0

LoopTwo:
    mov r12, rcx                                                ; move first pointer to r12
    mov r13, rdx                                                ; move second pointer to r13

    ; calculate offset
    mov r14, r10                                                ; move r10 (outer increment variable) to r14
    imul r14, r9                                                ; multiply r14 by number of columns
    add r14, r11                                                ; add inner increment variable
    imul r14, 4                                                 ; multiply offset by 4

    add r12, r14                                                ; add offset to first pointer
    add r13, r14                                                ; add offset to second pointer

    mov r14, [r13]                                              ; move value of second pointer to r14
    add [r12], r14                                              ; add two matrices

    inc r11                                                     ; increment inner increment variable
    cmp r11, r9                                                 ; compare inner increment variable to columns
    jle LoopTwo                                                 ; jump if r11 <= cols

    ; LoopOne; check if outer increment variable is <= rows
    inc r10                                                     ; increment outer increment variable
    cmp r10, r8                                                 ; compare r10 to rows
    jle LoopOne                                                 ; jump if r10 <= rows

    pop r12                                                     ; restore r12 register
    pop r13                                                     ; restore r13 register
    pop r14                                                     ; restore r14 register
    xor eax, eax                                                ; set successful return code
    ret                                                         ; return 0

; Rows or columns is invalid
InvalidArg:
    pop r12                                                     ; restore r12 register
    pop r13                                                     ; restore r13 register
    pop r14                                                     ; restore r14 register
    mov eax, 1                                                  ; move 1 into eax register
    ret                                                         ; return 1
matrixAdd endp  
    end

The main function is as follows:

#include <stdio.h>

extern int matrixAdd(int *matrixOnePtr, int *matrixTwoPtr, long long rows, long long cols);

int main(int argc, char **argv) {
    int a1[3][3] = {
        {1, 2, 3},
        {1, 2, 3},
        {1, 2, 3}
    };

    int a2[3][3] = {
        {1, 2, 3},
        {1, 2, 3},
        {1, 2, 3}
    };

    matrixAdd(a1, a2, 3, 3);

    for (int i = 0; i < 3;   i) {
        for (int j = 0; j < 3;   j) {
            printf("%d ", a1[i][j]);
        }

        printf("\n");
    }
}

The expected output should be

2 4 6
2 4 6
2 4 6

But the result is

2 6 9
4 8 9
4 8 9

I have been trying to debug the code but it is not working. My computer is Windows 10 64-bit. Each of the matrices are 3x3. I am new to x86 assembly programming.

CodePudding user response：

Your (assembly language) algorithm is fundamentally inferior. Apparently, you were merely translating two nested loops (as it is customary and correct in HLLs). You are forgetting the fact, though, that int[6][7] is, with respect to memory, equivalent to an int[42]. Considering you want to perform matrix addition, a series of rows * cols sums is more efficient.
Also, it will simplify address calculating. For (mere) address calculation, by the way, you usually use the lea instruction. Don’t worry, as a novice it can be an intimidatingly complex instruction, for me it was too.
You are loading the “first/second pointer to r12/r13” within the loop. This is wrong (or at least unnecessary since you aren’t overwriting rcx/rdx). You should do this exactly once, i. e. outside the loop(s).
As Jester already commented: Because of zero-based indexing of arrays the abort conditions should not include =. For instance, in the case of an int[3] you want to iterate over the set {0, 1, 2}.
Also as Jester already commented, and this is the main problem: mov r14, [r13] and add [r12], r14 will retrieve and operate on a 64‑bit quantity. The size of the destination register in a mov is important. mov r14d, [r13] will retrieve a “double word”, i. e. a 32‑bit quantity. Likewise, add dword ptr [r12], r14d will add two 32‑bit values. Note that the memory address (on a 64‑bit system) always needs to be specified as a full 64‑bit register. The dword ptr prefix will inform MASM about the memory operand size so it’s assembled differentlty.
The result in a1[0][0] is actually correct, because the x86 architecture is a little Endian system.
Your pop instructions at the (two possible) end(s) of the routine must be in the reverse push order. Your code is actually swapping the contents of r12 and r14 in the call site context. Ideally you would merge both function epilogues, but tackle one problem at a time.
Lastly, your function is returning a (non-constant) int value, but your C code is not checking on that. Declaring your function void is probably the easiest solution.