I'm trying to improve my assembly programming, and I've come across this exercise for deriving the value of the parameters in this function, but I am unsure how I should go about doing it using the assembly code given.
This is the assembly code which I am confused about (tried commenting some lines):
arrayfunc:
leaq 15992(%rdx),%rax // get 1999th element frm Array2
leaq -8(%rdx),%r10 //start of Array2
movq %rcx,%r9 // store address of Array1 in rcx into r9
.L2:
leaq -400(%rdx), %r8 //Array2 - 50longs? but why minus 50longs
movq %r9,%rdx //move address in Array1[i][j] into rdx
.L3: //inner loop
movslq (%rdx),%rcx //move value in Array1[i][j] into rcx
subq $8,%rax // increment j so becomes Array2[M-1-i][N-1-2j]
addq $4,%rdx //increment address to Array1[i][2j]
movq %rcx,8(%rax)// what does this line do
cmpq %r8,%rax //compare j<N
jne .L3
addq $200,%r9 //Not sure what this line does with the 200
cmpq %r10,%rax
jne .L2
ret
And this is the C code given:
void arrayfunc(int Array1[M][N], long Array2[M][N])
{
long i,j;
for(i=0;i<M; i)
for(j=0;j<N; j)
{
Array2[M-1-i][N-1-j] = Array1[i][j];
}
}
Could someone teach me how to intepret the asm correctly so that I can derive the value of M and N accurately? I'm having difficulty with interpreting the lines (not sure if I commented correctly but there are some lines I am really not sure what's happening as indicated)
Please help me understand this asm better (commenting the code would help immensely) as I really don't know how to go about finding the M and N values.
Any and all help is appreciated.
CodePudding user response:
This is made harder by the fact that there are some mistakes in these codes. The third leaq
only has one operand so is missing a target register. M
and N
are constants or else there would be code involving variables (and maybe multiplication) for the indexing (there isn't any), but the C code says M
, which would not be allowed on a constant (this should be i instead).
Since M
& N
are constants, so the element at Array2[M-1][N-1]
is a constant offset from Array2
(referring to the very last element of the array). Since that is used in a loop, the code computes that address in what is called loop invariant code motion — an optimization technique that relocates some fixed/constant computation out of the loop, to do it beforehand instead of repeating the same thing each iteration of the loop.
From the Array2[M-1]
part, we derive (M-1)*N
to get to the offset for that last row. From the [N-1]
part, we add to that N-1
, and then multiply the whole thing by 8, since 8 bytes per long in Array2
.
The full offset for that constant part of the indexing is computed then by the formula ((M-1)*N N-1)*8
, and, simplified (M*N-1)*8
and M*N*8-8
. Thus 15992 = M*N*8-8
and 16000 = M*N*8
and 2000 = M*N
.
The outer loop is stepping forward by 200
bytes per iteration, which corresponds to an incrementing i
, which is used in the first index position for Array1
. Since 1
of the first index of Array1
maps to 200
bytes, the size of a row of Array1
(in elements not bytes) is 200/4
or 50
, therefore N=50
.
Since N=50
we can reason that 2000=M*50
and thus, 2000/50=40=M
.
Basically, one approach is to search the code to figure out how it computes Array2[M-1-i][N-1-j]
. This is key b/c it is an expression in the assembly code that uses M
.
(Array1[i][j]
might have involved N
, but not M
— but here it has been optimized, the author/compiler recognizing the access pattern is sequential, so that i*N j
is not needed, just a running value with an increment of 4 instead).
It is not trivial, as optimization techniques have been applied; these spread that computation out across different parts of the code rather than appearing together in one place as one perhaps might expect. Variables have also been eliminated (or heavily modified), substituting indexes and loop control variables for pointers.
This line: movq %rcx,8(%rax)// what does this line do
does the assignment write to memory of Array2
, basically the =
operator in Array2[][]=...
. Once that is appreciated, we can reason backward to find the entire indexing computation, which has parts spread out and various constants combined.
(Another approach is to figure out how i<M
and j<N
are done, though since these loop control variables have been changed in favor of pointers, analysis is non-trivial, and include some of the above analysis.)
The loop has one read and one write, both in C and in assembly. Therefore, the memory write must be the assignment to an element of Array2
, and the memory read movslq (%rdx),%rcx
must be to fetch an element from Array1
.
Let's note that further optimization could change things considerably, for example, loop unrolling, and use of vector registers.