I am new to C, before I learned Python, that's why I don't know what stride is and how to use them in code.
/* assignment 4 part B */
#include <stdlib.h> // for rand and srand
#include <stdio.h> // for printf
#include <time.h> // for time
// PLEASE DON'T CHANGE THESE DEFINES!
#define ROWS 2000
#define COLS 2000
long addition(short table[][COLS]) {
long sum = 0;
for (long i = 0; i < COLS; i )
for (long j = 0; j < ROWS; j ) {
sum = table[j][i];
}
return sum;
}
/*
void printTable(short table[][COLS]) {
for (int i = 0; i < COLS; i ) {
for (int j = 0; j < ROWS; j )
printf("%d ", table[j][i]);
printf("\n");
}
}
*/
// PLEASE DON'T CHANGE THIS FUNCTION!
void createTable(short table[][COLS]) {
for (int i = 0; i < COLS; i ) {
for (int j = 0; j < ROWS; j ) {
short number = (rand() % 10) 1;
table[j][i] = number;
}
}
}
int main() {
short table[ROWS][COLS];
srand((unsigned int)time(0));
createTable(table);
//printTable(table);
long sum = 0;
for (int i = 0; i < 1000; i ) {
sum = addition(table);
}
printf("sum is %ld\n", sum);
return 0;
}
There are some instructions:
● Compile in GCC without errors.
● Program runs without run-time errors
● Change none of the code where comments indicate not to change.
● The for
loops run the same number of times for all runs.
● The sum is accurate
If I am running this code then this takes more than 1 min to execute, My teacher says don't change anything except the addition function. They say.
Also, look at the nested loops in addition function and write down the stride for accessing this array
MY MAIN GOAL IS HOW TO EXECUTE THIS CODE FASTER THAN THE FIRST ONE>
CodePudding user response:
Generally, stride is the distance steps take through something.
In the addition
routine, we have these loops:
for (long i = 0; i < COLS; i )
for (long j = 0; j < ROWS; j ) {
sum = table[j][i];
}
In successive iterations of the innermost loop with j
equal to x
in the first iteration, one iteration accesses table[x][i]
, and the next accesses table[x 1][i]
. The distance between these two accesses is the size of one table[j]
, which is COLS
(2000) elements of short
(likely two bytes), so likely 4000 bytes. So the stride is 4000 bytes.
This is generally bad for the cache memory on typical processors, as cache memory is designed mostly for memory accesses that are close to each other (small strides). This is the cause of the program’s slow performance.
Since the operation in the loop, sum = table[j][i];
, is independent of the order it is executed in for all the i
and j
, we can easily remedy this problem by swapping the two for
statements:
for (long j = 0; j < ROWS; j )
for (long i = 0; i < COLS; i )
sum = table[j][i];
Then successive iterations of the innermost loop will access table[j][x]
and table[j][x 1]
, which have a stride of one short
, likely two bytes.
On my system, the program runs about twenty times faster with this change.