np.zeros versus array allocation in C misunderstanding


I need to use the dsyev routine from LAPACK on a large real symmetric 2D matrix. I can call it through scipy.linalg.lapack, but it does not finish in a reasonable time.

I have access to more than 20 cores on an HPC cluster that provides the Intel 2020 suite as a module. For reasons outside my control, I cannot simply run the same Python code from my laptop on the cluster and set the environment variable MKL_NUM_THREADS=32, for example. My only way forward is C with Intel MKL.

I studied the Intel MKL example of how to call the dsyev function from C.

My 2D array in Python has shape (20160, 20160) and dtype np.float64, where 20160 = 252 * 80. I just write T = np.zeros((20160, 20160), dtype=np.float64), then fill it according to my task. At the end I flatten it with T.flatten(order='C') and save it to disk in either .npy or .txt format.

Here I ask for your help:

I am trying to read it from disk using C. My elementary program works for small 1D array sizes, e.g. (20000,) C doubles, but fails when actually trying to read the whole 1D array, to be then used with dsyev. My back-of-the-envelope calculation: 252 * 80 * 250 doubles * 8 bytes each (np.float64 is 64 bits, i.e. 8 bytes) is about 40 MB, and I do not know why I cannot use a 1D array of that size in my C program, for example.

My code in C:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main() {
    double arrayy[252*80*250];
    // double *arrayy = (double*) malloc(250*80*100*sizeof(double));
    printf("Hehe");
    FILE *fp;

    fp = fopen("1D_array.txt", "rb");
    for ( int i = 0; i < 250*80 ; i++ ) {
        // fread(&(array[i]), sizeof(array[i]), 1, fp);
        fscanf(fp, "%lf", &arrayy[i]);
    }
    // fread(&(arrayy[0]), sizeof(double), 252*80*3, fp);
    fclose(fp);

    for ( int i = 0 ; i < 252*80 ; i++ ) {
        if (i % 1000 == 0)
            printf("Value at index %d is %f\n", i, arrayy[i]);
    }
}

I am using gcc 9.3.0.17 inside Windows Subsystem for Linux 1 (WSL1) on Windows 11 with 16 GB of RAM.

Thank you!

UPDATE:

When compiled with $ gcc -Wall test.c and run with $ ./a.out, this prints nothing and exits with Segmentation fault (core dumped)!

CodePudding user response:

In C, arrays declared as local variables are allocated on the stack, just like any other local variable, and the stack is a relatively small region of memory compared to the heap.

The line double arrayy[252*80*250]; asks for about 40 MB of stack space, far more than the typical default stack limit (often around 8 MB on Linux), so it causes a stack overflow that shows up as a segmentation fault. To keep in line with your Python code, you should allocate the array dynamically, like this:

double *arrayy = malloc(252*80*250*sizeof(*arrayy));

Note that in C you don't need to cast the result of malloc, and that writing sizeof(*arrayy) instead of sizeof(double) keeps the element type in one place.

Also note that your code alternates between 252 and 250 for no apparent reason. You should probably store the size of your array in a named constant, like this:

const unsigned long arrayy_size = 252 * 80 * 252 * 80;
double *arrayy = malloc(arrayy_size * sizeof(*arrayy));
// [...]
for (unsigned long i = 0; i < arrayy_size; ++i) {
    // use arrayy[i]
}