I need to use the dsyev function from LAPACK on a large real symmetric 2D matrix. I can call it from scipy.linalg.lapack, but it does not finish computing in a reasonable time. I can benefit from more than 20 cores on an HPC cluster which provides the Intel 2020 suite as a module. For reasons I won't go into, I cannot simply run the same Python code from my laptop on the cluster and set the environment variable MKL_NUM_THREADS=32, for example. My only way forward is to use C and Intel MKL for C.
I studied the Intel MKL example on how to use the dsyev function in C.
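For reference, the call I am aiming for looks roughly like the minimal sketch below. It assumes the LAPACKE interface that ships with MKL (so it has to be compiled and linked against MKL); the tiny matrix size and the empty fill step are just placeholders, not my real code:

#include <stdio.h>
#include <stdlib.h>
#include <mkl.h>   /* declares the LAPACKE interface, including LAPACKE_dsyev */

int main(void) {
    const lapack_int n = 4;   /* tiny placeholder size, not the real 20160 */
    double *a = malloc((size_t)n * n * sizeof(*a));   /* row-major symmetric matrix, flattened */
    double *w = malloc((size_t)n * sizeof(*w));       /* eigenvalues are returned here */
    if (a == NULL || w == NULL)
        return EXIT_FAILURE;

    /* ... fill a[i*n + j] with the matrix entries ... */

    /* 'V': also compute eigenvectors, 'U': the upper triangle is referenced */
    lapack_int info = LAPACKE_dsyev(LAPACK_ROW_MAJOR, 'V', 'U', n, a, n, w);
    if (info != 0) {
        fprintf(stderr, "dsyev failed with info = %lld\n", (long long)info);
        return EXIT_FAILURE;
    }

    printf("smallest eigenvalue: %f\n", w[0]);   /* eigenvalues come back in ascending order */
    free(a);
    free(w);
    return 0;
}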
My 2D array in Python has shape (20160, 20160) and dtype np.float64 (20160 = 252 * 80). I just write T = np.zeros((20160, 20160), dtype=np.float64), then proceed to fill it according to my task. At the end I flatten it with T.flatten(order='C'), then save it to disk in either .npy or .txt format.
Here is where I ask for your help: I am trying to read it back in C. My elementary program works for small 1D array sizes, for example an array of shape (20000,) read as C doubles, but fails when it actually tries to read the whole 1D array, which is then to be passed to dsyev.
According to Google, 250 * 80 * 250 * (64 bytes) = 0.32 gigabytes, and I do not understand why I cannot use a 1D array of that size in my C program, for example.
My code in C:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main() {
    double arrayy[252*80*250];
    // double *arrayy = (double*) malloc(250*80*100*sizeof(double));
    printf("Hehe");
    FILE *fp;
    fp = fopen("1D_array.txt", "rb");
    for ( int i = 0; i < 250*80 ; i++ ) {
        // fread(&(array[i]), sizeof(array[i]), 1, fp);
        fscanf(fp, "%lf", &arrayy[i]);
    }
    // fread(&(arrayy[0]), sizeof(double), 252*80*3, fp);
    fclose(fp);
    for ( int i = 0 ; i < 252*80 ; i++ ) {
        if (i % 1000 == 0)
            printf("Value at index %d is %f\n", i, arrayy[i]);
    }
}
I am using gcc 9.3.0.17 inside Windows Subsystem for Linux 1 (WSL1) on Windows 11 with 16 GB of RAM.
Thank you!
UPDATE:
This, when compiled with $ gcc -Wall test.c and then run with $ ./a.out, returns Segmentation fault (core dumped) and nothing else gets printed!
CodePudding user response:
In C, arrays declared as local variables are allocated on the stack, which is a relatively small area of memory compared to the heap.
The line double arrayy[252*80*250]; most likely causes a stack overflow, which shows up as a segmentation fault. To keep in line with your Python code, you should allocate your array dynamically, like this:
double *arrayy = malloc(250*80*100*sizeof(*arrayy));
Note that you don't need the cast and that you can write sizeof(*arrayy) instead of sizeof(double).
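An allocation this large can also fail at runtime, so it is worth checking the pointer malloc returns before using it, along these lines:

if (arrayy == NULL) {
    fprintf(stderr, "allocation failed\n");   /* malloc returns NULL on failure */
    return EXIT_FAILURE;
}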
Also note that in your code, you seem to alternate between 252 and 250 for no reason. You should probably just store the size of your array in a named constant, like this:
const unsigned long arrayy_size = 252 * 80 * 252 * 80;
double *arrayy = malloc(arrayy_size * sizeof(*arrayy));
// [...]
for (unsigned long i = 0; i < arrayy_size; i++) {
    // use arrayy[i]
}
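Putting it all together, a sketch of the reading part could look like the following (it assumes your text file really contains 20160 * 20160 whitespace-separated values and reuses the file name from your code; adapt as needed):

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const unsigned long n = 252 * 80;            /* = 20160, one side of the matrix */
    const unsigned long arrayy_size = n * n;     /* whole flattened matrix, roughly 3.3 GB of doubles */

    double *arrayy = malloc(arrayy_size * sizeof(*arrayy));
    if (arrayy == NULL) {
        fprintf(stderr, "allocation of %lu doubles failed\n", arrayy_size);
        return EXIT_FAILURE;
    }

    FILE *fp = fopen("1D_array.txt", "r");
    if (fp == NULL) {
        fprintf(stderr, "could not open 1D_array.txt\n");
        free(arrayy);
        return EXIT_FAILURE;
    }

    for (unsigned long i = 0; i < arrayy_size; i++) {
        if (fscanf(fp, "%lf", &arrayy[i]) != 1) {   /* stop if the file runs out of values */
            fprintf(stderr, "failed to read element %lu\n", i);
            fclose(fp);
            free(arrayy);
            return EXIT_FAILURE;
        }
    }
    fclose(fp);

    /* arrayy now holds the full matrix and can be handed to dsyev */

    free(arrayy);
    return 0;
}

Finally, parsing hundreds of millions of values with fscanf will be slow. If you dump the raw bytes from Python instead (for example with something like T.tofile('1D_array.bin'), file name only illustrative), you can load the whole array with a single fread; note that a .npy file has a small header before the data, so it cannot be fread directly.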