How to copy a 2D array (matrix) from python with a C function (and do some computer heavy computatio-CodePudding

I want to copy a 2D numpy array (matrix) in a C function a get it back in python (and then do some calculation on it in C taking the speed advantage of C) . Therefore I need the C function matrix_copy to return a 2D array (or, I guess, a pointer to it). I tried with the following code but I get the following output (where one can see the second dimension of the array is lost).

matrix_in.shape:
(300, 200)
matrix_out.shape:
(300,)

How could I change the code (I guess the matrix_copy.c adding some pointer magic) so I could obtain an exact copy of the matrix_in in matrix_out?

Here is the main.py script:

from ctypes import c_void_p, c_double, c_int, cdll
from numpy.ctypeslib import ndpointer
import numpy as np
import pdb

n = 300
m = 200
matrix_in = np.random.randn(n, m)

lib = cdll.LoadLibrary("matrix_copy.so")
matrix_copy = lib.matrix_copy
matrix_copy.restype = ndpointer(dtype=c_double,
                          shape=(n,))

matrix_out = matrix_copy(c_void_p(matrix_in.ctypes.data),
            c_int(n),
            c_int(m))

print("matrix_in.shape:")
print(matrix_in.shape)
print("matrix_out.shape:")
print(matrix_out.shape)

Here is the matrix_copy.c script:

#include <stdlib.h>
#include <stdio.h>

double * matrix_copy(const double * matrix_in, int n, int m){
    double * matrix_out = (double *)malloc(sizeof(double) * (n*m));
    int index = 0;
    for(int i=0; i< n; i  ){
        for(int j=0; j<m; j  ){
            matrix_out[i j] = matrix_in[i j];
            //matrix_out[i][j] = matrix_in[i][j];
            // some heavy computations not yet implemented
        }
    }
    return matrix_out;
}

which I compile with the command

cc -fPIC -shared -o matrix_copy.so matrix_copy.c

And as a side note, why does the notation matrix_out[i][j] = matrix_in[i][j]; throws me an error on compilation?

matrix_copy.c:10:26: error: subscripted value is not an array, pointer, or vector
            matrix_out[i][j] = matrix_in[i][j];
            ~~~~~~~~~~~~~^~
matrix_copy.c:10:44: error: subscripted value is not an array, pointer, or vector
            matrix_out[i][j] = matrix_in[i][j];

CodePudding user response：

The second dimension is 'lost' because you explicitly omit it in the named shape argument of ndpointer. Change:

matrix_copy.restype = ndpointer(dtype=c_double, shape=(n,))

matrix_copy.restype = ndpointer(dtype=c_double, shape=(n,m), flags='C')

Where flags='C' additionally notes that the returned data is stored contiguously in row major order.

With regards to matrix_out[i][j] = matrix_in[i][j]; throwing an error, consider that matrix_in is of type const double *. matrix_in[i] would yield a value of type const double - how do you further index this value (i.e., with [j])?

If you want to emulate accessing a 2-dimensional array via a single pointer, you must calculate offsets manually. matrix_out[i j] is not sufficient, as you must account for the span of each sub array:

matrix_out[i * m   j] = matrix_in[i * m   j];

Note that in C, size_t is the generally preferred type to use when dealing with memory sizes or array lengths.

matrix_copy.c, simplified:

#include <stdlib.h>

double *matrix_copy(const double *matrix_in, size_t n, size_t m)
{
    double *matrix_out = malloc(sizeof *matrix_out * n * m);

    for (size_t i = 0; i < n; i  )
        for (size_t j = 0; j < m; j  )
            matrix_out[i * m   j] = matrix_in[i * m   j];

    return matrix_out;
}

matrix.py, with more explicit typing:

from ctypes import c_void_p, c_double, c_size_t, cdll, POINTER
from numpy.ctypeslib import ndpointer
import numpy as np

c_double_p = POINTER(c_double)

n = 300
m = 200
matrix_in = np.random.randn(n, m).astype(c_double)

lib = cdll.LoadLibrary("matrix_copy.so")
matrix_copy = lib.matrix_copy
matrix_copy.argtypes = c_double_p, c_size_t, c_size_t
matrix_copy.restype = ndpointer(
        dtype=c_double,
        shape=(n,m),
        flags='C')

matrix_out = matrix_copy(
        matrix_in.ctypes.data_as(c_double_p),
        c_size_t(n),
        c_size_t(m))

print("matrix_in.shape:", matrix_in.shape)
print("matrix_out.shape:", matrix_out.shape)
print("in == out", matrix_in == matrix_out)

CodePudding user response：

The incoming data is a probably single block of memory. You need to create the substructure.

In my C code I have to do the following on data (block) coming in via swig:

void divide2DDoubleArray(double * &block, double ** &subblockdividers, int noofsubblocks, int subblocksize){
    /* The starting address of a block of doubles is used to generate
     * pointers to subblocks.
     *
     *      block: memory containing the original block of data
     *      subblockdividers: array of subblock addresses
     *      noofsubblocks: specify the number of subblocks produced
     *      subblocksize: specify the size of the subblocks produced
     *
     * Design by contract: application should make sure the memory
     * in block is allocated and initialized properly.
     */
    // Build 2D matrix for cols
    subblockdividers=new double *[noofsubblocks];
    subblockdividers[0]= block;
    for (int i=1; i<noofsubblocks;   i) {
        subblockdividers[i] = &subblockdividers[i-1][subblocksize];
    }
}

Now the pointer returned in subblockdividers can be used the way you would like to.

Don't forget to free subblockdividers when your done. (Note: adjustments might be needed to compile this as C code)