Cython initialize matrix with zeros

Description

I simply want to create a rows x cols matrix that is filled with 0s. Since I always work with numpy, I thought using np.zeros as described in the docs would be the easiest:

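import numpy as np
cimport numpy as np

# np.int (an alias of the builtin int) was removed in NumPy 1.24; use an explicit fixed-width type.
DTYPE = np.int64
ctypedef np.int64_t DTYPE_t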

def f1():
    cdef:
        int dim = 40000
        int i, j
        np.ndarray[DTYPE_t, ndim=2] mat = np.zeros([40000, 40000], dtype=DTYPE)
        
    for i in range(dim):
        for j in range(dim):
            mat[i, j] = 1

Then I compared this with a plain C array:

def f2():
    cdef:
        int dim = 40000
        int[40000][40000] mat
        int i, j
    
    for i in range(dim):
        for j in range(dim):
            mat[i][j] = 1

The numpy version took 3 secs on my PC, whereas the C version took only 2.4e-5 secs. However, when I return the array from f2() I noticed that it is not zero-filled (of course it can't be here, since the loop writes to it, but without the loop it doesn't come back as all zeros either). How can this be done in Cython? I know that in regular C it would be something like: int arr[n][m] = {};.

Question

How can the C array be filled with 0s? (I would go for numpy instead if there is something obviously wrong in my code.)

Answer

You do not want to be writing code like this:

int[40000][40000] mat generates a 6.4 gigabyte array on the stack (assuming 4-byte ints). Typical maximum stack sizes are on the order of a few MB. I have no idea how this isn't crashing your PC.

You wrote: "However when I return the array from f2() [...]"

The array you have allocated is completely local to the function. From a C point of view you cannot return it since it ceases to exist after the function has finished. I think Cython may convert it to a (nested) Python list for you. This requires a slow copy element-by-element and is not what you want.
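
To put those sizes in perspective, here is a rough back-of-the-envelope check in plain Python. The 8 MiB stack limit is an assumption (a common default on Linux; the real limit depends on your OS and settings), and ASSUMED_STACK_LIMIT is just a name made up for this sketch.

```python
import ctypes

ROWS = COLS = 40000
ASSUMED_STACK_LIMIT = 8 * 1024 * 1024  # assumption: 8 MiB, a common Linux default

int_size = ctypes.sizeof(ctypes.c_int)  # size of a C int on this platform (usually 4)
array_bytes = ROWS * COLS * int_size    # bytes needed for int[40000][40000]

print(f"sizeof(int)   = {int_size} bytes")
print(f"array size    = {array_bytes / 1e9:.1f} GB")
print(f"stack limit   = {ASSUMED_STACK_LIMIT / 1e6:.1f} MB (assumed)")
print(f"array / stack = {array_bytes / ASSUMED_STACK_LIMIT:.0f}x")
```

Whatever the exact stack limit on your machine is, the array is several hundred times too large for it, which is why a heap-allocated Numpy array is the sensible container at this size.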

For what you're doing here you're much better just using Numpy.

Cython doesn't support a good equivalent of the C arr = {}, so if you do want to initialize sensible, small C arrays you need to use one of:

loops,
memset (which you can cimport from libc.string),
a typed memoryview of the array, followed by memview[:, :] = 0
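
The first two options can be tried without compiling anything by using ctypes from the Python standard library as a stand-in for Cython. This is only a sketch of the idea, not Cython code: the sizes and the helper names zero_with_loops and zero_with_memset are hypothetical, and in Cython you would call the memset cimported from libc.string on the C array itself.

```python
import ctypes

ROWS, COLS = 3, 4  # deliberately small; hypothetical sizes for this sketch

# A real C array of type int[ROWS][COLS], with its memory managed by ctypes.
MatrixType = (ctypes.c_int * COLS) * ROWS
mat = MatrixType()

def fill_nonzero(m):
    """Write non-zero values so that the zeroing below is visible."""
    for i in range(len(m)):
        for j in range(len(m[i])):
            m[i][j] = i * len(m[i]) + j + 1

def zero_with_loops(m):
    """Option 1: set every element to 0 explicitly."""
    for i in range(len(m)):
        for j in range(len(m[i])):
            m[i][j] = 0

def zero_with_memset(m):
    """Option 2: overwrite the whole block with zero bytes (integer 0)."""
    ctypes.memset(m, 0, ctypes.sizeof(m))

fill_nonzero(mat)
zero_with_memset(mat)
print([list(row) for row in mat])
```

memset works here because an int whose bytes are all zero has the value 0. It is not a general "fill with value x" tool, since it writes the same byte everywhere.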

You also wrote: "The numpy version took 3 secs on my pc whereas the c version only took 2.4e-5 secs."

This kind of difference usually suggests that the C compiler has optimized some code out (by detecting that the result is unused). It is unlikely to be a genuine speed-up.
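
A quick sanity check on the reported timing shows why it is implausible. The 4 GHz clock speed below is an assumption chosen only to give a sense of scale, and ASSUMED_CLOCK_HZ is a made-up name for this sketch.

```python
dim = 40000
reported_seconds = 2.4e-5  # timing quoted in the question for f2()
ASSUMED_CLOCK_HZ = 4e9     # assumption: a roughly 4 GHz CPU core

writes = dim * dim                             # element stores in the double loop
writes_per_second = writes / reported_seconds  # rate implied by the reported timing
writes_per_cycle = writes_per_second / ASSUMED_CLOCK_HZ

print(f"{writes:.2e} writes")
print(f"{writes_per_second:.2e} writes per second")
print(f"{writes_per_cycle:.0f} writes per clock cycle (assumed clock)")
```

Thousands of stores per clock cycle is far beyond what a single core can do, so the loop almost certainly never ran in the timed version.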