Cython numpy array view off by one when wraparound is False

Time:12-17

I have some Cython code where I fill in the last value in each row of a memory view of a NumPy array with a number. If I compile the code with wraparound = False, the last value in the final row of the array does not get filled in. However, if I set wraparound = True it does get filled in as expected. As far as I can tell, in my code having wraparound as either True or False should make no difference, but it obviously does. Does anyone know why?

A simplified version of the code that demonstrates this and can be run in a Jupyter notebook is below:

%load_ext cython

Setting wraparound = False

%%cython

# cython: boundscheck = False
# cython: wraparound = False

import numpy as np
cimport numpy as np

def set_array(np.float32_t[:, :] buffer):
    cdef Py_ssize_t i = 0
    cdef Py_ssize_t n = buffer.shape[0]

    for i in range(n):
        # fill final value in row i with a 1
        buffer[i, -1] = 1.0
        
    print(np.asarray(buffer))

This gives:

import numpy as np

x = np.zeros((3, 4), dtype=np.float32)

set_array(x)
[[0. 0. 0. 1.]
 [0. 0. 0. 1.]
 [0. 0. 0. 0.]]

where the final row does not have a 1 at the end.

If I instead use:

# cython: wraparound = True

the final output is:

[[0. 0. 0. 1.]
 [0. 0. 0. 1.]
 [0. 0. 0. 1.]]

which is what is expected.

CodePudding user response:

To quote the Cython docs,

wraparound (True / False)

In Python, arrays and sequences can be indexed relative to the end. For example, A[-1] indexes the last value of a list. In C, negative indexing is not supported. If set to False, Cython is allowed to neither check for nor correctly handle negative indices, possibly causing segfaults or data corruption.

When you explicitly disable wraparound you promise not to use negative indices because they will not work the way you expect. You are entering the area of undefined behaviour, so all bets are off.
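One way to keep that promise is to compute the index of the last column explicitly instead of writing -1. The sketch below is plain Python, not the asker's Cython: `Grid` is a hypothetical stand-in for the 2-D memoryview so the snippet runs without NumPy or compilation, and `set_last_column` is an illustrative name. In the original `set_array`, the corresponding change would be to index with `buffer.shape[1] - 1` rather than `-1`.

```python
class Grid:
    """Hypothetical stand-in for a 2-D typed memoryview (row-major storage)."""

    def __init__(self, nrows, ncols):
        self.shape = (nrows, ncols)
        self.data = [0.0] * (nrows * ncols)

    def __setitem__(self, index, value):
        i, j = index
        if i < 0 or j < 0:
            # Mirror the promise made by wraparound=False: no negative indices.
            raise IndexError("negative index not allowed")
        self.data[i * self.shape[1] + j] = value

    def rows(self):
        ncols = self.shape[1]
        return [self.data[k:k + ncols] for k in range(0, len(self.data), ncols)]


def set_last_column(buffer):
    n = buffer.shape[0]
    last = buffer.shape[1] - 1  # explicit, non-negative index of the final column
    for i in range(n):
        buffer[i, last] = 1.0


grid = Grid(3, 4)
set_last_column(grid)
print(grid.rows())
```

With a non-negative index the result no longer depends on the wraparound setting, so every row, including the final one, gets its 1.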

What's happening in your example is that the index buffer[i, j] corresponds to a linear index of something like buffer.flat[i*buffer.shape[1] + j]. With wraparound disabled, j = -1 is used as-is in that formula, so you end up assigning to the last value of the previous row.
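The arithmetic can be checked with a small pure-Python model of the 3 x 4 array from the question. This is only a sketch: it assumes a C-contiguous layout, `linear_index` is an illustrative helper rather than anything Cython exposes, and the out-of-range write is skipped here because a Python list would otherwise wrap a negative index around.

```python
def linear_index(i, j, ncols):
    """Offset of element (i, j) in a row-major 2-D array, with no wraparound handling."""
    return i * ncols + j


nrows, ncols = 3, 4
flat = [0.0] * (nrows * ncols)

offsets = []
for i in range(nrows):
    offset = linear_index(i, -1, ncols)  # j = -1 is used as a plain integer
    offsets.append(offset)
    if 0 <= offset < len(flat):          # offset -1 lies before the buffer; skip it in this model
        flat[offset] = 1.0

rows = [flat[r * ncols:(r + 1) * ncols] for r in range(nrows)]
print(offsets)  # [-1, 3, 7]
print(rows)     # last row stays all zeros, as in the question
```

Offsets 3 and 7 are the last elements of rows 0 and 1, which reproduces the output in the question: the final row is never written.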

There's also undefined behaviour: for i = 0 (the first iteration) you are writing to the point in memory that precedes your buffer (buffer[0, -1] translates to a linear index of -1, relative to where your buffer starts)! If this address happened to be outside what belongs to your process, you'd get a segfault. But it's much more likely that the memory does belong to your process, in which case the write silently corrupts whatever is stored there.

If you assign i instead of 1.0 in the loop you should see the values in the last column shifted up by one row: rows 0 to n - 2 hold 1 to n - 1, and the final row is left untouched.
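The shift, and the stray write before the buffer, can both be seen in a rough pure-Python model. It assumes a row-major layout and uses one extra list cell (`pad`, holding a made-up sentinel of 99.0) to stand in for whatever memory happens to precede the real buffer; none of these names come from the original code.

```python
nrows, ncols = 3, 4
pad = 1  # one cell standing in for the memory just before the buffer
memory = [99.0] * pad + [0.0] * (nrows * ncols)

for i in range(nrows):
    offset = i * ncols + (-1)        # no wraparound: j = -1 is used as-is
    memory[pad + offset] = float(i)  # the write lands one cell before row i starts

# Read back the true last column of each row.
last_column = [memory[pad + r * ncols + ncols - 1] for r in range(nrows)]
print(last_column)  # [1.0, 2.0, 0.0]
print(memory[0])    # 0.0: the sentinel in front of the buffer was overwritten
```

Row 0 ends up holding the value written for i = 1, row 1 the value for i = 2, and the write for i = 0 has clobbered the cell in front of the buffer.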
