Numpy array indexing: view or copy - depends on scope?

Consider the following array manipulations:

import numpy as np
def f(x):
    x += 1
x = np.zeros(1)
f(x)       # changes `x`
f(x[0])    # doesn't change `x`
x[0] += 1  # changes `x`

Why does x[0] behave differently depending on whether += 1 happens inside or outside the function f?

Can I pass a part of the array to the function, such that the function modifies the original array?

CodePudding user response：

You don't even need the function call to see this difference.

x is an array:

In [138]: type(x)
Out[138]: numpy.ndarray

Indexing an element of the array returns a np.float64 object. It in effect "takes" the value out of the array; it is not a reference to the element of the array.

In [140]: y=x[0]
In [141]: type(y)
Out[141]: numpy.float64

This y is a lot like a python float; you can use += on it the same way:

In [142]: y += 1
In [143]: y
Out[143]: 1.0

but this does not change x:

In [144]: x
Out[144]: array([0.])

But this does change x:

In [145]: x[0] += 1
In [146]: x
Out[146]: array([1.])

y=x[0] does an x.__getitem__ call. x[0]=3 does an x.__setitem__ call. += uses __iadd__, but it's similar in effect.
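The calls named above can be spelled out by hand. This is only a rough sketch: the operators actually look the methods up on the type, and the read-add-write line uses a plain + as a stand-in for the in-place add.

```python
import numpy as np

x = np.zeros(1)

# y = x[0] is roughly x.__getitem__(0): the value is taken out as a NumPy scalar.
y = x.__getitem__(0)
y += 1          # rebinds the name y to a new scalar; x is untouched
print(y, x)     # 1.0 [0.]

# x[0] += 1 reads the element, adds to it, then writes the result back.
x.__setitem__(0, x.__getitem__(0) + 1)
print(x)        # [1.]
```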

Another example:

Changing x:

In [149]: x[0] = 3
In [150]: x
Out[150]: array([3.])

but attempting to do the same to y fails:

In [151]: y[()] = 3
Traceback (most recent call last):
  File "<ipython-input-151-153d89268cbc>", line 1, in <module>
    y[()] = 3
TypeError: 'numpy.float64' object does not support item assignment

but y[()] is allowed.

Basic indexing of an array with a slice does produce a view that can be modified:

In [154]: x = np.zeros(5)
In [155]: x
Out[155]: array([0., 0., 0., 0., 0.])
In [156]: y= x[0:2]
In [157]: type(y)
Out[157]: numpy.ndarray
In [158]: y += 1
In [159]: y
Out[159]: array([1., 1.])
In [160]: x
Out[160]: array([1., 1., 0., 0., 0.])
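One way to confirm that a slice is a view is to ask numpy directly. A minimal sketch, using np.shares_memory and the .base attribute; the name view is just for illustration:

```python
import numpy as np

x = np.zeros(5)
view = x[0:2]                      # basic (slice) indexing

print(np.shares_memory(x, view))   # True: both use the same buffer
print(view.base is x)              # True: the view keeps a reference to x

view[:] = 7                        # writing through the view...
print(x)                           # ...changes x: [7. 7. 0. 0. 0.]
```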

CodePudding user response：

The issue is not scope, since the only thing that depends on scope is the available names. All objects can be accessed in any scope that has a name for them. The issue is one of mutability vs immutability and understanding what operators do.

x is a mutable numpy array. f runs x += 1 directly on it. += is the operator that invokes in-place addition. In other words, it does x = x.__iadd__(1)^*. Notice the reassignment to x, which happens in the function. That reassignment is the feature of the in-place operators that allows them to work on immutable objects too. In this case, ndarray.__iadd__ is a true in-place operator which modifies the array and returns x itself, so everything works as expected.
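The difference between mutating in place and rebinding the name can be checked with an identity test. A small sketch; this variant of f returns a flag purely for illustration, and arr and num are made-up names:

```python
import numpy as np

def f(x):
    original = x
    x += 1
    return x is original    # True only if += kept the same object

arr = np.zeros(1)
same_arr = f(arr)           # ndarray: modified in place, same object returned
num = 0.0
same_num = f(num)           # float: immutable, so x is rebound to a new object

print(same_arr, arr)        # True [1.]
print(same_num, num)        # False 0.0
```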

Now let's analyze f(x[0]) the same way. x[0] calls x.__getitem__(0)^*. When you pass in a scalar int index, numpy extracts a single element and returns it as a scalar object, much as if it had called .item() on a one-element array. The result is a NumPy scalar such as np.float64 or np.int64, which behaves like a python float or int (.item() itself would give the plain python object, possibly even a tuple for a structured dtype). Either way, the object is immutable. Once it's been extracted by __getitem__, the += operator in f rebinds the name x in f to a new object, but the change is not seen outside the function, much less in the array. In this scenario, f has no reference to the array x, so no change is to be expected.

The example of x[0] += 1 is not the same as calling f(x[0]). It is equivalent to calling x.__setitem__(0, x.__getitem__(0).__iadd__(1))^*. The call to f performed only the x.__getitem__(0).__iadd__(1) part, which returns a new object, but never writes it back as __setitem__ does. The key is that [] = (__setitem__) in python is an entirely different operator from [] (__getitem__) and = (assignment) taken separately.
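To watch the two calls happen, here is a sketch with a tiny logging container. Traced is a hypothetical class written for this illustration, not part of numpy:

```python
class Traced:
    """Hypothetical list wrapper that records which dunder methods run."""

    def __init__(self, data):
        self.data = list(data)
        self.calls = []

    def __getitem__(self, i):
        self.calls.append("__getitem__")
        return self.data[i]

    def __setitem__(self, i, value):
        self.calls.append("__setitem__")
        self.data[i] = value

t = Traced([0.0])
t[0] += 1
print(t.calls)   # ['__getitem__', '__setitem__']
print(t.data)    # [1.0]
```

Passing t[0] to a function would trigger only the __getitem__ half, which is why nothing is written back.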

To make the second example (f(x[0])) work, you would have to pass in a mutable object. An integer index extracts a single scalar object, and an array index makes a copy. However, a slice index returns a view that is mutable and tied to the original array memory. Therefore, you can do

f(x[0:1])  # changes `x`

In this case f does the following: x.__getitem__(slice(0, 1, None)).__iadd__(1). The key is that __getitem__ returns a mutable view into the original array, not an immutable int.

To see why it is important not only that the object is mutable but that it is a view into the original array, try f(x[[0]]). Indexing with a list produces an array, but a copy. Here x[[0]].__iadd__ modifies that copy in place, but the copy is never written back into the original, so the change does not propagate.
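A short side-by-side of the list index and the slice, as a sketch; the name picked is only for illustration:

```python
import numpy as np

def f(x):
    x += 1

x = np.zeros(3)

picked = x[[0]]                       # list (fancy) index: a copy
print(np.shares_memory(x, picked))    # False

f(x[[0]])                             # increments a temporary copy
print(x)                              # [0. 0. 0.]

f(x[0:1])                             # increments a view into x
print(x)                              # [1. 0. 0.]
```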

^* This is an approximation. When invoked by an operator, dunder methods are actually called as type(x).__operator__(x, ...), not x.__operator__(...).