Default values for numpy ndarray

I was working with numpy.ndarray and something interesting happened. I created an array with the shape of (2, 2) and left everything else with the default values. It created an array for me with these values:

array([[2.12199579e-314, 0.00000000e+000],
       [5.35567160e-321, 7.72406468e-312]])

I created another array with the same default values and it also gave me the same result.

Then I created a new array (using the default values and the shape (2, 2)) and filled it with zeros using the 'fill' method. The interesting part is that now whenever I create a new array with ndarray it gives me an array with 0 values. So what is going on behind the scenes?

CodePudding user response：

See the np.empty documentation at https://numpy.org/doc/stable/reference/generated/numpy.empty.html#numpy.empty (exactly as @Michael Butscher commented):

np.empty([2, 2]) creates an array without touching the contents of the memory chunk allocated for the array; thus, the array may look as if filled with some more or less random values.

np.ndarray([2, 2]) does the same.

Other creation methods, however, fill the memory with some values:

np.zeros([2, 2]) fills the memory with zeros, np.full([2, 2], 9) fills the memory with nines, etc.
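As a minimal sketch of the difference between these creation functions (the variable names are only illustrative), the following compares an uninitialized array with two initialized ones. No output is shown for np.empty because its contents are arbitrary.

```python
import numpy as np

# np.empty only allocates; the contents are whatever was already in that memory.
uninitialized = np.empty([2, 2])

# np.zeros and np.full allocate and then write a known value into every element.
zeros = np.zeros([2, 2])
nines = np.full([2, 2], 9)

print(uninitialized)  # arbitrary values; can differ between runs and machines
print(zeros)          # every element is 0.0
print(nines)          # every element is 9
```

If the values matter, use np.zeros or np.full, or write to every element of an np.empty array before reading it.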

Now, if you create a new array via np.empty() after an array filled with, e.g., ones has been created and disposed of (that is, automatically garbage collected), your new array may be allocated the same chunk of memory and thus look as if it were "filled" with ones.
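The reuse described above can be probed with a rough sketch like the one below. It assumes CPython, where del on the last reference frees the array immediately, and buffer_address is a hypothetical helper written for this illustration. Whether the address is actually reused depends on the memory allocator, so either result is possible.

```python
import numpy as np

def buffer_address(arr):
    # Hypothetical helper: the first item of the 'data' tuple is the
    # address of the array's data buffer.
    return arr.__array_interface__['data'][0]

ones = np.ones((2, 2))
old_address = buffer_address(ones)
del ones  # drop the only reference so the buffer can be freed

fresh = np.empty((2, 2))
reused = buffer_address(fresh) == old_address

# Neither outcome is guaranteed; it depends on the allocator.
print("same buffer address:", reused)
print(fresh)  # may or may not look as if it were filled with ones
```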

CodePudding user response：

np.empty explicitly says it returns:

Array of uninitialized (arbitrary) data of the given shape, dtype, and
    order.  Object arrays will be initialized to None.

It's compiled code, so I can't say for sure, but I strongly suspect it just calls np.ndarray with the shape and dtype.

ndarray describes itself as a low-level function and lists many better alternatives.

In an IPython session I can make two arrays:

In [2]: arr = np.empty((2,2), dtype='int32'); arr
Out[2]: 
array([[  927000399,  1267404612],
       [ 1828571807, -1590157072]])

In [3]: arr1 = np.ndarray((2,2), dtype='int32'); arr1
Out[3]: 
array([[  927000399,  1267404612],
       [ 1828571807, -1590157072]])

The values are the same, but when I check the "location" of their data buffers, I see that they are different:

In [4]: arr.__array_interface__['data'][0]
Out[4]: 2213385069328
In [5]: arr1.__array_interface__['data'][0]
Out[5]: 2213385068176

We can't use that number in code to fiddle with the values, but it's useful as a human-readable indicator of where the data is stored. (Do you understand the basics of how arrays are stored, with shape, dtype, strides, and data-buffer?)
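For readers less familiar with those pieces, here is a small sketch that prints each of them for a 2x2 int32 array. It assumes the default C (row-major) memory order, and the address itself will differ on every run.

```python
import numpy as np

arr = np.zeros((2, 2), dtype='int32')

print(arr.shape)     # (2, 2)
print(arr.dtype)     # int32
print(arr.itemsize)  # 4 bytes per element
print(arr.strides)   # (8, 4): bytes to step to the next row, next column
print(arr.nbytes)    # 16 bytes in the data buffer

# The 'data' entry is an (address, read_only_flag) tuple.
address, read_only = arr.__array_interface__['data']
print(address, read_only)
```

The shape, dtype, and strides describe how to interpret the buffer; the address only says where the buffer currently lives.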

Why the "uninitialized values" are the same is anyone's guess; my guess is that it's just an artifact of how that bit of memory was used before. The np.empty documentation stresses that we shouldn't attach any significance to those values.

Calling ndarray again produces different values and a different location:

In [9]: arr1 = np.ndarray((2,2), dtype='int32'); arr1
Out[9]: 
array([[1469865440,        515],
       [         0,          0]])
In [10]: arr1.__array_interface__['data'][0]
Out[10]: 2213403372816

apparent reuse

If I don't assign the array to a variable, or otherwise "hang on to it", numpy may reuse the data buffer memory:

In [17]: np.ndarray((2,2), dtype='int').__array_interface__['data'][0]
Out[17]: 2213403374512
In [18]: np.ndarray((2,2), dtype='int').__array_interface__['data'][0]
Out[18]: 2213403374512
In [19]: np.ndarray((2,2), dtype='int').__array_interface__['data'][0]
Out[19]: 2213403374512
In [20]: np.empty((2,2), dtype='int').__array_interface__['data'][0]
Out[20]: 2213403374512

Again, we shouldn't attach too much significance to this reuse, and we certainly shouldn't count on it in any calculations.

object dtype

If we specify the object dtype, then the values are initialized to None. This dtype contains references/pointers to objects in memory, and "random" pointers wouldn't be safe.

In [14]: arr1 = np.ndarray((2,2), dtype='object'); arr1
Out[14]: 
array([[None, None],
       [None, None]], dtype=object)

In [15]: arr1 = np.ndarray((2,2), dtype='U3'); arr1
Out[15]: 
array([['', ''],
       ['', '']], dtype='<U3')
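The object-dtype behaviour above can also be checked programmatically. This is a minimal sketch relying only on the documented statement that object arrays are initialized to None; the variable names are illustrative.

```python
import numpy as np

# Both creation routes give an object array whose elements are all None.
from_empty = np.empty((2, 2), dtype=object)
from_ndarray = np.ndarray((2, 2), dtype=object)

empty_all_none = all(item is None for item in from_empty.flat)
ndarray_all_none = all(item is None for item in from_ndarray.flat)

print(empty_all_none, ndarray_all_none)  # True True
```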