Appending an empty list to a numpy array changes its dtype

Time:06-10

I have a numpy array of integers. In my code I need to append some other integers from a list, which works fine and gives me back an array of dtype int64 as expected. But the list of integers to append may be empty. In that case, numpy returns an array of float64 values. Example code below:

import numpy as np

a = np.arange(10, dtype='int64')

np.append(a, [10])  # dtype is int64
# array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

np.append(a, [])    # dtype is float64
# array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

Is this expected behaviour? If so, what is the rationale behind this? Could this even be a bug?

The documentation for np.append states that the return value is

A copy of arr with values appended to axis.

Since there are no values to append, shouldn't it just return a copy of the array?

(Numpy version: 1.22.4, Python version: 3.8.0)

CodePudding user response:

You can cast your array when appending and specify the type you want:

import numpy as np

a = np.arange(10, dtype="int64")
# array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int64)

np.append(a, [])
# array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.], dtype=float64)

np.append(a, np.array([], dtype=a.dtype))
# array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int64)

CodePudding user response:

In the source code for numpy.append, we can see that it calls numpy.concatenate. Looking at the documentation for this function, we see that it accepts two type-related arguments: dtype, which defaults to None, and casting, which defaults to "same_kind". Since numpy.append does not pass either of these, the defaults apply. The "same_kind" setting allows safe casts as well as casts within a kind, as described in the NumPy documentation. Since all of this is done in C, my guess is that concatenate tried to infer the type of the list you provided. Since the list was empty, no element type could be inferred, so it fell back to a default and then picked an output type wide enough to fit both inputs.

If you want to avoid this, you should probably append using numpy arrays instead of Python lists:

np.append(a, np.array([], dtype = 'int64'))
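When the values arrive as an ordinary Python list that may or may not be empty, the same idea can be wrapped in a small function. This is only a sketch: append_keep_dtype is a hypothetical helper name, not part of NumPy, and it assumes every value to append is meant to take the array's dtype.

```python
import numpy as np

def append_keep_dtype(arr, values):
    """Hypothetical helper: append values to arr while keeping arr's dtype."""
    # Convert the values first, so an empty list becomes an empty array of
    # arr.dtype rather than NumPy's default float64.
    values = np.asarray(values, dtype=arr.dtype)
    return np.append(arr, values)

a = np.arange(10, dtype="int64")
empty_result = append_keep_dtype(a, [])           # stays int64
nonempty_result = append_keep_dtype(a, [10, 11])  # stays int64
```

One caveat with this approach: forcing the dtype also converts non-integer values, so a float in the list would be truncated rather than promoting the result.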

CodePudding user response:

The default dtype for a numpy array constructed from an empty list is float64:

>>> np.array([])
array([], dtype=float64)

A float64 dtype will "win" over the int64 dtype and promote everything to float64, because converting the other way around would cause loss of precision (i.e. truncate any float64). Of course this is an extreme case because there is no value to truncate in an empty array, but the check is only done on the dtype. More info on this is in the doc for numpy.result_type().

In fact:

>>> a = np.array([], dtype='int64')
>>> a
array([], dtype=int64)

>>> b = np.array([])
>>> b
array([], dtype=float64)

>>> np.result_type(a, b)
dtype('float64')

The np.promote_types() function is used to determine the type to promote to:

>>> np.promote_types('int64', 'float64')
dtype('float64')

See also: How the dtype of numpy array is calculated internally?
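One practical consequence worth keeping in mind: the promotion to float64 is not always harmless for integers either. float64 has a 53-bit significand, so int64 values above 2**53 cannot all be represented exactly. The sketch below (variable names are illustrative only) shows a value that does not survive appending an empty list and casting back.

```python
import numpy as np

# 2**53 + 1 is the first positive integer that float64 cannot represent.
big = np.array([2**53 + 1], dtype="int64")

promoted = np.append(big, [])          # empty list forces promotion to float64
round_trip = promoted.astype("int64")  # cast back to the original dtype

value_changed = int(round_trip[0]) != int(big[0])
```

So if the array may hold very large integers, it seems safer to prevent the promotion in the first place than to cast the result back afterwards.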

CodePudding user response:

Can you avoid appending the empty list with some sort of if statement?

I think it's expected behaviour, as [] is still a value that gets appended. An empty list has no elements to infer an integer type from, so numpy probably falls back to float, its default dtype.
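A minimal sketch of such a guard, assuming the values come in as a plain Python list (append_if_any is a hypothetical name chosen for illustration):

```python
import numpy as np

def append_if_any(arr, values):
    """Hypothetical helper: only call np.append when there is something to add."""
    if len(values) == 0:
        # np.append returns a new array, so return a copy here as well
        # to keep the behaviour consistent for callers.
        return arr.copy()
    return np.append(arr, values)

a = np.arange(10, dtype="int64")
unchanged = append_if_any(a, [])    # dtype stays int64
extended = append_if_any(a, [10])   # dtype stays int64
```

Returning a copy in the empty case matches what the documentation promises ("a copy of arr"), so code that later modifies the result does not accidentally modify the original array.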
