I have been teaching myself numpy. According to the numpy manual, numpy.sum sums all the elements of an array or array-like. However, I have noticed that if the arrays have different lengths, numpy.sum combines them rather than summing them.
For example:
array_a = [1,2,3,4,5,6] # Same length
array_b = [4,5,6,7,8,9]
np.sum([array_a, array_b])
60
array_a = [1,2,3,4,5] # Different length
array_b = [4,5,6,7,8,9]
np.sum([array_a, array_b])
[1, 2, 3, 4, 5, 4, 5, 6, 7, 8, 9]
Could anyone please explain why, in the latter case, numpy.sum did not sum up all the elements as it is supposed to? Thank you so much.
CodePudding user response:
In [128]: array_a = [1,2,3,4,5,6] # Same length
...: array_b = [4,5,6,7,8,9]
Here you give sum a list:
In [129]: np.sum([array_a, array_b])
Out[129]: 60
What it does first is make an array:
In [130]: np.array([array_a, array_b])
Out[130]:
array([[1, 2, 3, 4, 5, 6],
[4, 5, 6, 7, 8, 9]])
60 is the sum of all elements. You can also give sum an axis number:
In [131]: np.sum([array_a, array_b],axis=0)
Out[131]: array([ 5, 7, 9, 11, 13, 15])
In [132]: np.sum([array_a, array_b],axis=1)
Out[132]: array([21, 39])
That's the normal, documented behavior.
Now the ragged case:
In [133]: array_a = [1,2,3,4,5] # Different length
...: array_b = [4,5,6,7,8,9]
In [135]: x = np.array([array_a, array_b])
<ipython-input-135-5379fc40e73f>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
x = np.array([array_a, array_b])
In [136]: x.shape
Out[136]: (2,)
In [137]: x.dtype
Out[137]: dtype('O')
In [138]: np.sum(x)
Out[138]: [1, 2, 3, 4, 5, 4, 5, 6, 7, 8, 9]
That is summing the lists - same as if we do:
In [139]: array_a + array_b
Out[139]: [1, 2, 3, 4, 5, 4, 5, 6, 7, 8, 9]
Despite the name, array_a is NOT an array.
With object dtype, numpy tries to apply the operator (here add) to the elements. Add for a list is concatenate.
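As a minimal sketch of that point, the snippet below builds the 1-D object array by hand (so no ragged-sequence warning is involved; as far as I know, newer numpy releases raise an error instead of warning when asked to build a ragged array implicitly) and compares np.sum with a plain Python reduction over `+`. The names x, numpy_result and python_result are only illustrative.

```python
import operator
from functools import reduce

import numpy as np

array_a = [1, 2, 3, 4, 5]
array_b = [4, 5, 6, 7, 8, 9]

# Build a 1-D object array whose two elements are the Python lists themselves.
x = np.empty(2, dtype=object)
x[0] = array_a
x[1] = array_b

# With object dtype, the sum is carried out with the elements' own "+".
numpy_result = np.sum(x)

# The same reduction spelled out in plain Python: list + list concatenates.
python_result = reduce(operator.add, x)

print(numpy_result)
print(python_result == array_a + array_b)
```

Both results should be the concatenated list, which is why the "sum" in the question looked like a join.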
If instead we make a ragged array from arrays:
In [140]: y = np.array([np.array(array_a), np.array(array_b)])
...
In [142]: y
Out[142]: array([array([1, 2, 3, 4, 5]), array([4, 5, 6, 7, 8, 9])], dtype=object)
In [143]: np.sum(y)
Traceback ...
ValueError: operands could not be broadcast together with shapes (5,) (6,)
It's trying to do:
In [144]: np.array(array_a) + np.array(array_b)
When learning numpy it's a good idea to focus on the numeric multidimensional arrays, and leave these ragged object dtype arrays for later. There are nuances that aren't obvious from the "normal" array operations. Ragged arrays are very much like lists, and often are the result of user errors. Intentionally making ragged arrays is usually not a useful approach.
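If the goal really is the total of all numbers in lists of unequal length, one way to get it without a ragged array is sketched below. This is a suggestion, not something from the question: it either flattens the lists with np.concatenate first, or sums each list separately and adds the partial sums. The variable names are illustrative.

```python
import numpy as np

array_a = [1, 2, 3, 4, 5]
array_b = [4, 5, 6, 7, 8, 9]

# Option 1: join the lists into one flat numeric array, then sum it.
flat = np.concatenate([array_a, array_b])
total_flat = flat.sum()

# Option 2: sum each list on its own and add up the partial sums.
total_parts = sum(np.sum(lst) for lst in [array_a, array_b])

print(total_flat, total_parts)  # 54 54
```

Either way the data stays numeric (an integer dtype rather than object), so numpy's normal arithmetic applies.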
CodePudding user response:
Could anyone please help me to explain why in the latter, numpy.sum did not sum up all the elements as it is supposed to do?
I cannot explain why it didn't - because it did.
[array_a, array_b] contains two elements - one of them is the list array_a, and the other is the list array_b. The result in the latter case is the same as for array_a + array_b.
In the first case, Numpy detects that it can build a two-dimensional, two-by-six Numpy array from the input data. In the latter case, it cannot build a two-dimensional array - arrays, by their definition, must be rectangular (which is part of why we do not use that name for the built-in Python data structure, but instead call it a list). It can, however, build a one-dimensional array, where each element is a Python list. (Yes, Numpy arrays are allowed to store Python objects; the dtype will be object, and Numpy will not care about the exact type. In the case of numpy.sum, it will just create the appropriate requests to "add" the elements, and the rest is the responsibility of built-in Python definitions.)
Explicit is better than implicit. If you want to work with arrays, then create them in the first place:
array_a = np.array([1,2,3,4,5,6])
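A short sketch of what that buys you, assuming both inputs are created as real arrays up front: `+` becomes element-wise addition instead of concatenation, np.sum gives the numeric total, and a length mismatch fails loudly instead of silently joining. The names short and mismatch_raised are illustrative only.

```python
import numpy as np

array_a = np.array([1, 2, 3, 4, 5, 6])
array_b = np.array([4, 5, 6, 7, 8, 9])

# With real arrays, "+" is element-wise addition, not concatenation.
elementwise = array_a + array_b
print(elementwise)                 # [ 5  7  9 11 13 15]

# Summing the pair gives the numeric total of all twelve values.
total = np.sum([array_a, array_b])
print(total)                       # 60

# Arrays of different lengths cannot be added; numpy reports the problem.
short = np.array([1, 2, 3, 4, 5])
mismatch_raised = False
try:
    short + array_b
except ValueError as exc:
    mismatch_raised = True
    print("ValueError:", exc)
```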