I have a numpy matrix and am trying to find the sum of each row. There are some nan entries which I want to treat as 0. For instance, I would expect the row sums of
[1, nan, 30]
[nan, 2, nan]
[8, nan, nan]
[nan, 1, 16]
to be
[31]
[2]
[8]
[17]
EDIT: each row contains entries which are either nan or a sympy expr. So the sum of the row [x, nan, 30x] should be 31x.
However, these nan values are not np.NaN, but are created in calculations with SymPy and therefore are S.NaN. np.isnan(S.NaN) returns False. I have tried using np.nansum but, because of this, it just returns nan.
import numpy as np
from sympy import S

A = np.matrix([[1, np.nan, 30],
               [np.nan, 2, np.nan],
               [8, np.nan, np.nan],
               [np.nan, 1, 16]])
B = np.matrix([[1, S.NaN, 30],
               [S.NaN, 2, S.NaN],
               [8, S.NaN, S.NaN],
               [S.NaN, 1, 16]])
np.nansum(A, 1) returns
matrix([[31.],
        [ 2.],
        [ 8.],
        [17.]])
as expected. But np.nansum(B, 1) returns
matrix([[nan],
        [nan],
        [nan],
        [nan]])
I also thought that I might be able to replace the S.NaN values with 0. There are quite a lot of answered questions about how to convert np.NaN values to 0, e.g. "convert nan value to zero", but these solutions don't work because I don't think there's an equivalent in SymPy for np.isnan().
Is there either an equivalent in SymPy for np.nansum which I could use, or a way to replace S.NaN values with 0 (or np.NaN)?
CodePudding user response:
I played with sympy.Matrix, but it turns out it isn't needed.
Start with a numpy array with sympy.nan elements. Note the object dtype:
In [65]: arr
Out[65]:
array([[1, nan, 30],
       [nan, 2, nan],
       [8, nan, nan],
       [nan, 1, 16]], dtype=object)
In [66]: type(arr[0,1])
Out[66]: sympy.core.numbers.NaN
It converts to a float array without problem:
In [67]: arr.astype(float)
Out[67]:
array([[ 1., nan, 30.],
       [nan,  2., nan],
       [ 8., nan, nan],
       [nan,  1., 16.]])
In [68]: type(_[0,1])
Out[68]: numpy.float64
In [69]: np.nansum(__,1)
Out[69]: array([31., 2., 8., 17.])
sympy.nan converts to a python float without problem:
In [71]: type(nan)
Out[71]: sympy.core.numbers.NaN
In [72]: type(float(nan))
Out[72]: float
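As a minimal sketch of the float-conversion route above: it assumes every entry is either a plain number or sympy's NaN. A symbolic entry such as x would make astype(float) raise, because a free symbol has no float value, so this does not cover the symbolic rows mentioned in the question's EDIT. The array mirrors the question's example, and the variable names are only illustrative.

```python
import numpy as np
from sympy import S

# Object-dtype array whose "missing" entries are SymPy's NaN singleton.
arr = np.array([[1, S.NaN, 30],
                [S.NaN, 2, S.NaN],
                [8, S.NaN, S.NaN],
                [S.NaN, 1, 16]], dtype=object)

# astype(float) calls float() on each entry; float(S.NaN) is an ordinary
# floating-point nan, which np.nansum knows how to skip.
row_sums = np.nansum(arr.astype(float), axis=1)
print(row_sums)
```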
sympy.Matrix
If I make a sympy.Matrix from the array:
In [83]: M = Matrix(arr)
In [84]: M
Out[84]:
⎡ 1   nan  30 ⎤
⎢             ⎥
⎢nan   2   nan⎥
⎢             ⎥
⎢ 8   nan  nan⎥
⎢             ⎥
⎣nan   1   16 ⎦
In [85]: M[0,1]
Out[85]:
nan
In [86]: type(M[0,1])
Out[86]: sympy.core.numbers.NaN
I can use subs to replace the nan:
In [87]: M1 = M.subs({nan: 0})
In [88]: M1
Out[88]:
⎡1  0  30⎤
⎢        ⎥
⎢0  2  0 ⎥
⎢        ⎥
⎢8  0  0 ⎥
⎢        ⎥
⎣0  1  16⎦
np.sum works on this:
In [89]: np.sum(M1,1)
Out[89]: array([31, 2, 8, 17], dtype=object)
Presumably np.sum first turns the Matrix into an array:
In [90]: np.array(M1)
Out[90]:
array([[1, 0, 30],
       [0, 2, 0],
       [8, 0, 0],
       [0, 1, 16]], dtype=object)
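The subs route also seems to carry over to the symbolic rows from the question's EDIT. The following is only a sketch under that assumption: the symbol x and the two-row matrix are made up for illustration, and the row sums are taken with Python's built-in sum over each row rather than with np.sum.

```python
import sympy as sp

x = sp.Symbol("x")

# Hypothetical matrix mixing symbolic entries with SymPy NaN.
M = sp.Matrix([[x, sp.nan, 30 * x],
               [sp.nan, 2, sp.nan]])

# Replace every NaN entry with 0, element by element.
M0 = M.subs(sp.nan, 0)

# Sum each row; SymPy combines like terms automatically.
row_sums = [sum(M0.row(i)) for i in range(M0.rows)]
print(row_sums)
```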
==
We don't need isnan to test for sp.nan. While its float value is np.nan (or python nan), it's a distinct sympy object, and == works just fine (that's shown in the sympy.nan docs).
In [92]: arr==sp.nan
Out[92]:
array([[False,  True, False],
       [ True, False,  True],
       [False,  True,  True],
       [ True, False, False]])
In [93]: arr1 = np.where(arr==sp.nan,0,arr)
In [94]: arr1
Out[94]:
array([[1, 0, 30],
       [0, 2, 0],
       [8, 0, 0],
       [0, 1, 16]], dtype=object)
Or, if you insist on using nansum:
In [95]: np.nansum(np.where(arr==sp.nan,np.nan,arr),1)
Out[95]: array([31, 2, 8, 17], dtype=object)
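For the symbolic case in the question's EDIT, the same == mask should apply, because comparing a sympy expression to sp.nan with == is a structural test that returns a plain bool. A rough sketch, with a made-up symbol x and a small illustrative array:

```python
import numpy as np
import sympy as sp

x = sp.Symbol("x")

# Hypothetical object array mixing symbolic entries with SymPy NaN.
arr = np.array([[x, sp.nan, 30 * x],
                [sp.nan, 2, sp.nan]], dtype=object)

# Boolean mask of the NaN positions, then swap those entries for 0.
mask = arr == sp.nan
cleaned = np.where(mask, 0, arr)

# Summing an object array falls back to Python's +, so SymPy does the algebra.
row_sums = cleaned.sum(axis=1)
print(row_sums)
```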
CodePudding user response:
I found the function np.nditer, which allows you to iterate through each entry in an array. Using the fact that S.NaN raises a TypeError when an ordering comparison such as > is made, the following code correctly replaces any entries of it with a 0 (or anything else):
In[122]: B
Out[122]:
matrix([[1, nan, 30],
        [nan, 2, nan],
        [8, nan, nan],
        [nan, 1, 16]], dtype=object)
In[123]: for x in np.nditer(B, flags=["refs_ok"], op_flags=["readwrite"]):
             try:
                 x > 0
             except TypeError:
                 x[...] = 0
In[124]: B
Out[124]:
matrix([[1, 0, 30],
        [0, 2, 0],
        [8, 0, 0],
        [0, 1, 16]], dtype=object)
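A possible variation on this loop, offered as a sketch rather than a drop-in replacement: since S.NaN is a singleton in SymPy, an identity test with `is` can stand in for the try/except, which avoids relying on an exception raised by the comparison. The symbol x and the small array are illustrative assumptions, and a plain np.array is used here instead of np.matrix.

```python
import numpy as np
from sympy import S, Symbol

x = Symbol("x")

# Hypothetical object array mixing symbolic entries with SymPy NaN.
B = np.array([[x, S.NaN, 30 * x],
              [S.NaN, 2, S.NaN]], dtype=object)

# Visit each cell as a writable 0-d view and overwrite NaN entries in place.
for cell in np.nditer(B, flags=["refs_ok"], op_flags=["readwrite"]):
    if cell.item() is S.NaN:  # S.NaN is a singleton, so identity is enough
        cell[...] = 0

print(B.sum(axis=1))
```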