I was wondering whether, given a boolean mask, there is a way to retrieve all the elements of a DataFrame located where the mask is True.
In my case I have a DataFrame containing the values of a certain dataset. For example, take the following:
import numpy as np
import pandas as pd
l = [[5, 3, 1],
[0, 3, 1],
[7, 3, 0],
[8, 5, 23],
[40, 4, 30],
[2, 6, 13]]
df_true = pd.DataFrame(l, columns=['1', '2', '3'])
df_true
Then I randomly replace some of the values with np.nan as follows:
l2 = [[5, 3, np.nan],
[np.nan, 3, 1],
[7, np.nan, 0],
[np.nan, 5, 23],
[40, 4, np.nan],
[2, np.nan, 13]]
df_nan = pd.DataFrame(l2, columns=['1', '2', '3'])
df_nan
Let's say that after applying some imputation algorithm I obtained the following result:
l3 = [[5, 3, 1],
[2, 3, 1],
[7, 8, 0],
[8, 5, 23],
[40, 4, 25],
[2, 6, 13]]
df_imp = pd.DataFrame(l3, columns=['1', '2', '3'])
df_imp
Now I would like to create two lists (or arrays), one containing the imputed values and the other the true values, so that I can compare them. To do so, I first created a mask m = df_nan.isnull()
which is True at the cells that contain imputed values. Applying the mask as df_imp[m]
I obtain:
1 2 3
0 NaN NaN 1.0
1 2.0 NaN NaN
2 NaN 8.0 NaN
3 8.0 NaN NaN
4 NaN NaN 25.0
5 NaN 6.0 NaN
Is there a way to get only the values, without the NaN entries, and put them into a list?
CodePudding user response:
You can use df.values (here on the masked frame, df_imp[m])
to get a NumPy representation of the DataFrame, then use numpy.isnan
to drop the NaN entries and keep the other values.
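import numpy as np
# df_imp and m are the imputed frame and the mask from the question
arr = df_imp[m].values  # NaN wherever the mask is False
res = arr[~np.isnan(arr)]  # keep only the non-NaN entries, in row-major order
print(res)
# [ 1.  2.  8.  8. 25.  6.]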