Given a 2D numpy array I would like to filter values based on a condition.
import pandas as pd
data = pd.DataFrame([['A', 'B'], ['C']])
mask = ~pd.isna(data)
filtered_data = data.values[mask.values]
> ['A', 'B', 'C'] // expected: [['A', 'B'], ['C']]
I already explored solutions on SO that filter using np.isnan,
but that doesn't work when the data is not numeric. I have a mix of strings and NaN.
How can I get a 2D array where all the NaN values are stripped? I would prefer a vectorized solution rather than looping over each dimension in numpy.
CodePudding user response:
I think a list comprehension would be the fastest:
vals = data.to_numpy()
filtered_data = [list(v[m]) for v, m in zip(vals, pd.notna(vals))]
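To show why the original attempt came out flat, here is a minimal self-contained sketch. The names isnan_works, flat and rows are only for illustration. It assumes the short row is padded with a missing value, as in the question. Indexing a 2D array with a 2D boolean mask always gives a 1D result, and since the rows end up with different lengths, the output is a list of lists rather than a regular 2D array.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([['A', 'B'], ['C']])  # short row is padded with a missing value
vals = data.to_numpy()

# np.isnan is only defined for numeric dtypes; on an object array it raises TypeError.
try:
    np.isnan(np.array(['A', None], dtype=object))
    isnan_works = True
except TypeError:
    isnan_works = False

mask = pd.notna(vals)  # pd.notna handles strings, None and NaN alike

flat = vals[mask]  # a 2D boolean mask flattens the result to 1D
rows = [list(v[m]) for v, m in zip(vals, mask)]  # mask one row at a time instead
print(rows)
```

The per-row masking is still a Python-level loop over rows, but each row is filtered with a single vectorized index operation.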
Alternative pandas-based approach with stack and groupby:
filtered_data = data.stack().groupby(level=0).agg(list).tolist()
Result
print(filtered_data)
[['A', 'B'], ['C']]
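A hedged variant of the stack and groupby approach, in case it helps: as far as I know, whether stack drops missing values by default depends on the pandas version, so this sketch adds an explicit dropna() (my addition, not part of the one-liner above) to make the intent clear. Note that with either form a row consisting only of missing values disappears from the result instead of becoming an empty list.

```python
import pandas as pd

data = pd.DataFrame([['A', 'B'], ['C']])

filtered_data = (
    data.stack()           # long format: one entry per (row, column) pair
        .dropna()          # drop missing entries explicitly
        .groupby(level=0)  # regroup by the original row label
        .agg(list)         # collect each row's remaining values into a list
        .tolist()
)
print(filtered_data)
```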