numpy filter 2D array with 2D pandas mask

Time:02-23

Given a 2D numpy array I would like to filter values based on a condition.

import pandas as pd

data = pd.DataFrame([['A', 'B'], ['C']])
mask = ~pd.isna(data)
filtered_data = data.values[mask.values]
> ['A', 'B', 'C']   // expected: [['A', 'B'], ['C']]

I have already looked at solutions on SO that filter using np.isnan, but that doesn't work when the data is not numeric. I have a mix of strings and NaN.

How can I get a 2D array where all the NaN values are stripped? I would prefer a vectorized solution rather than looping over each dimension in numpy.

CodePudding user response:

I think a list comprehension would be the fastest:

vals = data.to_numpy()
filtered_data = [list(v[m]) for v, m in zip(vals, pd.notna(vals))]
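
To illustrate why the mask is applied row by row here: indexing a 2D array with a full 2D boolean mask returns a flat 1D array, and the rows would have different lengths after filtering, so the result cannot be a rectangular array anyway. A minimal sketch using the data from the question (the names flat and per_row are only for illustration):

```python
import pandas as pd

data = pd.DataFrame([['A', 'B'], ['C']])  # the short second row is padded with a missing value
vals = data.to_numpy()
mask = pd.notna(vals)                     # 2D boolean array with the same shape as vals

# Indexing with the whole 2D mask collapses the result to one dimension.
flat = vals[mask]

# Applying each row's own mask keeps the row grouping (a list of lists).
per_row = [row[m].tolist() for row, m in zip(vals, mask)]

print(flat.ndim, flat.tolist())
print(per_row)
```

The result is a list of lists rather than a 2D numpy array, because the filtered rows are ragged.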

An alternative pandas-based approach using stack and groupby:

filtered_data = data.stack().groupby(level=0).agg(list).tolist()

Result

print(filtered_data)

[['A', 'B'], ['C']]
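
One difference between the two approaches may be worth checking on your own data: a row that is entirely missing yields an empty list with the list comprehension, but disappears from the stack/groupby result. The sketch below uses a hypothetical three-row frame that is not in the question, and adds an explicit .dropna() after stack() as an assumption, so that missing values are removed regardless of how the installed pandas version's stack() treats them.

```python
import pandas as pd

# Hypothetical example data: the middle row contains only missing values.
data = pd.DataFrame([['A', 'B'], [None, None], ['C', None]])

# Per-row masking keeps one (possibly empty) list per original row.
vals = data.to_numpy()
by_comprehension = [row[m].tolist() for row, m in zip(vals, pd.notna(vals))]

# stack + groupby only produces groups for rows that still have values.
by_stack = data.stack().dropna().groupby(level=0).agg(list).tolist()

print(by_comprehension)
print(by_stack)
```

If the position of each row matters, the list comprehension is probably the safer choice, since its output always has one entry per input row.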