Pandas dropna from a subset based on condition

Time:08-16

What is the most efficient way to dropna for a subset of records based on a condition?

Here's some example data:

import numpy as np
import pandas as pd
d = {'T': [1,1,1,1,2,2,2,2], 'ID': [156, 156, 156, 156, 157, 157, 157, 157],
     'PMN': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
     'XN': [1, np.nan, 5, np.nan, np.nan, np.nan, np.nan, np.nan]}
df = pd.DataFrame(data=d)

I would like to drop the rows where XN is NaN, but only when T is 1. I understand how to use dropna on a subset of columns, but restricting it to the rows where T == 1 is what's throwing me.

df.dropna(axis=0, subset=['XN'], inplace=True)
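For reference, running that call on the sample data shows why it is not enough on its own: `subset` only limits which columns are checked for NaN, not which rows the check applies to. A minimal sketch (without `inplace=True`, so `df` stays intact):

```python
import numpy as np
import pandas as pd

d = {'T': [1, 1, 1, 1, 2, 2, 2, 2],
     'ID': [156, 156, 156, 156, 157, 157, 157, 157],
     'PMN': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
     'XN': [1, np.nan, 5, np.nan, np.nan, np.nan, np.nan, np.nan]}
df = pd.DataFrame(data=d)

# subset=['XN'] means "look only at XN", but every row is tested,
# so the T == 2 rows (whose XN is all NaN) are dropped too.
plain = df.dropna(axis=0, subset=['XN'])
print(plain.index.tolist())  # [0, 2]
```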

Here's the desired output:

out = {'T': [1,1,2,2,2,2], 'ID': [156, 156, 157, 157, 157, 157],'PMN': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],'XN': [1, 5, np.nan, np.nan, np.nan, np.nan]}
out_df = pd.DataFrame(data=out)
out_df

Thanks!

Answer:

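Create a boolean mask that is True for the rows you want to drop (T equals 1 and XN is NaN), then keep the rows where it is False:

# True for rows to drop: T equals 1 and XN is missing
condition = df['T'].eq(1) & df['XN'].isna()
df.loc[~condition]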

   T   ID  PMN   XN
0  1  156  NaN  1.0
2  1  156  NaN  5.0
4  2  157  NaN  NaN
5  2  157  NaN  NaN
6  2  157  NaN  NaN
7  2  157  NaN  NaN
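If you specifically want `dropna` itself to do the work, one alternative (a sketch, not part of the answer above) is to run it only on the T == 1 slice, leave the other rows untouched, and concatenate. It builds two intermediate frames, so the single mask above is simpler; this variant is mainly useful if you want `dropna`'s own options such as `how` or `thresh` on that slice.

```python
import numpy as np
import pandas as pd

d = {'T': [1, 1, 1, 1, 2, 2, 2, 2],
     'ID': [156, 156, 156, 156, 157, 157, 157, 157],
     'PMN': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
     'XN': [1, np.nan, 5, np.nan, np.nan, np.nan, np.nan, np.nan]}
df = pd.DataFrame(data=d)

is_t1 = df['T'].eq(1)
# dropna only on the T == 1 rows; the T != 1 rows pass through unchanged.
# sort_index() restores the original row order from the index labels.
result = pd.concat([df[is_t1].dropna(subset=['XN']), df[~is_t1]]).sort_index()
print(result.index.tolist())  # [0, 2, 4, 5, 6, 7]
```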

Answer:

It's probably easier not to use dropna at all and instead select the rows you want to keep (T is not 1, or XN is present):

out_df = df[(df['T'] != 1) | df['XN'].notna()]
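The "keep" mask here is the De Morgan complement of the "drop" mask in the first answer, so both select the same rows. Two practical details, shown in the sketch below: filtering keeps the original index labels (0, 2, 4, ...), so `reset_index(drop=True)` is needed to match the question's `out_df` exactly; and if you want the in-place behaviour of `inplace=True`, `DataFrame.drop` on the matching labels is one way to get it. The names `drop_mask` and `keep_mask` are illustrative.

```python
import numpy as np
import pandas as pd

d = {'T': [1, 1, 1, 1, 2, 2, 2, 2],
     'ID': [156, 156, 156, 156, 157, 157, 157, 157],
     'PMN': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
     'XN': [1, np.nan, 5, np.nan, np.nan, np.nan, np.nan, np.nan]}
df = pd.DataFrame(data=d)
out_df = pd.DataFrame(data={'T': [1, 1, 2, 2, 2, 2],
                            'ID': [156, 156, 157, 157, 157, 157],
                            'PMN': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
                            'XN': [1, 5, np.nan, np.nan, np.nan, np.nan]})

# Rows to drop: T == 1 and XN missing.
drop_mask = df['T'].eq(1) & df['XN'].isna()
# De Morgan: keep = (T != 1) | XN present, i.e. exactly ~drop_mask.
keep_mask = df['T'].ne(1) | df['XN'].notna()
assert keep_mask.equals(~drop_mask)

# In-place analogue of dropna(..., inplace=True): drop the matching labels.
df.drop(index=df.index[drop_mask], inplace=True)

# Labels are still 0, 2, 4, ...; reset them to compare with out_df.
print(df.reset_index(drop=True).equals(out_df))  # True
```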
