What is the most efficient way to dropna for a subset of records based on a condition?
Here's some example data:
import numpy as np
import pandas as pd
d = {'T': [1,1,1,1,2,2,2,2], 'ID': [156, 156, 156, 156, 157, 157, 157, 157],'PMN': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],'XN': [1, np.nan, 5, np.nan, np.nan, np.nan, np.nan, np.nan]}
df = pd.DataFrame(data=d)
I would like to drop the rows where XN is NaN, but only when T is 1. I understand how to dropna on a subset of columns, but the T == 1 condition is what's throwing me.
df.dropna(axis=0, subset=['XN'], inplace=True)
Here's the desired output:
out = {'T': [1,1,2,2,2,2], 'ID': [156, 156, 157, 157, 157, 157],'PMN': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],'XN': [1, 5, np.nan, np.nan, np.nan, np.nan]}
out_df = pd.DataFrame(data=out)
out_df
Thanks!
CodePudding user response:
Create a boolean condition that marks the rows to drop, then filter the DataFrame with its negation:
condition = df['T'].eq(1) & df.XN.isna()
df.loc[~condition]
   T   ID  PMN   XN
0  1  156  NaN  1.0
2  1  156  NaN  5.0
4  2  157  NaN  NaN
5  2  157  NaN  NaN
6  2  157  NaN  NaN
7  2  157  NaN  NaN
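Note that the filtered frame keeps the original index labels (0, 2, 4, ...), while the desired output in the question has a fresh 0..5 index. Here is a minimal, self-contained sketch of one way to get an exact match, assuming the original labels don't need to be preserved. The names `result` and `expected` are only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'T': [1, 1, 1, 1, 2, 2, 2, 2],
    'ID': [156, 156, 156, 156, 157, 157, 157, 157],
    'PMN': [np.nan] * 8,
    'XN': [1, np.nan, 5, np.nan, np.nan, np.nan, np.nan, np.nan],
})

# Rows to drop: T is 1 AND XN is missing.
condition = df['T'].eq(1) & df['XN'].isna()

# Keep everything else, then renumber the index from 0.
result = df.loc[~condition].reset_index(drop=True)

# The desired output from the question.
expected = pd.DataFrame({
    'T': [1, 1, 2, 2, 2, 2],
    'ID': [156, 156, 157, 157, 157, 157],
    'PMN': [np.nan] * 6,
    'XN': [1, 5, np.nan, np.nan, np.nan, np.nan],
})

# Raises AssertionError if the frames differ; NaNs in matching positions count as equal.
pd.testing.assert_frame_equal(result, expected)
```

If you do want to keep the original row labels, just leave out the reset_index call.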
CodePudding user response:
It's probably easier not to use dropna at all and just select the rows you want to keep:
out_df = df[(df['T'] != 1) | (~df['XN'].isna())]
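This keep-mask is the logical negation of "T is 1 and XN is NaN" (De Morgan's law), so it should select the same rows as a drop-mask built the other way round. The sketch below checks that on the question's data; `drop_mask`, `keep_mask` and `same` are illustrative names, and `notna()` is used as a shorthand for `~isna()`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'T': [1, 1, 1, 1, 2, 2, 2, 2],
    'ID': [156, 156, 156, 156, 157, 157, 157, 157],
    'PMN': [np.nan] * 8,
    'XN': [1, np.nan, 5, np.nan, np.nan, np.nan, np.nan, np.nan],
})

# Mask of rows to DROP: T is 1 and XN is missing.
drop_mask = df['T'].eq(1) & df['XN'].isna()

# Mask of rows to KEEP: T is not 1, or XN is present.
keep_mask = df['T'].ne(1) | df['XN'].notna()

# True when negating the drop-mask gives exactly the keep-mask.
same = bool(((~drop_mask) == keep_mask).all())

out_df = df[keep_mask]
print(same)
print(out_df)
```

Either form works; which one reads better mostly depends on whether you think of the problem as "rows to remove" or "rows to keep".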