I have the following df. I want to group by "group" and drop not only the rows that contain NaN, but also all the rows that precede them within each group.
group     date  value
    1  jan2019    NaN
    1  jan2019      3
    1  jan2019    NaN
    1  feb2019      3
    1  mar2019      4
    1  mar2019      5
    2  feb2019      0
    2  feb2019    NaN
    2  mar2019      7
    2  mar2019      4
    2  apr2019      5
Desired df:

group     date  value
    1  feb2019      3
    1  mar2019      4
    1  mar2019      5
    2  mar2019      7
    2  mar2019      4
    2  apr2019      5
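For reference, the sample frame can be reproduced like this (a sketch; it assumes the dates are stored as plain strings):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'date': ['jan2019', 'jan2019', 'jan2019', 'feb2019', 'mar2019', 'mar2019',
             'feb2019', 'feb2019', 'mar2019', 'mar2019', 'apr2019'],
    'value': [np.nan, 3, np.nan, 3, 4, 5, 0, np.nan, 7, 4, 5],
})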
CodePudding user response:
You can flag the inf values using abs and eq. Then, to flag all the values before them as well, reverse the order of the Series and use cummax. Since you want to do this within each group, use groupby.cummax. Finally, invert the boolean mask and use it with loc to filter the desired output:
out = df.loc[~df['value'].abs().eq(float('inf'))[::-1].groupby(df['group']).cummax()]
If the values to flag are NaNs (as in the question) rather than inf, use isna instead:
out = df.loc[~df['value'].isna()[::-1].groupby(df['group']).cummax()]
Output:
    group     date  value
 3      1  feb2019    3.0
 4      1  mar2019    4.0
 5      1  mar2019    5.0
 8      2  mar2019    7.0
 9      2  mar2019    4.0
10      2  apr2019    5.0
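For illustration, the mask can be built step by step (a sketch; the intermediate names are my own, not part of the original answer):

# Flag the NaN rows
nan_flag = df['value'].isna()

# Reverse, then take a per-group cumulative max: a row ends up True if it is a
# NaN row or if any later row in the same group is NaN, i.e. everything up to
# and including the last NaN of each group.
upto_last_nan = nan_flag[::-1].groupby(df['group']).cummax()

# loc aligns the boolean Series on the index, so the reversed order is harmless
out = df.loc[~upto_last_nan]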
CodePudding user response:
A more verbose but possibly more readable approach: isin works for NaN as well as for infs.
import numpy as np

def filter_preceding(df, search_values=[np.nan, np.inf, -np.inf]):
    # Positions (within this group) of the rows to flag
    ind = np.where(df['value'].isin(search_values))[0]
    if len(ind) == 0:
        return df
    # Keep only the rows after the last flagged position
    max_ind = ind.max()
    return df.iloc[max_ind + 1:, :]

df.groupby('group').apply(filter_preceding).reset_index(drop=True)
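On the sample frame above this should reproduce the desired rows, reindexed from 0 (recent pandas versions may also warn that apply operates on the grouping column):

   group     date  value
0      1  feb2019    3.0
1      1  mar2019    4.0
2      1  mar2019    5.0
3      2  mar2019    7.0
4      2  mar2019    4.0
5      2  apr2019    5.0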