I have a series of observations (status
) at different timepoints (dt
) for a number of subjects (id
) in a MultiIndex DataFrame:
dt status
id n
01 1 xx abc
2 xx xxx
02 1 xx xxx
2 xx abc
3 xx xxx
4 xx xxx
03 1 xx xxx
04 1 xx xxx
2 xx xxx
3 xx abc
I want to drop every observation before status=='abc'
to make a new MultiIndex DataFrame like this:
dt status
id n
01 1 xx abc
2 xx xxx
02 2 xx abc
3 xx xxx
4 xx xxx
04 3 xx abc
What is the most efficient way to do that?
CodePudding user response:
Create a boolean mask to identify the rows where status
is abc
, then group the mask by id
and use cummax
to propagate the True
values in forward direction then use the resulting mask to filter the rows
df[df['status'].eq('abc').groupby('id').cummax()]
dt status
id n
01 1 xx abc
2 xx xxx
02 2 xx abc
3 xx xxx
4 xx xxx
04 3 xx abc