Home > Software engineering >  Drop rows before a specific observation in a MultiIndex DataFrame
Drop rows before a specific observation in a MultiIndex DataFrame

Time:07-09

I have a series of observations (status) at different timepoints (dt) for a number of subjects (id) in a MultiIndex DataFrame:

       dt  status
id  n
01  1  xx     abc
    2  xx     xxx
02  1  xx     xxx
    2  xx     abc
    3  xx     xxx
    4  xx     xxx
03  1  xx     xxx
04  1  xx     xxx
    2  xx     xxx
    3  xx     abc

I want to drop every observation before status=='abc' to make a new MultiIndex DataFrame like this:

       dt  status
id  n
01  1  xx     abc
    2  xx     xxx
02  2  xx     abc
    3  xx     xxx
    4  xx     xxx
04  3  xx     abc

What is the most efficient way to do that?

CodePudding user response:

Create a boolean mask to identify the rows where status is abc, then group the mask by id and use cummax to propagate the True values in forward direction then use the resulting mask to filter the rows

df[df['status'].eq('abc').groupby('id').cummax()]

      dt status
id n           
01 1  xx    abc
   2  xx    xxx
02 2  xx    abc
   3  xx    xxx
   4  xx    xxx
04 3  xx    abc
  • Related