Slice a DataFrame starting from the first occurrence of a particular value

I want to slice a DataFrame in pandas that looks like this:

index  a    b
0      A    -
1      A     
2      A    -
3      B    -
4      C     
5      C    -

Grouping by column a, I want to keep all the rows from the first ' ' onward and delete all the rows in each group starting with '-'. The outcome should look like this:

index  a    b
1      A     
2      A    -
4      C     
5      C    -

How can I do this?

CodePudding user response:

Use GroupBy.cummax on a boolean mask that compares b with ' ' to keep all rows from the first match onward in each group:

df1 = (df[df.assign(new = lambda x: x['b'].eq(' '))
       .groupby('a')['new']
       .cummax()])

print (df1)
   a  b
1  A   
2  A  -
4  C   
5  C  -
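
For a runnable check, here is a minimal sketch that rebuilds the sample frame and shows the intermediate mask. It assumes the blank cells in b hold a single space ' ', as in the question; the names blank and keep are illustrative only.

```python
import pandas as pd

# Sample data from the question; blank cells are assumed to be a single space.
df = pd.DataFrame({'a': list('AAABCC'),
                   'b': ['-', ' ', '-', '-', ' ', '-']})

# True where b is blank.
blank = df['b'].eq(' ')

# Within each group of a, cummax turns every row from the first True onward into True.
keep = blank.groupby(df['a']).cummax()

df1 = df[keep]
print(df1)
```

Group B never contains a blank, so its mask stays False and the whole group is dropped.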

CodePudding user response:

A simpler syntax: call groupby on the boolean Series directly, then cummax:

df[df['b'].eq(' ').groupby(df['a']).cummax()]

output:

   index  a  b
1      1  A   
2      2  A  -
4      4  C   
5      5  C  -

If you also want to delete groups that start with - ("delete all the rows in each group starting with '-'"), you can combine cummin/cummax:

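# group_keys=False keeps the original index so the mask aligns with df (pandas 2.0+ would otherwise prepend the group key)
df[df['b'].ne('-').groupby(df['a'], group_keys=False).apply(lambda s: s.cummin().cummax())]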

output:

   index  a  b
4      4  C   
5      5  C  -
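
To see why the cummin/cummax pair drops whole groups, the two steps can be run separately. This is only a sketch on the question's sample data (blank cells assumed to be ' '); it calls the groupby cummin and cummax methods one after the other instead of going through apply, and the names not_dash, leading and mask are illustrative.

```python
import pandas as pd

# Sample data from the question; blank cells are assumed to be a single space.
df = pd.DataFrame({'a': list('AAABCC'),
                   'b': ['-', ' ', '-', '-', ' ', '-']})

# True where b is anything other than '-'.
not_dash = df['b'].ne('-')

# Step 1: per group, cummin stays True only for the leading run of non-'-' rows.
leading = not_dash.groupby(df['a']).cummin()

# Step 2: per group, cummax spreads a True first row over the rest of the group.
mask = leading.groupby(df['a']).cummax()

result = df[mask]
print(result)
```

After step 1, a group that opens with '-' is all False, so step 2 has nothing to spread. A group therefore survives only if its first row is not '-', which is why only group C remains.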