Remove last row per group based on a count of occurrence

Time:06-23

I have a DataFrame,

df:

ID    col1    col2     
A      11       0       
A      14       0      
B      15       0       
B      95       1       
B      81       2       
c      0        1       
c      9        1       

I want to drop the last row of each 'ID' group if that group has more than 3 rows.

Required output:

ID    col1    col2     
A      11       0       
A      14       0      
B      15       0       
B      95       1       
c      0        1       
c      9        1 

What I am trying:

df.groupby('ID').apply(lambda x: x.iloc[:-1] if len(x)>3 else x).reset_index(drop=True)

CodePudding user response:

Use groupby and cumcount if you want to keep at most 2 rows per group.

out = df[df.groupby('ID').cumcount() < 2]  # < 2 because cumcount starts at 0
print(out)

# Output
  ID  col1  col2
0  A    11     0
1  A    14     0
2  B    15     0
3  B    95     1
5  c     0     1
6  c     9     1
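
Note that capping with cumcount is not the same as dropping only the last row once a group is larger. A minimal sketch, assuming a hypothetical frame (not the one from the question) in which group B has four rows:

```python
import pandas as pd

# Hypothetical data, not from the question: group "B" has four rows.
df4 = pd.DataFrame({
    "ID": ["A", "A", "B", "B", "B", "B"],
    "col1": [11, 14, 15, 95, 81, 7],
})

# cumcount numbers the rows inside each group: 0, 1, 2, ...
position = df4.groupby("ID").cumcount()

# Keep only the first two rows of every group.
capped = df4[position < 2]

print(position.tolist())        # [0, 1, 0, 1, 2, 3]
print(capped["col1"].tolist())  # [11, 14, 15, 95]
```

Here two rows of B are removed, not just the last one, so this approach matches the required output only when no group has more than 3 rows.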

CodePudding user response:

Change the condition from > to >=, so that groups with exactly 3 rows (such as B) also lose their last row.

out = df.groupby('ID').apply(lambda x: x.iloc[:-1] if len(x)>=3 else x).reset_index(drop=True)
print(out)

# Output
  ID  col1  col2
0  A    11     0
1  A    14     0
2  B    15     0
3  B    95     1
4  c     0     1
5  c     9     1
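
The same rule can probably also be written without apply, using a boolean mask. This is only a sketch of one possible alternative, assuming the frame from the question; the names size and is_last are introduced here for illustration:

```python
import pandas as pd

# The frame from the question.
df = pd.DataFrame({
    "ID": ["A", "A", "B", "B", "B", "c", "c"],
    "col1": [11, 14, 15, 95, 81, 0, 9],
    "col2": [0, 0, 0, 1, 2, 1, 1],
})

g = df.groupby("ID")

# Size of the group each row belongs to, aligned with df's index.
size = g["ID"].transform("size")

# Counting from the end, the last row of every group gets 0.
is_last = g.cumcount(ascending=False) == 0

# Drop a row only if it is the last of a group with at least 3 rows.
out = df[~(is_last & (size >= 3))].reset_index(drop=True)
print(out)
```

Because the mask is built from the original rows, the ID column stays in place regardless of how apply treats the grouping column in a given pandas version.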