How to filter values in data frame by grouped values in column

Time:09-21

I have a dataframe:

id    value
a1      0
a1      1
a1      2
a1      3
a2      0
a2      1
a3      0
a3      1
a3      2
a3      3

I want to filter the ids and keep only those that have a value of 3 or higher. In this example, id a2 must be removed because it only has the values 0 and 1. The desired result is:

id    value
a1      0
a1      1
a1      2
a1      3
a3      0
a3      1
a3      2
a3      3
a3      4
a3      5

How can I do that in pandas?

Answer:


Group by id and take each group's maximum value, then flag the ids whose maximum is 3 or above:

keep = df.groupby('id')['value'].max() >= 3
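A minimal sketch of this first step, assuming the question's input is typed in by hand as `df`. `keep` comes out as a boolean Series indexed by id, with one entry per group:

```python
import pandas as pd

# Question's input, typed in by hand
df = pd.DataFrame({
    "id": ["a1"] * 4 + ["a2"] * 2 + ["a3"] * 4,
    "value": [0, 1, 2, 3, 0, 1, 0, 1, 2, 3],
})

# One entry per id: True if that id's largest value is 3 or more
keep = df.groupby("id")["value"].max() >= 3
print(keep)
```

Here `keep[keep].index` holds just the ids to retain (a1 and a3), which is what the selection step passes to `isin`.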

Select the rows with the IDs that match:

df[df['id'].isin(keep[keep].index)]  
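A possible alternative, not part of the answer above: `transform('max')` broadcasts each group's maximum back onto that group's rows, so the comparison yields a row-aligned mask and no `isin` lookup is needed. The sample data is again typed in by hand:

```python
import pandas as pd

# Question's input, typed in by hand
df = pd.DataFrame({
    "id": ["a1"] * 4 + ["a2"] * 2 + ["a3"] * 4,
    "value": [0, 1, 2, 3, 0, 1, 0, 1, 2, 3],
})

# Every row receives the maximum value of its own id group
group_max = df.groupby("id")["value"].transform("max")

# Keep the rows whose id reaches 3 or more; the original index is preserved
result = df[group_max >= 3]
print(result)
```

`df.groupby('id').filter(lambda g: g['value'].max() >= 3)` should select the same rows; the `transform` version avoids calling a Python function once per group.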

Answer:

Use a boolean mask to keep the rows that match the condition, then fill the rows of the rejected id (a2) with the next id (a3). Finally, group again by id and renumber the rows within each group with a cumulative count.

mask = df.groupby('id')['value'] \
         .transform(lambda x: sorted(x.tolist()) == [0, 1, 2, 3])

df1 = df[mask].reindex(df.index).bfill()
df1['value'] = df1.groupby('id').cumcount()

Output:

>>> df1
   id  value
0  a1      0
1  a1      1
2  a1      2
3  a1      3
4  a3      0
5  a3      1
6  a3      2
7  a3      3
8  a3      4
9  a3      5
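A self-contained sketch of this answer's steps, with the input typed in from the question and `groupby(...).cumcount()` called directly. Note that it reproduces the question's desired table by relabelling the two a2 rows as a3 and renumbering, rather than by filtering alone:

```python
import pandas as pd

# Question's input, typed in by hand
df = pd.DataFrame({
    "id": ["a1"] * 4 + ["a2"] * 2 + ["a3"] * 4,
    "value": [0, 1, 2, 3, 0, 1, 0, 1, 2, 3],
})

# True for every row of an id whose values are exactly 0, 1, 2, 3
mask = df.groupby("id")["value"].transform(
    lambda x: sorted(x.tolist()) == [0, 1, 2, 3]
)

# Keep matching rows, restore the dropped positions as NaN, then back-fill
# them from the next kept row (the first a3 row)
df1 = df[mask].reindex(df.index).bfill()

# cumcount numbers each row by its position within its id group, from 0
df1["value"] = df1.groupby("id").cumcount()
print(df1)
```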