Home > Net >  How do i set filtering conditions per group?
How do i set filtering conditions per group?

Time:03-02

For example, with the example dataset below, how do I set a filtering condition using jupyter notebook where I can find out who has 2 or more results of "Number of Candies" more than or equal to 3.1?

ID Name Number of Candies
1 Jerry 3.1
1 Jerry 3
1 Jerry 2
2 Tom 6
2 Tom 5.8
3 Wendy 3.1
3 Wendy 3.2

Thus, the output after filtering should be

ID Name Number of Candies
2 Tom 6
2 Tom 5.8
3 Wendy 3.1
3 Wendy 3.2

CodePudding user response:

If you can use , we could do this in two steps:

  1. Of course, first make sure you're using a numeric dtype in your "Number of Candies" column:

    df["Number of Candies"] = df["Number of Candies"].astype(float)
    
  2. Filter rows where "Number of Candies" is more than or equal to 3.1

    df.loc[df["Number of Candies"] >= 3.1]
    
  3. Then group by "Name" and keep groups larger than or equal to 2

    .groupby("Name").filter(lambda group: len(group) >= 2)
    

You can put both conditions together and get the result you want in a single line:

>>> df.loc[df["Number of Candies"] >= 3.1].groupby("Name").filter(lambda group: len(group) >= 2)
   ID    Name  Number of Candies
3   2    Tom                 6.0
4   2    Tom                 5.8
5   3  Wendy                 3.1
6   3  Wendy                 3.2

CodePudding user response:

You can use a boolean mask computed per group:

#             is more than 3.1?       /    get count of True per group    / ≥ 2? 
mask = df['Number of Candies'].ge(3.1).groupby(df['ID']).transform('sum').ge(2)
df[mask]

output:

   ID   Name  Number of Candies
3   2    Tom                6.0
4   2    Tom                5.8
5   3  Wendy                3.1
6   3  Wendy                3.2
  • Related