How do i set filtering conditions per group?-CodePudding

For example, with the example dataset below, how do I set a filtering condition using jupyter notebook where I can find out who has 2 or more results of "Number of Candies" more than or equal to 3.1?

ID	Name	Number of Candies
1	Jerry	3.1
1	Jerry	3
1	Jerry	2
2	Tom	6
2	Tom	5.8
3	Wendy	3.1
3	Wendy	3.2

Thus, the output after filtering should be

ID	Name	Number of Candies
2	Tom	6
2	Tom	5.8
3	Wendy	3.1
3	Wendy	3.2

CodePudding user response：

If you can use pandas, we could do this in two steps:

Of course, first make sure you're using a numeric dtype in your "Number of Candies" column:
```
df["Number of Candies"] = df["Number of Candies"].astype(float)
```
Filter rows where "Number of Candies" is more than or equal to 3.1
```
df.loc[df["Number of Candies"] >= 3.1]
```
Then group by "Name" and keep groups larger than or equal to 2
```
.groupby("Name").filter(lambda group: len(group) >= 2)
```

You can put both conditions together and get the result you want in a single line:

>>> df.loc[df["Number of Candies"] >= 3.1].groupby("Name").filter(lambda group: len(group) >= 2)
   ID    Name  Number of Candies
3   2    Tom                 6.0
4   2    Tom                 5.8
5   3  Wendy                 3.1
6   3  Wendy                 3.2

CodePudding user response：

You can use a boolean mask computed per group:

#             is more than 3.1?       /    get count of True per group    / ≥ 2? 
mask = df['Number of Candies'].ge(3.1).groupby(df['ID']).transform('sum').ge(2)
df[mask]

output:

   ID   Name  Number of Candies
3   2    Tom                6.0
4   2    Tom                5.8
5   3  Wendy                3.1
6   3  Wendy                3.2