I have a dataframe:
ID EM_results pa_id_1850
0 0 1
1 1 1
2 0 1
3 0 2
4 0 2
5 1 3
6 1 3
7 0 3
I want to remove every row belonging to a pa_id_1850 group in which EM_results equals 1 more than once. In this case, the expected result is:
ID EM_results pa_id_1850
0 0 1
1 1 1
2 0 1
3 0 2
4 0 2
I have tried something like:
grouped = df.groupby('pa_id_1850')
grouped.filter(lambda x: (x['EM_results'] == 1) <= 1)
But I can't quite get it to work.
CodePudding user response:
Your attempt is close: the function passed to filter needs to return a single boolean per group, not a Series. One way to achieve this is to filter on the sum of the EM_results values in each group, assuming these values are only ever 0 or 1:
grouped = df.groupby('pa_id_1850')
grouped.filter(lambda x: (x['EM_results'].sum()) <= 1)
# Output:
ID EM_results pa_id_1850
0 0 0 1
1 1 1 1
2 2 0 1
3 3 0 2
4 4 0 2
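If EM_results might ever hold values other than 0 and 1, a variant that counts the 1s explicitly may be safer. The following is a minimal, self-contained sketch that rebuilds the dataframe from the question; the names g and result are only illustrative:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [0, 1, 2, 3, 4, 5, 6, 7],
    "EM_results": [0, 1, 0, 0, 0, 1, 1, 0],
    "pa_id_1850": [1, 1, 1, 2, 2, 3, 3, 3],
})

# Keep a group only if it contains at most one row where EM_results == 1.
# Comparing first and summing the booleans counts the 1s directly, so the
# check does not rely on the column holding only 0 and 1.
result = df.groupby("pa_id_1850").filter(
    lambda g: (g["EM_results"] == 1).sum() <= 1
)
print(result)
```

This is essentially the lambda from the question with .sum() added, so that each group yields one boolean instead of a Series.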
CodePudding user response:
Compute a boolean mask with the help of transform, then apply it to the dataframe.
>>> mask = df.groupby('pa_id_1850')['EM_results'].transform('sum').le(1)
>>> df = df[mask]
>>> df
ID EM_results pa_id_1850
0 0 0 1
1 1 1 1
2 2 0 1
3 3 0 2
4 4 0 2
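The same idea can be written so that it does not depend on EM_results being strictly 0 or 1. This is a rough sketch under that assumption, rebuilding the sample data from the question; ones, mask and filtered are illustrative names:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [0, 1, 2, 3, 4, 5, 6, 7],
    "EM_results": [0, 1, 0, 0, 0, 1, 1, 0],
    "pa_id_1850": [1, 1, 1, 2, 2, 3, 3, 3],
})

# Flag rows where EM_results is exactly 1 (as integers, so they can be summed).
ones = df["EM_results"].eq(1).astype(int)

# Broadcast each group's count of 1s back onto its rows, then keep the rows
# whose group has at most one such value.
mask = ones.groupby(df["pa_id_1850"]).transform("sum").le(1)
filtered = df[mask]
print(filtered)
```

Because transform returns a Series aligned with the original index, the mask can be reused or combined with other conditions before indexing.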