I have a DataFrame that contains the following columns (along with others). I am trying to figure out how to remove all rows where: For each group number in ID_Dets if there exists more than 1 unique number in ID_Dets_2 then drop all rows.
I have bolded the rows that I would want removed. Thx!
Index | Other Columns | ID_Dets | ID_Dets_2 |
---|---|---|---|
11 | aga | 4 | 5 |
13 | af | 2 | 3 |
14 | da | 3 | 4 |
15 | aga | 3 | 4 |
16 | as | 0 | 0 |
17 | ga | 0 | 0 |
18 | asg | 0 | 1 |
19 | asa | 0 | 1 |
4 | ga | 1 | 2 |
5 | aah | 1 | 2 |
CodePudding user response:
You can count the number of unique values per group and set a threshold to have 1 unique value:
df[df.groupby('ID_Dets')['ID_Dets_2'].transform('nunique').eq(1)]
or:
df.groupby('ID_Dets').filter(lambda g: len(g['ID_Dets_2'].unique())<=1)
output:
Index Other Columns ID_Dets ID_Dets_2
0 11 aga 4 5
1 13 af 2 3
2 14 da 3 4
3 15 aga 3 4
8 4 ga 1 2
9 5 aah 1 2
CodePudding user response:
I think the answer by mozway is way slicker than mine, but I was working on it so might as well post it...
You can generate a list of ids to remove and then filter on that list.
ids_to_remove = []
for v in df['ID_Dets'].unique():
if df[df['ID_Dets'] == v]["ID_Dets_2"].nunique() > 1: ids_to_remove.append(a)
df = df[~df['ID_Dets'].isin(ids_to_remove)]