In my dataframe below I wish to remove rows based on two conditions:
- has an id that appears two times
- belong to group B
In the dataframe (data) below it would be row with index 1 and 6 that should be removed. How could this be solved in an elegant fashion?
data=pd.DataFrame({'id':[1,1,2,3,4,5,5],
'group':['A', 'B', 'A', 'B', 'A', 'A', 'B']})
# Thought about something like this, but that dosent work.
mask1 = [data.groupby(['id'])['group'].transform(len) > 1]
mask2 = [data.group=='B']
data.isin([mask1 & mask2])
CodePudding user response:
Use boolean indexing:
# does the id appear more than once?
m1 = data['id'].duplicated(keep=False)
# is the row a group B?
m2 = data['group'].eq('B')
# keep the row if not both conditions are met
out = data[~(m1&m2)]
De Morgan's equivalent:
m1 = ~data['id'].duplicated(keep=False)
m2 = data['group'].ne('B')
out = data[m1|m2]
output:
id group
0 1 A
2 2 A
3 3 B
4 4 A
5 5 A