Home > Software design >  Remove rows based on two groupby conditions
Remove rows based on two groupby conditions

Time:11-10

In my dataframe below I wish to remove rows based on two conditions:

  • has an id that appears two times
  • belong to group B

In the dataframe (data) below it would be row with index 1 and 6 that should be removed. How could this be solved in an elegant fashion?

data=pd.DataFrame({'id':[1,1,2,3,4,5,5],
             'group':['A', 'B', 'A', 'B', 'A', 'A', 'B']})

# Thought about something like this, but that dosent work.
mask1 = [data.groupby(['id'])['group'].transform(len) > 1]
mask2 = [data.group=='B']

data.isin([mask1 & mask2])

CodePudding user response:

Use boolean indexing:

# does the id appear more than once?
m1 = data['id'].duplicated(keep=False)
# is the row a group B?
m2 = data['group'].eq('B')

# keep the row if not both conditions are met
out = data[~(m1&m2)]

De Morgan's equivalent:

m1 = ~data['id'].duplicated(keep=False)
m2 = data['group'].ne('B')
out = data[m1|m2]

output:

   id group
0   1     A
2   2     A
3   3     B
4   4     A
5   5     A
  • Related