I would like to remove from each subgroup in a data frame, the rows which satisfy certain conditions. The subgroups are grouped based on the two columns Days
& ID
, here's my data frame:
df = pd.DataFrame({'Days':[5,5,5,5,6,6],
'ID':['A11','A11','A11','A11','B12','B12'],
'Country':['DE','DE','FR','US','US','US'],
'Car1':['BMW','Volvo','Audi','BMW','Mercedes','BMW'],
'Car2':['Volvo','Mercedes','BMW','Volvo','Volvo','Volvo'],
'Car3':['Mercedes',nan,'Volvo',nan,nan,nan]},
)
Days ID Country Car1 Car2 Car3
0 5 A11 DE BMW Volvo Mercedes
1 5 A11 DE Volvo Mercedes nan
2 5 A11 FR Audi BMW Volvo
3 5 A11 US BMW Volvo nan
4 6 B12 US Mercedes Volvo nan
5 6 B12 US BMW Volvo nan
I would like to remove the rows from each group satisfying the following three conditions:
1. Car3=nan
2. Car1=Car2 (from another row within the group)
3. Car2=Car3 (from another row within the group)
The expected data frame I would like to have:
Days ID Country Car1 Car2 Car3
0 5 A11 DE BMW Volvo Mercedes
1 5 A11 FR Audi BMW Volvo
2 6 B12 US Mercedes Volvo nan
3 6 B12 US BMW Volvo nan
CodePudding user response:
You can check membership per groups in lambda function with Series.isin
and filter invert mask by ~
in boolean indexing
:
m = (df.groupby(['Days','ID'], group_keys=False)
.apply(lambda x: x['Car1'].isin(x['Car2']) & x['Car2'].isin(x['Car3'])) &
df['Car3'].isna())
df = df[~m].reset_index(drop=True)
print (df)
Days ID Country Car1 Car2 Car3
0 5 A11 DE BMW Volvo Mercedes
1 5 A11 FR Audi BMW Volvo
2 6 B12 US Mercedes Volvo NaN
3 6 B12 US BMW Volvo NaN