Let's say I have the following pandas dataset:
Column 1 | Column 2 |
---|---|
dog | RE |
dog | RE FX |
cat | RE BA |
mouse | AQ |
mouse | RE FX |
salmon | AQ |
Essentially what I would like to do is group the values in Column 1 and then either keep or delete them based on the values in Column 2. So for example, I want to delete all values in a group in Column 1 if ANY of the corresponding rows in Column 2 are "RE" or "RE BA". Based on the dataset above, the output would be the following:
Column 1 | Column 2 |
---|---|
mouse | AQ |
mouse | RE FX |
salmon | AQ |
I am struggling with this because although I understand how to drop rows based on whether they contain a specific value, I don't understand how to drop entire groups based on whether ANY of the rows in that group contain a specific value. Any help would be greatly appreciated!
CodePudding user response:
You can try groupby
then filter
out = df.groupby("Column 1").filter(lambda df: ~df['Column 2'].isin(["RE", "RE BA"]).any())
print(out)
Column 1 Column 2
3 mouse AQ
4 mouse RE FX
5 salmon AQ
CodePudding user response:
bools = df.groupby("Column 1").apply(lambda x: ~x["Column 2"].isin(["RE", "RE BA"]).any())
df[bools.loc[df["Column 1"]].reset_index(drop=True)]
# Column 1 Column 2
# 3 mouse AQ
# 4 mouse RE FX
# 5 salmon AQ