I have a pandas df that looks like this:
import pandas as pd
d = {'value1': [1, 1, 1, 2, 3, 3, 4, 4, 4, 4], 'value2': ['A', 'B', 'C', 'C', 'A', 'B', 'B', 'A', 'A', 'B']}
df = pd.DataFrame(data=d)
df
Per group in column value1
I would like to check if that group contains at least one value 'C' in column value2
. If a group doesn't have a 'C' value, I would like to exclude that group
value1 value2
1 A
1 B
1 C
2 C
3 A
3 B
4 B
4 A
4 A
4 B
The resulting df should look like this:
value1 value2
1 A
1 B
1 C
2 C
What's the best way to achieve this?
CodePudding user response:
use groupby filter
df.groupby('value1').filter(lambda x: x['value2'].eq('C').sum() > 0)
CodePudding user response:
Here is another solution:
- First establish a list containing the
value1
values for whichvalue2
is equal toC
:mylist = df[df.value2=='C']['value1'].unique()
- Then filter the dataframe
df
, keeping only rows for whichvalue1
is in this list:df[df.value1.isin(mylist)]
.
Or as a one-liner:
df[df.value1.isin(df[df.value2=='C']['value1'].unique())]