I have a pandas dataframe that looks like this:
import pandas as pd
pd.DataFrame({'id': [1,1,1,2,2,3,3,3],
'col': ['a','a','a','a','b','c','b','a']})
id col
0 1 a
1 1 a
2 1 a
3 2 a
4 2 b
5 3 c
6 3 b
7 3 a
I would like to create an indicator column which will tell me, if an id
has both "a" and "b" in the col
The output should look like this:
pd.DataFrame({'id': [1,1,1,2,2,3,3,3],
'col': ['a','a','a','a','b','c','b','a'],
'indicator': [0,0,0,1,1,1,1,1]})
id col indicator
0 1 a 0
1 1 a 0
2 1 a 0
3 2 a 1
4 2 b 1
5 3 c 1
6 3 b 1
7 3 a 1
How can I do that in pandas ?
CodePudding user response:
You can compare sets per groups in lambda function in GroupBy.transform
:
df['indicator'] = (df.groupby('id')['col']
.transform(lambda x: set(x) >= {'a','b'})
.astype(int))
print (df)
id col indicator
0 1 a 0
1 1 a 0
2 1 a 0
3 2 a 1
4 2 b 1
5 3 c 1
6 3 b 1
7 3 a 1
Or get same values in both filtered id
and mapping:
both = np.intersect1d(df.loc[df['col'].eq('a'), 'id'],
df.loc[df['col'].eq('b'), 'id'])
df['indicator'] = df['id'].isin(both).astype(int)
Or mapping compared sets:
df['indicator'] = (df['id'].map(df.groupby('id')['col']
.agg(set).ge({'a','b'}).astype(int)))
CodePudding user response:
Use set operations: groupby.transform
with set.issubset
df['indicator'] = (df.groupby("id")['col']
.transform({'a', 'b'}.issubset)
.astype(int)
)
Or:
df['indicator'] = (df.groupby("id")['col']
.transform(lambda g: int({'a', 'b'}.issubset(g)))
)
Alternative approach, use a crosstab
with reindex
and aggregation with all
:
s = pd.crosstab(df['id'], df['col']).reindex(columns=['a', 'b']).all(axis=1)
df['indicator'] = df['id'].isin(s.index[s]).astype(int)
Output:
id col indicator
0 1 a 0
1 1 a 0
2 1 a 0
3 2 a 1
4 2 b 1
5 3 c 1
6 3 b 1
7 3 a 1