Let's say we have a dataframe, df, with two columns as given below. Variable A has three levels: 1, 2, and 3. Variable B has three levels: YES, NO, and OTHER. We want to derive another dataframe, df2, with a variable C that takes the value 1 if there is at least one YES within a given level of A, and 0 otherwise.
df
A B
1 YES
1 YES
1 OTHER
1 NO
1 YES
1 NO
2 YES
2 YES
2 YES
2 NO
2 YES
3 OTHER
3 NO
3 NO
3 NO
df2
A C
1 1
2 1
3 0
CodePudding user response:
Use groupby:
>>> df['B'].eq('YES').groupby(df['A']).any().astype(int).reset_index(name='C')
A C
0 1 1
1 2 1
2 3 0
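To make the one-liner above easier to follow, here is a minimal, self-contained sketch that rebuilds the example frame from the question and splits the chain into steps. The intermediate names is_yes and has_yes are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({
    "A": [1] * 6 + [2] * 5 + [3] * 4,
    "B": ["YES", "YES", "OTHER", "NO", "YES", "NO",
          "YES", "YES", "YES", "NO", "YES",
          "OTHER", "NO", "NO", "NO"],
})

# Step 1: boolean Series, True where B is exactly 'YES'
is_yes = df["B"].eq("YES")

# Step 2: group the booleans by A; any() is True if a group has at least one True
has_yes = is_yes.groupby(df["A"]).any()

# Step 3: cast True/False to 1/0 and turn the A index back into a column
df2 = has_yes.astype(int).reset_index(name="C")
print(df2)
```

Grouping the boolean Series by df['A'] directly avoids creating a temporary column on df.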
CodePudding user response:
One option is to convert column B to numbers using a defaultdict, then group by A and take the max:
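from collections import defaultdict
# Missing keys default to int() == 0, so only 'YES' maps to 1
d = defaultdict(int)
d['YES'] = 1
# The per-group max of the 0/1 column is 1 if any row in the group is 'YES'
df.assign(B = df.B.map(d)).groupby('A', as_index = False).agg(C=('B', 'max'))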
A C
0 1 1
1 2 1
2 3 0
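As a side note on why a defaultdict is used here rather than a plain dict: Series.map turns values that are missing from a plain dict into NaN, whereas a dict subclass that defines __missing__ (such as defaultdict) supplies its default instead. The sketch below illustrates the difference on the question's data; the names plain, mapped, and df2_alt are only illustrative.

```python
from collections import defaultdict

import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({
    "A": [1] * 6 + [2] * 5 + [3] * 4,
    "B": ["YES", "YES", "OTHER", "NO", "YES", "NO",
          "YES", "YES", "YES", "NO", "YES",
          "OTHER", "NO", "NO", "NO"],
})

# Plain dict: 'NO' and 'OTHER' are not keys, so they become NaN
plain = df["B"].map({"YES": 1})

# defaultdict(int): unknown keys fall back to int() == 0
d = defaultdict(int)
d["YES"] = 1
mapped = df["B"].map(d)

# The max of the 0/1 flags per group is 1 if the group contains any 'YES'
df2_alt = (
    df.assign(B=mapped)
      .groupby("A", as_index=False)
      .agg(C=("B", "max"))
)
print(df2_alt)
```

With the plain dict, the per-group max would be NaN for a group with no YES at all, because max skips NaN values and finds nothing left, so the result would need an extra fillna(0) step.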