I'm working on the following data:
df = pd.DataFrame({'student': 'AA AA AA BB BB BB CC CC CC CC'.split(),
'grade': ['B', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'B', 'B']})
print(df)
student grade
0 AA B
1 AA B
2 AA B
3 BB A
4 BB A
5 BB B
6 CC B
7 CC B
8 CC B
9 CC B
I want to groupby()
with student
and combine. If the student has both the grades A
and B
, we'd mark as both
, else we'd keep as the way the grade
is.
I am trying the following code, but I'm not sure if it's optimal.
def get_both(data):
Ptype = pd.crosstab(data['student'], data['grade'])
data['grade'] = data['student'].map(Ptype.ne(0).dot(Ptype.columns))
data.drop_duplicates(inplace=True)
return data
get_both(df)
student grade
0 AA B
3 BB AB
6 CC B
Desired Output:
student grade
0 AA B
1 BB both
2 CC B
Is there any better way of getting the desired output? Any suggestions would be appreciated. Thanks!
CodePudding user response:
You can try:
agg = df.groupby('student')['grade'].agg(['nunique', 'first'])
out = agg['first'].where(agg['nunique'].eq(1), 'both').reset_index(name='grade')
Output:
student grade
0 AA B
1 BB both
2 CC B
CodePudding user response:
Here is slight different approach:
groupby
column student
then for grade
column apply a lambda function to return both
or the first grade depending on the condition
(df
.groupby(['student'], as_index=False)['grade']
.apply(lambda x: 'both' if x.eq('A').any() and x.eq('B').any() else x.iloc[0])
)
OUTPUT
student grade
0 AA B
1 BB both
2 CC B