Home > Software design >  Groupby To Keep Combined Rows As 'Both'
Groupby To Keep Combined Rows As 'Both'

Time:09-09

I'm working on the following data:

df = pd.DataFrame({'student': 'AA AA AA BB BB BB CC CC CC CC'.split(),
                   'grade': ['B', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'B', 'B']})

print(df)

  student   grade
0      AA     B
1      AA     B
2      AA     B
3      BB     A
4      BB     A
5      BB     B
6      CC     B
7      CC     B
8      CC     B
9      CC     B

I want to groupby() with student and combine. If the student has both the grades A and B, we'd mark as both, else we'd keep as the way the grade is.

I am trying the following code, but I'm not sure if it's optimal.

def get_both(data):
    Ptype = pd.crosstab(data['student'], data['grade'])
    data['grade'] = data['student'].map(Ptype.ne(0).dot(Ptype.columns))
    data.drop_duplicates(inplace=True)
    return data

get_both(df)

    student grade
0   AA       B
3   BB       AB
6   CC       B

Desired Output:

  student grade
0      AA     B
1      BB     both
2      CC     B

Is there any better way of getting the desired output? Any suggestions would be appreciated. Thanks!

CodePudding user response:

You can try:

agg = df.groupby('student')['grade'].agg(['nunique', 'first'])
out = agg['first'].where(agg['nunique'].eq(1), 'both').reset_index(name='grade')

Output:

  student grade
0      AA     B
1      BB  both
2      CC     B

CodePudding user response:

Here is slight different approach:

groupby column student then for grade column apply a lambda function to return both or the first grade depending on the condition

(df
 .groupby(['student'], as_index=False)['grade']
 .apply(lambda x: 'both' if x.eq('A').any() and x.eq('B').any() else x.iloc[0])
 )

OUTPUT

  student grade
0      AA     B
1      BB  both
2      CC     B
  • Related