Computing averages of groups of rows-CodePudding

If the dataframe looks like this:

Class1     Class2    Exam     Score1     Score2
a1         a2        a        0.90       0.70
a1         a2        b        0.80       0.70  
a1         a2        c        0.10       0.70
a1         a3        c        0.10       0.20
a2         a3        c        0.70       0.20
a2         a3        d        0.30       0.10
a3         a4        e        1.00       0.80
a3         a4        f        0.00       0.70
a3         a5        e        1.00       0.90
a3         a5        f        0.00       0.90
a4         a5        e        0.80       0.90
a4         a5        f        0.70       0.90

I want to compute the mean for each class, but per pair (a1-a2, a1-a3 etc., just like in the dataframe above). I did the most naive thing and split the dataframe into multiple ones and then computed the mean over the whole Scores columns. But this is impossible when the dataframe has hundreds of pairs. How could I group by pair?

The desired output:

Class1    Class2    Average1    Average2
a1        a2        0.60        0.70
a1        a3        0.10        0.20
a2        a3        0.50        0.15
a3        a4        0.50        0.75
a3        a5        0.50        0.90
a4        a5        0.75        0.90

CodePudding user response：

Simply groupby both columns. Named Aggregations here can help to both aggregate and rename in the same step:

new_df = df.groupby(
    ['Class1', 'Class2'], as_index=False
).agg(Average1=('Score1', 'mean'), Average2=('Score2', 'mean'))

new_df:

  Class1 Class2  Average1  Average2
0     a1     a2      0.60      0.70
1     a1     a3      0.10      0.20
2     a2     a3      0.50      0.15
3     a3     a4      0.50      0.75
4     a3     a5      0.50      0.90
5     a4     a5      0.75      0.90

Setup and imports:

import pandas as pd

df = pd.DataFrame({
    'Class1': ['a1', 'a1', 'a1', 'a1', 'a2', 'a2', 'a3', 'a3', 'a3', 'a3', 'a4',
               'a4'],
    'Class2': ['a2', 'a2', 'a2', 'a3', 'a3', 'a3', 'a4', 'a4', 'a5', 'a5', 'a5',
               'a5'],
    'Exam': ['a', 'b', 'c', 'c', 'c', 'd', 'e', 'f', 'e', 'f', 'e', 'f'],
    'Score1': [0.9, 0.8, 0.1, 0.1, 0.7, 0.3, 1.0, 0.0, 1.0, 0.0, 0.8, 0.7],
    'Score2': [0.7, 0.7, 0.7, 0.2, 0.2, 0.1, 0.8, 0.7, 0.9, 0.9, 0.9, 0.9]
})