Home > Back-end >  How to apply pandas groupby to a dataframe to use both rows and columns when calculating a mean
How to apply pandas groupby to a dataframe to use both rows and columns when calculating a mean

Time:07-17

I have a dataframe df in the format:

     Grade   Height    Speed   Value
 0     A       13      0.1     500
 1     B       25      0.3     100
 2     C       54      0.6     200

And I am looking to group it such that I intersect the Rating as the index, the Height (split into buckets) as the columns, and within the individual cells have the average value for the combination of Grade and Height.

So, the output dataframe would look something like this:

                           Height  
Grade  0-10          10-25         25-50      50-100
 A   avg(speed*value)   x            x          x
 B      x               x            x          x
 C      x               x            x          x

where the x's are the calculated mean speed*value.

I have attempted unsuccessfully with something like:

output = pd.DataFrame(data=df, index = df[df['Grade']], columns = df[df['Height']].groupby(pd.qcut(df['Height'], 3, duplicates='drop'))).groupby(df['Value')).mean()

but I can't quite figure out a method that might work not throwing errors or an empty df.

Would you have any ideas I can try out?

CodePudding user response:

  • Use pd.cut to break your Height Column into bins.
  • Create a new column of Speed * Value
  • Pivot your table, mean is the default pivot function.
    • dropna=False is used so that even null bins are shown.
df.Height = pd.cut(df.Height, bins=[0, 10, 25, 50, 100])
df['speed_value'] = df.Speed.mul(df.Value)
out = df.pivot_table(index='Grade', columns='Height', values='speed_value', dropna=False)
print(out)

Output:

Height  (0, 10]  (10, 25]  (25, 50]  (50, 100]
Grade
A           NaN      50.0       NaN        NaN
B           NaN      30.0       NaN        NaN
C           NaN       NaN       NaN      120.0

CodePudding user response:

Use:

df['agg'] = pd.cut(df['Height'].astype(int), [0,10,25,50,100])
s = df.pivot_table(index='Grade', columns='agg', values=['Speed', 'Value'], dropna=False)
s.apply(lambda x: [x[i]*x[i 4] for i in range(4)], axis = 1).apply(pd.Series).rename(columns = {i: s.columns.get_level_values(1).categories[i] for i in range(4)})

Output:

    (0, 10] (10, 25]    (25, 50]    (50, 100]
Grade               
A   NaN 50.0    NaN NaN
B   NaN 30.0    NaN NaN
C   NaN NaN NaN 120.0
  • Related