I have a dataframe df in the format:
Grade Height Speed Value
0 A 13 0.1 500
1 B 25 0.3 100
2 C 54 0.6 200
And I am looking to group it such that I intersect the Rating as the index, the Height (split into buckets) as the columns, and within the individual cells have the average value for the combination of Grade and Height.
So, the output dataframe would look something like this:
Height
Grade 0-10 10-25 25-50 50-100
A avg(speed*value) x x x
B x x x x
C x x x x
where the x's are the calculated mean speed*value.
I have attempted unsuccessfully with something like:
output = pd.DataFrame(data=df, index = df[df['Grade']], columns = df[df['Height']].groupby(pd.qcut(df['Height'], 3, duplicates='drop'))).groupby(df['Value')).mean()
but I can't quite figure out a method that might work not throwing errors or an empty df.
Would you have any ideas I can try out?
CodePudding user response:
- Use
pd.cut
to break yourHeight
Column into bins. - Create a new column of
Speed * Value
- Pivot your table,
mean
is the default pivot function.dropna=False
is used so that even null bins are shown.
df.Height = pd.cut(df.Height, bins=[0, 10, 25, 50, 100])
df['speed_value'] = df.Speed.mul(df.Value)
out = df.pivot_table(index='Grade', columns='Height', values='speed_value', dropna=False)
print(out)
Output:
Height (0, 10] (10, 25] (25, 50] (50, 100]
Grade
A NaN 50.0 NaN NaN
B NaN 30.0 NaN NaN
C NaN NaN NaN 120.0
CodePudding user response:
Use:
df['agg'] = pd.cut(df['Height'].astype(int), [0,10,25,50,100])
s = df.pivot_table(index='Grade', columns='agg', values=['Speed', 'Value'], dropna=False)
s.apply(lambda x: [x[i]*x[i 4] for i in range(4)], axis = 1).apply(pd.Series).rename(columns = {i: s.columns.get_level_values(1).categories[i] for i in range(4)})
Output:
(0, 10] (10, 25] (25, 50] (50, 100]
Grade
A NaN 50.0 NaN NaN
B NaN 30.0 NaN NaN
C NaN NaN NaN 120.0