I have a dataframe which includes multiple items and each item has been purchased multiple times with different prices and in different quantities within a year. I need to find the weighted average of each item and map each item with its weighted average price into a new data frame.
Year Article Price Weight
18 2013 Cheese 18.0 26.0
19 2013 Apple 5.0 7.0
20 2013 Bun 2.0 2.0
21 2013 Coatl 4.0 5.0
22 2013 Cheese 20.0 21.0
23 2013 Peach 12.0 8.0
24 2013 Apple 4.6 3.0
import pandas as pd
df = newonly2013
def weightedAverage(df,Weight,Price):
return sum(df['Weight']*df['Price'])/sum(df['Weight'])
wa=weightedAverage(df,'Weight','Price')
ar = df.groupby('Article', as_index=False).apply(weightedAverage)
print(ar)
This is what i tried and failed, would appreciate it greatly if someone could help me out here. Thanks!
CodePudding user response:
With your custom function, only pass df
as parameter:
def weightedAverage(df):
return sum(df['Weight']*df['Price'])/sum(df['Weight'])
ar = df.groupby('Article', as_index=False).apply(weightedAverage)
Or, if you need to be able to pass the columns names:
def weightedAverage(df, Weight, Price):
return sum(df[Weight]*df[Price])/sum(df['Weight'])
ar = (df.groupby('Article', as_index=False)
.apply(weightedAverage, Weight='Weight', Price='Price')
)
Alternatively, use numpy.average
and its weights
parameter:
import numpy as np
out = (df.groupby('Article')
.apply(lambda d: np.average(d['Price'], weights=d['Weight']))
)
output:
Article
Apple 4.880000
Bun 2.000000
Cheese 18.893617
Coatl 4.000000
Peach 12.000000
dtype: float64
As DataFrame:
out = (df.groupby('Article')
.apply(lambda d: np.average(d['Price'], weights=d['Weight']))
.rename('average').reset_index()
)
Output:
Article average
0 Apple 4.880000
1 Bun 2.000000
2 Cheese 18.893617
3 Coatl 4.000000
4 Peach 12.000000
CodePudding user response:
You can use the following function to calculate a weighted average in Pandas: def w_avg (df, values, weights): d = df [values] w = df [weights] return (d * w).sum() / w.sum() The following examples show how to use this syntax in practice. Example 1: Weighted Average in Pandas