I am working on a forecast accuracy report which measure the deviation between actual & pervious projection. The measurement would be = 1- ('Actual' - 'M-1') / 'Actual' .
There measure need to be groupped based different gratuity, say 'Product Category' / 'Line' / 'Product'. However, the df.groupby('Product Category').sum()
function couldnt support the percentage calculation. Does anyone have idea how it should be fixed? Thanks!
data = {
"Product Category": ['Drink', 'Drink','Drink','Food','Food','Food'],
"Line": ['Water', 'Water','Wine','Fruit','Fruit','Fruit'],
"Product": ['A', 'B', 'C','D','E','F'],
"Actual": [100,50,40,20,70,50],
"M-1": [120,40,10,20,80,50],
}
df = pd.DataFrame(data)
df['M1 Gap'] = df['Actual'] - df['M-1']
df['Error_Per'] = 1- df['M1 Gap'] / df['Actual']
Expected output would be enter image description here
CodePudding user response:
You should group BEFORE calculating percentage:
data = {
"Product Category": ['Drink', 'Drink','Drink','Food','Food','Food'],
"Line": ['Water', 'Water','Wine','Fruit','Fruit','Fruit'],
"Product": ['A', 'B', 'C','D','E','F'],
"Actual": [100,50,40,20,70,50],
"M-1": [120,40,10,20,80,50],
}
df = pd.DataFrame(data)
df['M1 Gap'] = df['Actual'] - df['M-1']
df_line = df.groupby('Line').sum()
df_line['Error_Per'] = df_line['M1 Gap'] / df_line['Actual']
print(df_line)
df_prod = df.groupby('Product Category').sum()
df_prod['Error_Per'] = df_prod['M1 Gap'] / df_prod['Actual']
print(df_prod)
Output:
Actual M-1 M1 Gap Error_Per
Line
Fruit 140 150 -10 -0.071429
Water 150 160 -10 -0.066667
Wine 40 10 30 0.750000
Actual M-1 M1 Gap Error_Per
Product Category
Drink 190 170 20 0.105263
Food 140 150 -10 -0.071429
Note: your expected Outcome from the screenshot doesn't match the dictionary of your code (which I used)
CodePudding user response:
You can also create a custom function and apply
it on every row of a pandas data frame as follows. Just note that I set the axis
argument to 1
so that the custom function is applied on each row or across columns:
import pandas as pd
def func(row):
row['M1 Gap'] = row['Actual'] - row['M-1']
row['Error_Per'] = 1 - (row['M1 Gap'] / row['Actual'])
return row
df.groupby('Product Category').sum().apply(func, axis = 1)
Actual M-1 M1 Gap Error_Per
Product Category
Drink 190.0 170.0 20.0 0.894737
Food 140.0 150.0 -10.0 1.071429