Home > Blockchain >  Data Frame % column by groupping
Data Frame % column by groupping

Time:04-26

I am working on a forecast accuracy report which measure the deviation between actual & pervious projection. The measurement would be = 1- ('Actual' - 'M-1') / 'Actual' . There measure need to be groupped based different gratuity, say 'Product Category' / 'Line' / 'Product'. However, the df.groupby('Product Category').sum() function couldnt support the percentage calculation. Does anyone have idea how it should be fixed? Thanks!

data = {
    "Product Category": ['Drink', 'Drink','Drink','Food','Food','Food'],
    "Line": ['Water', 'Water','Wine','Fruit','Fruit','Fruit'],
    "Product": ['A', 'B', 'C','D','E','F'],
    "Actual": [100,50,40,20,70,50],
    "M-1": [120,40,10,20,80,50],
}
df = pd.DataFrame(data)

df['M1 Gap'] = df['Actual'] - df['M-1']
df['Error_Per'] =  1- df['M1 Gap'] / df['Actual']

Expected output would be enter image description here

CodePudding user response:

You should group BEFORE calculating percentage:

data = {
    "Product Category": ['Drink', 'Drink','Drink','Food','Food','Food'],
    "Line": ['Water', 'Water','Wine','Fruit','Fruit','Fruit'],
    "Product": ['A', 'B', 'C','D','E','F'],
    "Actual": [100,50,40,20,70,50],
    "M-1": [120,40,10,20,80,50],
}
df = pd.DataFrame(data)

df['M1 Gap'] = df['Actual'] - df['M-1']

df_line = df.groupby('Line').sum()
df_line['Error_Per'] =  df_line['M1 Gap'] / df_line['Actual']
print(df_line)

df_prod = df.groupby('Product Category').sum()
df_prod['Error_Per'] =  df_prod['M1 Gap'] / df_prod['Actual']
print(df_prod)

Output:

       Actual  M-1  M1 Gap  Error_Per
Line
Fruit     140  150     -10  -0.071429
Water     150  160     -10  -0.066667
Wine       40   10      30   0.750000

                  Actual  M-1  M1 Gap  Error_Per
Product Category
Drink                190  170      20   0.105263
Food                 140  150     -10  -0.071429

Note: your expected Outcome from the screenshot doesn't match the dictionary of your code (which I used)

CodePudding user response:

You can also create a custom function and apply it on every row of a pandas data frame as follows. Just note that I set the axis argument to 1 so that the custom function is applied on each row or across columns:

import pandas as pd

def func(row):
    row['M1 Gap'] = row['Actual'] - row['M-1']
    row['Error_Per'] = 1 - (row['M1 Gap'] / row['Actual'])
    return row


df.groupby('Product Category').sum().apply(func, axis = 1)


                  Actual    M-1  M1 Gap  Error_Per
Product Category                                  
Drink              190.0  170.0    20.0   0.894737
Food               140.0  150.0   -10.0   1.071429
  • Related