Looping over MAPE function in Pandas throwing error

I have the following dataframe:

df1 = pd.DataFrame({'id' : [1, 2, 1,2], 'plat' : ['and','and','ios','ios'], 'd30_real' : [1.2,1.4,1.5,1.9], 'd3_d30':[1.1,1.5,1.5,1.8], 'd7_d30':[1.4,1.5,1.6,1.9], 'd14_d30':[1.2,1.3,1.5,2.0]})

I want to calculate the MAPE with sklearn's mean_absolute_percentage_error, using the column in lista_target as the real values and the columns in lista_com as the predictions. lista_com has to be A LIST OF LISTS. This cannot be changed.

lista_target = ['d30_real']
lista_com = [['d3_d30','d7_d30','d14_d30']]
mape_hm = pd.DataFrame()

This is the loop to generate the MAPE results:

for i in range(len(lista_target)):
  for e in range(len(lista_com[i])):
    mape_hm[i][e] = df1.groupby(by = ['id','plat']).apply(lambda x: mean_absolute_percentage_error(x[lista_target[i]], x[lista_com[i][e]]))

But it is giving me this error:

KeyError: 0

I understand that this is because it is not finding the position '0' in lista_target. I would like to know what I am doing wrong, as I need to read the string inside the list, not the position.

This would be the output (fake numbers):

result = pd.DataFrame({'id' : [1, 2, 1,2], 'plat' : ['and','and','ios','ios'], 'd30_real' : [1.2,1.4,1.5,1.9], 'MAPE_d3_d30':[0.02,0.03,0.4,0.0], 'MAPE_d7_d30':[0.03,0.04,0.06,0], 'MAPE_d14_d30':[0.0,0.02,0,0.09]})

Thanks!

CodePudding user response:

Since your question groups by the ['id','plat'] columns, this answer does the same: it calls apply on the groupby and builds a dataframe of sklearn.metrics.mean_absolute_percentage_error values for the columns you want.

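import pandas as pd
from sklearn.metrics import mean_absolute_percentage_error

# Any list of lists of prediction columns works here
cols = [['d3_d30'], ['d7_d30', 'd14_d30']]
lst = []  # collects one dict of MAPE values per (id, plat) group

def f_mape(x):
    # x is the sub-frame for one group; compare each prediction column with d30_real
    dct = {}
    for col in cols:
        for c in col:
            dct[f'real_{c}'] = mean_absolute_percentage_error(x['d30_real'], x[c])
    lst.append(dct)

# apply is used only for its side effect of filling lst; its return value is discarded
df1.groupby(['id', 'plat']).apply(f_mape)
print(pd.DataFrame(lst))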

Output (one row per group, in sorted group order: (1, and), (1, ios), (2, and), (2, ios)):

   real_d3_d30  real_d7_d30  real_d14_d30
0     0.083333     0.166667      0.000000
1     0.000000     0.066667      0.000000
2     0.071429     0.071429      0.071429
3     0.052632     0.000000      0.052632
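
A possible variant of the same idea, as a minimal sketch rather than a definitive implementation: instead of appending to a global list, the function can return a pd.Series, so that groupby.apply assembles the result and keeps the id and plat keys. The helper name mape_per_column is hypothetical, and the sketch assumes the single target in lista_target pairs with the first inner list of lista_com.

```python
import pandas as pd
from sklearn.metrics import mean_absolute_percentage_error

df1 = pd.DataFrame({
    'id': [1, 2, 1, 2],
    'plat': ['and', 'and', 'ios', 'ios'],
    'd30_real': [1.2, 1.4, 1.5, 1.9],
    'd3_d30': [1.1, 1.5, 1.5, 1.8],
    'd7_d30': [1.4, 1.5, 1.6, 1.9],
    'd14_d30': [1.2, 1.3, 1.5, 2.0],
})
lista_target = ['d30_real']
lista_com = [['d3_d30', 'd7_d30', 'd14_d30']]

# Assumption: target i is compared against the i-th inner list of lista_com
target, preds = lista_target[0], lista_com[0]

def mape_per_column(g):
    # Hypothetical helper: one MAPE value per prediction column for one group
    return pd.Series(
        {f'MAPE_{c}': mean_absolute_percentage_error(g[target], g[c]) for c in preds}
    )

result = (
    df1.groupby(['id', 'plat'])[[target] + preds]  # pass only the needed columns
    .apply(mape_per_column)                        # one row per (id, plat) group
    .reset_index()                                 # turn the group keys into columns
)
print(result)
```

Selecting the target and prediction columns before apply keeps the function independent of how the grouping columns are handled by the installed pandas version.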

CodePudding user response:

Assuming you want to compute the MAPE per column, per group:

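from sklearn.metrics import mean_absolute_percentage_error as mape

# g is one prediction column within one (id, plat) group; transform broadcasts
# the scalar MAPE back to every row of that group, keeping df1's row order
(df1
 .groupby(['id','plat'])[lista_com[0]]
 .transform(lambda g: mape(df1.loc[g.index, lista_target[0]], g))
 .add_prefix('MAPE_')
)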

output:

   MAPE_d3_d30  MAPE_d7_d30  MAPE_d14_d30
0     0.083333     0.166667      0.000000
1     0.071429     0.071429      0.071429
2     0.000000     0.066667      0.000000
3     0.052632     0.000000      0.052632

To get the full output, drop the original prediction columns and join the MAPE columns:

out = df1.drop(columns=lista_com[0]).join(df1
 .groupby(['id','plat'])[lista_com[0]]
 .transform(lambda g: mape(df1.loc[g.index, lista_target[0]], g))
 .add_prefix('MAPE_')
)

output:

   id plat  d30_real  MAPE_d3_d30  MAPE_d7_d30  MAPE_d14_d30
0   1  and       1.2     0.083333     0.166667      0.000000
1   2  and       1.4     0.071429     0.071429      0.071429
2   1  ios       1.5     0.000000     0.066667      0.000000
3   2  ios       1.9     0.052632     0.000000      0.052632
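
On the original error: the KeyError: 0 appears to come from the left-hand side of the assignment, not from lista_target. mape_hm[i] with i = 0 looks up a column labelled 0 in the empty mape_hm frame, and no such column exists. Below is a minimal sketch of the question's loop rewritten to iterate over names and assign to string column labels. As a stated assumption, it swaps sklearn's function for plain pandas arithmetic (row-wise absolute percentage error, then a group mean), which should agree with it as long as the target column contains no zeros.

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': [1, 2, 1, 2],
    'plat': ['and', 'and', 'ios', 'ios'],
    'd30_real': [1.2, 1.4, 1.5, 1.9],
    'd3_d30': [1.1, 1.5, 1.5, 1.8],
    'd7_d30': [1.4, 1.5, 1.6, 1.9],
    'd14_d30': [1.2, 1.3, 1.5, 2.0],
})
lista_target = ['d30_real']
lista_com = [['d3_d30', 'd7_d30', 'd14_d30']]

# Indexing an empty DataFrame with an integer looks up a column *label* 0,
# which does not exist, so this raises KeyError
mape_hm = pd.DataFrame()
try:
    mape_hm[0]
except KeyError as err:
    print('KeyError:', err)

# Loop over names instead of positions and write to string column labels
keys = [df1['id'], df1['plat']]
mape_hm = df1[['id', 'plat']].copy()
for target, preds in zip(lista_target, lista_com):
    for pred in preds:
        # Row-wise absolute percentage error (assumes no zeros in the target)
        ape = (df1[target] - df1[pred]).abs() / df1[target].abs()
        # Mean within each (id, plat) group, broadcast back to the original rows
        mape_hm[f'MAPE_{pred}'] = ape.groupby(keys).transform('mean')

print(mape_hm)
```

Because lista_com stays a list of lists, zip pairs each target with its own inner list of prediction columns.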