I have the following dataframe:
df1 = pd.DataFrame({'id' : [1, 2, 1,2], 'plat' : ['and','and','ios','ios'], 'd30_real' : [1.2,1.4,1.5,1.9], 'd3_d30':[1.1,1.5,1.5,1.8], 'd7_d30':[1.4,1.5,1.6,1.9], 'd14_d30':[1.2,1.3,1.5,2.0]})
I want to calculate the MAPE with the sklearn function, comparing the column in lista_target (the real values) against the columns in lista_com (the predictions). lista_com has to be a list of lists; this cannot be changed.
lista_target = ['d30_real']
lista_com = [['d3_d30','d7_d30','d14_d30']]
mape_hm = pd.DataFrame()
This is the loop to generate the MAPE results:
for i in range(len(lista_target)):
    for e in range(len(lista_com[i])):
        mape_hm[i][e] = df1.groupby(by = ['id','plat']).apply(lambda x: mean_absolute_percentage_error(x[lista_target[i]], x[lista_com[i][e]]))
But it is giving me this error:
KeyError: 0
I understand that this is because it is not finding position 0 in lista_target. I would like to know what I am doing wrong, as I need to read the string inside the list, not the position.
This would be the output (fake numbers):
result = pd.DataFrame({'id' : [1, 2, 1,2], 'plat' : ['and','and','ios','ios'], 'd30_real' : [1.2,1.4,1.5,1.9], 'MAPE_d3_d30':[0.02,0.03,0.4,0.0], 'MAPE_d7_d30':[0.03,0.04,0.06,0], 'MAPE_d14_d30':[0.0,0.02,0,0.09]})
Thanks!
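A minimal sketch of the loop itself, assuming the goal is the MAPE_ columns from the expected output above. The KeyError: 0 most likely comes from the assignment target rather than from lista_target: mape_hm is an empty DataFrame, so mape_hm[i] with i = 0 looks up a column labelled 0, which does not exist. The sketch keeps lista_com as a list of lists, iterates over the column names with zip, and merges each per-group score back onto the rows. The names result, per_group, target and pred are illustrative, not from the question; the groupby/apply call is the question's own, apart from selecting only the two columns it needs.

```python
import pandas as pd
from sklearn.metrics import mean_absolute_percentage_error

df1 = pd.DataFrame({'id': [1, 2, 1, 2], 'plat': ['and', 'and', 'ios', 'ios'],
                    'd30_real': [1.2, 1.4, 1.5, 1.9], 'd3_d30': [1.1, 1.5, 1.5, 1.8],
                    'd7_d30': [1.4, 1.5, 1.6, 1.9], 'd14_d30': [1.2, 1.3, 1.5, 2.0]})
lista_target = ['d30_real']
lista_com = [['d3_d30', 'd7_d30', 'd14_d30']]

# Start from the key and target columns instead of an empty DataFrame
result = df1[['id', 'plat'] + lista_target].copy()

# zip pairs each target name with its list of prediction names,
# so the loop works with column names rather than positions
for target, preds in zip(lista_target, lista_com):
    for pred in preds:
        per_group = (
            df1.groupby(['id', 'plat'])[[target, pred]]
            .apply(lambda x: mean_absolute_percentage_error(x[target], x[pred]))
            .rename(f'MAPE_{pred}')   # name the resulting Series
            .reset_index()            # id, plat become ordinary columns
        )
        # A left merge attaches the per-group score to every row of that group
        # and keeps the row order of result
        result = result.merge(per_group, on=['id', 'plat'], how='left')

print(result)
```

Each new column is named after the prediction column it scores, so adding more targets to lista_target (with a matching inner list in lista_com) only adds more MAPE_ columns.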
Answer:
I see in your question that you group on the ['id','plat'] columns, so I wrote the answer with groupby as well: apply a function to each group, compute sklearn.metrics.mean_absolute_percentage_error for the columns you want, and collect the results in a DataFrame.
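import pandas as pd
from sklearn.metrics import mean_absolute_percentage_error

# Prediction columns (a list of lists), each compared against 'd30_real'
cols = [['d3_d30'], ['d7_d30', 'd14_d30']]
lst = []

def f_mape(x):
    # x is the sub-DataFrame for one ('id', 'plat') group
    dct = {}
    for col in cols:
        for c in col:
            dct[f'real_{c}'] = mean_absolute_percentage_error(x['d30_real'], x[c])
    lst.append(dct)  # one dict per group, in groupby's sorted group order

df1.groupby(['id', 'plat']).apply(f_mape)
print(pd.DataFrame(lst))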
Output:
real_d3_d30 real_d7_d30 real_d14_d30
0 0.083333 0.166667 0.000000
1 0.000000 0.066667 0.000000
2 0.071429 0.071429 0.071429
3 0.052632 0.000000 0.052632
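The answer above collects rows in a list defined outside f_mape, so the id/plat keys are lost from the result. A hedged variant, assuming it is acceptable for f_mape to return a pd.Series per group, lets groupby.apply assemble the DataFrame itself and keeps (id, plat) as the index. The names needed and res are illustrative.

```python
import pandas as pd
from sklearn.metrics import mean_absolute_percentage_error

df1 = pd.DataFrame({'id': [1, 2, 1, 2], 'plat': ['and', 'and', 'ios', 'ios'],
                    'd30_real': [1.2, 1.4, 1.5, 1.9], 'd3_d30': [1.1, 1.5, 1.5, 1.8],
                    'd7_d30': [1.4, 1.5, 1.6, 1.9], 'd14_d30': [1.2, 1.3, 1.5, 2.0]})
cols = [['d3_d30'], ['d7_d30', 'd14_d30']]

def f_mape(x):
    # Build one row of scores for this group and return it,
    # instead of appending to a list defined outside the function
    return pd.Series({
        f'real_{c}': mean_absolute_percentage_error(x['d30_real'], x[c])
        for col in cols
        for c in col
    })

# Select only the columns f_mape needs; the result is indexed by (id, plat)
needed = ['d30_real'] + [c for col in cols for c in col]
res = df1.groupby(['id', 'plat'])[needed].apply(f_mape)
print(res)
```

Because every group returns a Series with the same labels, pandas stacks them into a DataFrame with one row per group; res.reset_index() turns id and plat back into ordinary columns.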
Answer:
Assuming you want to compute the MAPE per column, per group:
from sklearn.metrics import mean_absolute_percentage_error as mape
(df1
.groupby(['id','plat'])[lista_com[0]]
.transform(lambda g: mape(df1.loc[g.index, lista_target[0]], g))
.add_prefix('MAPE_')
)
output:
MAPE_d3_d30 MAPE_d7_d30 MAPE_d14_d30
0 0.083333 0.166667 0.000000
1 0.071429 0.071429 0.071429
2 0.000000 0.066667 0.000000
3 0.052632 0.000000 0.052632
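For reference, scikit-learn documents MAPE as below, where ε is a small positive constant that guards against division by zero and the result is a fraction, not a percentage. Every (id, plat) group in df1 holds exactly one row, so n = 1 and each value is a single relative error. For id 1 / and with d3_d30, this reproduces the first row of the output above:

```latex
\mathrm{MAPE}(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} \frac{\lvert y_i - \hat{y}_i \rvert}{\max(\epsilon, \lvert y_i \rvert)}
\qquad
n = 1:\quad \frac{\lvert 1.2 - 1.1 \rvert}{1.2} = \frac{0.1}{1.2} \approx 0.083333
```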
Full output, with the MAPE columns joined back onto id, plat and d30_real:
out = df1.drop(columns=lista_com[0]).join(df1
.groupby(['id','plat'])[lista_com[0]]
.transform(lambda g: mape(df1.loc[g.index, lista_target[0]], g))
.add_prefix('MAPE_')
)
output:
id plat d30_real MAPE_d3_d30 MAPE_d7_d30 MAPE_d14_d30
0 1 and 1.2 0.083333 0.166667 0.000000
1 2 and 1.4 0.071429 0.071429 0.071429
2 1 ios 1.5 0.000000 0.066667 0.000000
3 2 ios 1.9 0.052632 0.000000 0.052632