Given a folder which contains 3 predictions result, one example as follows:
date real pred indicator
0 2021/1/31 1.0 0.586583 train
1 2021/2/28 0.6 0.442644 train
2 2021/3/31 -0.5 -0.011877 train
3 2021/4/30 -0.3 -0.011877 train
4 2021/5/31 -0.2 -0.011877 train
5 2021/6/30 -0.4 -0.011877 train
6 2021/7/31 0.3 0.152951 train
7 2021/8/31 0.1 0.393088 train
8 2021/9/30 0.0 0.163108 train
9 2021/10/31 0.7 0.537028 valid
10 2021/11/30 0.4 0.586583 valid
I hope to loop and read all excel files, calculate MAE
respectively for subset train
and test
of each prediction file, and finally concatenate the calculated MAE
values and excel file names.
Code I have done:
import pandas as pd
import numpy as np
from sklearn.metrics import mean_absolute_error
import os
from pathlib import Path
import re
import glob
file_paths = glob.glob('./preds/*.xlsx')
dfs = pd.DataFrame()
for file_path in file_paths:
data = pd.read_excel(file_path)
# print(data)
model_name = Path(file_path).stem
print(model_name)
train = data.loc[data['indicator'] == 'train']
valid = data.loc[data['indicator'] == 'valid']
print(mean_absolute_error(train['real'], train['pred']))
print(mean_absolute_error(valid['real'], valid['pred']))
Out:
pred3
0.06462038177996876
0.14329840342203767
pred2
0.29968300930091313
0.20961438417434672
pred1
0.2807230830192565
0.1747779786586765
To create dataframe for save output:
col_names = ['model', 'train_mae', 'valid_mae']
dfs = pd.DataFrame(columns=col_names)
My question is how could I append the iteratively calculated MAE
values to the final result? Thanks.
The expected result:
model mae_train mae_valid
0 pred1 0.280723 0.174778
1 pred2 0.299683 0.209614
2 pred3 0.064620 0.143298
EDIT:
model_name = []
train_mae = []
test_mae = []
for file_path in file_paths:
data = pd.read_excel(file_path)
# print(data)
model_name = Path(file_path).stem
print(model_name)
train = data.loc[data['indicator'] == 'train']
valid = data.loc[data['indicator'] == 'valid']
print(mean_absolute_error(train['real'], train['pred']))
print(mean_absolute_error(valid['real'], valid['pred']))
train_mae = mean_absolute_error(train['real'], train['pred'])
valid_mae = mean_absolute_error(valid['real'], valid['pred'])
model_name.append(model_name)
train_mae.append(train_mae)
valid_mae.append(valid_mae)
df = pd.DataFrame(list(zip(model_name , train_mae, valid_mae)),
columns =['model', 'train_mae', 'valid_mae'])
Errors:
Traceback (most recent call last):
File "<input>", line 23, in <module>
AttributeError: 'str' object has no attribute 'append'
CodePudding user response:
Use a list of lists to store your information, then pass it to pd.DataFrame
:
mae_data = []
for file_path in file_paths:
data = pd.read_excel(file_path)
# print(data)
model_name = Path(file_path).stem
print(model_name)
train = data.loc[data['indicator'] == 'train']
valid = data.loc[data['indicator'] == 'valid']
print(mean_absolute_error(train['real'], train['pred']))
print(mean_absolute_error(valid['real'], valid['pred']))
## append [model_name, mae_train, mae_valid] to mae_list
mae_data.append([model_name, mean_absolute_error(train['real'], train['pred']),mean_absolute_error(valid['real'], valid['pred'])])
col_names = ['model', 'train_mae', 'valid_mae']
dfs = pd.DataFrame(data=mae_data, columns=col_names)