Assuming we have a dataframe df
:
date y_true y_pred1 y_pred2
0 2017-1-31 6.42 -2.35 15.57
1 2017-2-28 -2.35 15.57 6.64
2 2017-3-31 15.57 6.64 7.61
3 2017-4-30 6.64 7.61 10.28
4 2017-5-31 7.61 7.61 6.34
5 2017-6-30 10.28 6.34 4.88
6 2017-7-31 6.34 4.88 7.91
7 2017-8-31 6.34 7.91 6.26
8 2017-9-30 7.91 6.26 11.51
9 2017-10-31 6.26 11.51 10.73
10 2017-11-30 11.51 10.73 10.65
11 2017-12-31 10.73 10.65 32.05
I want to calculate the ratio of the upward, downward, and equal consistency of two consecutive months of data in two columns, and use it as an evaluation metric of the time series forecast results. The direction of the current month to previous month ratio: up means the current month value minus the previous month value is positive, similarly, down and equal
means negative and 0, respectively.
I calculated the results for the sample data using the following function and code, note that we do not include the yellow rows in the calculation of the final ratio, because the y_true_dir
for these rows is either null
or 0
:
def cal_arrays_direction(value):
if value > 0:
return 1
elif value < 0:
return -1
elif value == 0:
return 0
else:
return np.NaN
df['y_true_diff'] = df['y_true'].diff(1).map(cal_arrays_direction)
df['y_pred1_diff'] = df['y_pred1'].diff(1).map(cal_arrays_direction)
df['y_pred2_diff'] = df['y_pred2'].diff(1).map(cal_arrays_direction)
df['y_true_y_pred1'] = np.where((df['y_true_diff'] == df['y_pred1_diff']), 1, 0)
df['y_true_y_pred2'] = np.where((df['y_true_diff'] == df['y_pred2_diff']), 1, 0)
dir_acc_y_true_pred1 = df['y_true_y_pred1'].value_counts()[1] / (df['y_true_diff'].value_counts()[-1]
df['y_true_diff'].value_counts()[1])
print(dir_acc_y_true_pred1)
dir_acc_y_true_pred2 = df['y_true_y_pred2'].value_counts()[1] / (df['y_true_diff'].value_counts()[-1]
df['y_true_diff'].value_counts()[1])
print(dir_acc_y_true_pred2)
Out:
0.2
0.4
But I wonder how could I convert it into a function (similar to MSE
, RMSE
, etc. in sklearn
) to make it's easier to use, thanks!
def direction_consistency_acc(y_true, y_pred):
...
return dir_acc_ratio
CodePudding user response:
You can create custom function, instead custom function use numpy.sign
and instead .value_counts()[1]
compare by 1
and count True
s by sum
:
#y_true - Series, y_pred - DataFrame
def direction_consistency_acc(y_true, y_pred):
df['y_true_diff'] = np.sign(y_true.diff(1))
s = df['y_true_diff'].value_counts()
out = []
for col in y_pred.columns:
df[f'y_{col}_diff'] = np.sign(df[col].diff(1))
df[f'y_true_{col}'] = np.where((df['y_true_diff'] == df[f'y_{col}_diff']), 1, 0)
dir_acc_y_true_pred = df[f'y_true_{col}'].eq(1).sum() / (s[-1] s[1])
out.append(dir_acc_y_true_pred)
return out
out = direction_consistency_acc(df['y_true'], df[['y_pred1','y_pred2']])
print(out)
[0.2, 0.4]
Alternative without new columns:
#y_true - Series, y_pred - DataFrame
def direction_consistency_acc(y_true, y_pred):
y_true_diff = np.sign(y_true.diff(1))
s = y_true_diff.value_counts()
out = []
for col in y_pred.columns:
y_true = y_true_diff == np.sign(df[col].diff(1))
dir_acc_y_true_pred = y_true.eq(1).sum() / (s[-1] s[1])
out.append(dir_acc_y_true_pred)
return out
out = direction_consistency_acc(df['y_true'], df[['y_pred1','y_pred2']])
print(out)
[0.2, 0.4]
CodePudding user response:
My updated answer to this question:
def cal_arrays_direction(value):
if value > 0:
return 1
elif value < 0:
return -1
elif value == 0:
return 0
else:
return np.NaN
def direction_consistency_acc(y_true, y_pred):
df['y_true_diff'] = y_true.diff(1).map(cal_arrays_direction)
df['y_pred_diff'] = y_pred.diff(1).map(cal_arrays_direction)
df['y_true_pred_consis'] = np.where((df['y_true_diff'] == df['y_pred_diff']), 1, 0)
dir_acc_ratio = df['y_true_pred_consis'].value_counts()[1] / (df['y_true_diff'].value_counts()[-1]
df['y_true_diff'].value_counts()[1])
return dir_acc_ratio
direction_consistency_acc(df['y_true'], df['y_pred1'])
Out:
0.2