Calibrate a simple equation to minimise the loss, pandas dataframe

I have a sample of the dataframe as shown below,

Months      Price   Predicted Loss  Actual Loss     Diff
1/11/2019   4.00    3.43            4.00            0.57
1/11/2019   10.00   8.58            9.00            0.42
1/11/2019   20.00   17.16           16.00           -1.16
1/12/2019   44.00   37.74           35.00           -2.74
1/12/2019   66.00   56.61           56.00           -0.61
1/12/2019   7.00    6.00            5.00            -1.00
1/12/2019   9.00    7.72            8.00            0.28
...

Total              137.25           133.00

I calculate the predicted loss as a function of price using this equation: Predicted Loss = (0.92*(EXP(-0.07)*Price)). As the totals above show, I am overestimating the loss, so I want to calibrate the equation to minimise the overall difference between the predicted loss and the actual loss.

Currently I change the exponent manually, for example from (0.92*(EXP(-0.07)*Price)) to (0.92*(EXP(-0.072)*Price)), and check whether the total gets closer to the overall actual loss. I only want to change the exponential part, and I want to loop until I reach an optimum, that is, a predicted loss as close as possible to the actual loss. At the moment I am writing code that loops over a list of candidate coefficients. Please let me know the best way to achieve this.

CodePudding user response：

What you have is a linear regression model, so it has a closed-form solution that you can find in an econometrics textbook. For your specific goal of minimising the difference between the total actual and total predicted losses, though, you can also find the exponent iteratively with a while-loop. The idea is to keep decreasing the exponent in small steps until the difference between the two totals becomes very small. Below is one implementation:

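import numpy as np

# df is the dataframe from the question
actual_loss_sum = df['Actual Loss'].sum()
predicted_loss = df['Predicted Loss']
exponential = -0.07
# Lower the exponent in small steps while the total is still overestimated
# (this only handles the overestimation case described in the question)
while predicted_loss.sum() - actual_loss_sum > 0.001:
    exponential -= 0.0001
    predicted_loss = 0.92 * np.exp(exponential) * df['Price']

df['Predicted Loss'] = predicted_loss
df['Diff'] = df['Actual Loss'] - df['Predicted Loss']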

Output:

      Months  Price  Predicted Loss  Actual Loss      Diff
0  1/11/2019    4.0        3.324811          4.0  0.675189
1  1/11/2019   10.0        8.312027          9.0  0.687973
2  1/11/2019   20.0       16.624054         16.0 -0.624054
3  1/12/2019   44.0       36.572918         35.0 -1.572918
4  1/12/2019   66.0       54.859377         56.0  1.140623
5  1/12/2019    7.0        5.818419          5.0 -0.818419
6  1/12/2019    9.0        7.480824          8.0  0.519176

In this case, df['Predicted Loss'].sum() = 132.99242959917476 and the difference between the sums of the losses is 0.0075704008252444055.
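
Because the predicted loss is just a constant times Price, the exponent that makes the two totals agree can also be computed directly rather than searched for. The following is a minimal sketch, assuming only the seven sample rows from the question and keeping the factor 0.92 fixed; the list names are illustrative, and it uses the standard library so that it runs without the dataframe:

```python
import math

# Sample rows from the question (assumed to be the whole dataset here)
price = [4.0, 10.0, 20.0, 44.0, 66.0, 7.0, 9.0]
actual_loss = [4.0, 9.0, 16.0, 35.0, 56.0, 5.0, 8.0]

# The totals match when 0.92 * exp(b) * sum(price) == sum(actual_loss),
# so solve that equation for the exponent b.
scale = sum(actual_loss) / sum(price)
exponential = math.log(scale / 0.92)

predicted_loss = [0.92 * math.exp(exponential) * p for p in price]
print(exponential, sum(predicted_loss))
```

The exponent found this way is close to the one where the loop stops (about -0.1015), but it is not limited by the 0.0001 step size. With the dataframe, the two sums would simply be df['Actual Loss'].sum() and df['Price'].sum().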

CodePudding user response：

You are modeling the loss as a linear function of price: Loss = a*Price. The fact that the coefficient a is written as a constant times an exponential does not seem to be relevant. One way to find the value of a is linear regression:

import numpy as np
from sklearn.linear_model import LinearRegression

price = np.array([4.00, 10.00, 20.00, 44.00, 66.00, 7.00, 9.00])
loss = np.array([4.00, 9.00, 16.00, 35.00, 56.00, 5.00, 8.00])
reg = LinearRegression(fit_intercept=False).fit(price.reshape(-1, 1), loss)
print(reg.coef_)

This gives:

[0.83150764]

Thus, according to the linear regression model, you should use a = 0.83150764. The value you selected manually is 0.92*np.exp(-0.07) = 0.8578023, so it gives a slightly higher predicted loss. Using the coefficient from the linear regression, the predicted losses for the given prices are:

[ 3.32603056  8.31507639 16.63015278 36.58633612 54.87950418  5.82055347
  7.48356875]

The sum of these predictions is 133.04, quite close to the actual total loss. However, these predictions use the same data that was used to train the model, so this is not a good measure of its accuracy. You would need to test this model on additional data to see how it performs.

Finally, if you really want to represent the coefficient a in the form 0.92*exp(b) for some value b, then just compute b = np.log(a/0.92). For a = 0.83150764 this gives b = -0.101133183.
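
For reference, a regression through the origin has a simple closed form, a = sum(price*loss) / sum(price^2), so the same numbers can be reproduced without scikit-learn. This is a minimal sketch under the assumption that the seven sample rows are the full dataset; it uses only the standard library:

```python
import math

# Sample rows from the question
price = [4.0, 10.0, 20.0, 44.0, 66.0, 7.0, 9.0]
loss = [4.0, 9.0, 16.0, 35.0, 56.0, 5.0, 8.0]

# Least-squares slope for a line through the origin (no intercept)
a = sum(p * l for p, l in zip(price, loss)) / sum(p * p for p in price)

# Rewrite the slope in the form 0.92 * exp(b)
b = math.log(a / 0.92)
print(a, b)
```

Note that this minimises the sum of squared row-by-row differences, which is a different criterion from making the two totals match, so the resulting exponent need not equal the one that equalises the totals.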