I have a sample of the dataframe as shown below,
Months Price Predicted Loss Actual Loss Diff
1/11/2019 4.00 3.43 4.00 0.57
1/11/2019 10.00 8.58 9.00 0.42
1/11/2019 20.00 17.16 16.00 -1.16
1/12/2019 44.00 37.74 35.00 -2.74
1/12/2019 66.00 56.61 56.00 -0.61
1/12/2019 7.00 6.00 5.00 -1.00
1/12/2019 9.00 7.72 8.00 0.28
...
Total 137.25 133.00
I am calculating the predicted loss as a function of price using this equation: Predicted Loss = 0.92 * EXP(-0.07) * Price
As you can see above, I am overestimating the loss, so what I want to do is calibrate the equation to minimise the overall difference between the Predicted Loss and the Actual Loss.
Currently, what I am doing is manually changing the exponent, e.g. from 0.92 * EXP(-0.07) * Price
to 0.92 * EXP(-0.072) * Price,
and checking whether the total gets close to the overall Actual Loss. I only want to change the exponential part, and run my loop until I reach an optimum solution, i.e. the total Predicted Loss is close to the total Actual Loss. Currently I am working on code that runs through a list of coefficients in a loop. Please let me know the best way to achieve this.
CodePudding user response:
What you have is a linear regression model, so it has a closed-form solution that you can find in an econometrics textbook. But for your specific question of minimizing the difference between the actual and predicted losses, you can also find the answer iteratively with a while loop. Basically, we want to decrease the exponent step by step
until the difference between the total losses becomes very, very small. Below is one implementation:
import numpy as np

actual_loss_sum = df['Actual Loss'].sum()
predicted_loss = df['Predicted Loss']
exponential = -0.07

# Step the exponent down until the total predicted loss is within 0.001 of the actual total
while predicted_loss.sum() - actual_loss_sum > 0.001:
    exponential -= 0.0001
    predicted_loss = 0.92 * np.exp(exponential) * df['Price']

df['Predicted Loss'] = predicted_loss
df['Diff'] = df['Actual Loss'] - df['Predicted Loss']
Output:
Months Price Predicted Loss Actual Loss Diff
0 1/11/2019 4.0 3.324811 4.0 0.675189
1 1/11/2019 10.0 8.312027 9.0 0.687973
2 1/11/2019 20.0 16.624054 16.0 -0.624054
3 1/12/2019 44.0 36.572918 35.0 -1.572918
4 1/12/2019 66.0 54.859377 56.0 1.140623
5 1/12/2019 7.0 5.818419 5.0 -0.818419
6 1/12/2019 9.0 7.480824 8.0 0.519176
In this case, df['Predicted Loss'].sum() = 132.99242959917476, and the difference between the sums of the losses is 0.0075704008252444055.
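Since the model is linear in Price, the loop can also be avoided entirely: the total predicted loss is 0.92 * exp(b) * sum(Price), so the exponent that makes the totals match can be solved for directly as b = ln(actual_sum / (0.92 * price_sum)). A minimal sketch, using only the sample rows shown in the question (the full dataframe would give a different value):

```python
import numpy as np

# Sample rows from the question
price = np.array([4.0, 10.0, 20.0, 44.0, 66.0, 7.0, 9.0])
actual = np.array([4.0, 9.0, 16.0, 35.0, 56.0, 5.0, 8.0])

# Total predicted loss = 0.92 * exp(b) * price.sum(); set this equal to the
# actual total and solve for the exponent b.
b = np.log(actual.sum() / (0.92 * price.sum()))
print(b)                                  # exponent that matches the totals
print(0.92 * np.exp(b) * price.sum())    # equals actual.sum() by construction
```

This gives the calibrated exponent in one step instead of stepping by 0.0001 at a time.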
CodePudding user response:
You are modeling the loss function as a linear function of price: Loss = a*Price. The fact that the coefficient a involves an exponent of some number does not seem to be relevant. One way you can try to find the value of a is using linear regression:
import numpy as np
from sklearn.linear_model import LinearRegression
price = np.array([4.00, 10.00, 20.00, 44.00, 66.00, 7.00, 9.00])
loss = np.array([4.00, 9.00, 16.00, 35.00, 56.00, 5.00, 8.00])
reg = LinearRegression(fit_intercept=False).fit(price.reshape(-1, 1), loss)
print(reg.coef_)
This gives:
[0.83150764]
Thus, according to the linear regression model, you should use a = 0.83150764. The value you selected manually is 0.92*np.exp(-0.07) = 0.8578023, so it will give a slightly higher predicted loss. Using the coefficient coming from the linear regression, the predicted losses for the given prices are
[ 3.32603056 8.31507639 16.63015278 36.58633612 54.87950418 5.82055347
7.48356875]
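For a no-intercept regression with a single feature, sklearn is not strictly needed: the least-squares coefficient has the closed form a = Σ(price·loss) / Σ(price²). A quick check with plain numpy:

```python
import numpy as np

price = np.array([4.0, 10.0, 20.0, 44.0, 66.0, 7.0, 9.0])
loss = np.array([4.0, 9.0, 16.0, 35.0, 56.0, 5.0, 8.0])

# Closed-form OLS solution for Loss = a * Price (no intercept)
a = (price * loss).sum() / (price ** 2).sum()
print(a)  # ≈ 0.83150764, matching LinearRegression above
```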
The sum of these predictions is 133.04, quite close to the actual total loss. However, these predictions use the same data that was used to train the model, so this is not a good measure of its accuracy. You would need to test this model on additional data to see how it performs.
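As an illustration of that caveat (a sketch only, since the sample has just seven rows; the split below is arbitrary): fit the coefficient on part of the data and measure the error on rows the fit never saw.

```python
import numpy as np

price = np.array([4.0, 10.0, 20.0, 44.0, 66.0, 7.0, 9.0])
loss = np.array([4.0, 9.0, 16.0, 35.0, 56.0, 5.0, 8.0])

# Fit the no-intercept coefficient on the first five rows only...
a = (price[:5] * loss[:5]).sum() / (price[:5] ** 2).sum()

# ...and evaluate the mean absolute error on the two held-out rows
holdout_error = np.abs(a * price[5:] - loss[5:]).mean()
print(a, holdout_error)
```

The held-out error, not the in-sample fit, is the honest measure of accuracy.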
Finally, if you really want to represent the coefficient a in the form 0.92*exp(b) for some value b, then just compute b = np.log(a/0.92). For a = 0.83150764 this gives b = -0.101133183.
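A quick numerical check of that conversion:

```python
import numpy as np

a = 0.83150764
b = np.log(a / 0.92)          # invert a = 0.92 * exp(b)
print(b)                      # ≈ -0.101133
print(0.92 * np.exp(b))       # recovers a
```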