I'm trying to use scipy's minimize to estimate a parameter from an ODE system, which should be pretty straightforward, but the methods I've used aren't returning values anywhere near what they should be. My parameter beta should be estimated at around 0.42. I am sure this method is correct, so I can't understand why the estimates are so far off.
import numpy as np
from scipy.integrate import odeint
from scipy.optimize import minimize
from scipy.optimize import minimize_scalar
import pandas as pd
from scipy.optimize import leastsq
t = np.linspace(0, 77, 77 + 1)
d = {'Week': [t[7], t[14], t[21], t[28], t[35], t[42], t[49], t[56], t[63], t[70], t[77]],
     'incidence': [206.1705794, 2813.420201, 11827.9453, 30497.58655, 10757.66954,
                   7071.878779, 3046.752723, 1314.222882, 765.9763902, 201.3800578, 109.8982006]}
df = pd.DataFrame(data=d)
def peak_infections(beta, df):
    # Weeks for which the ODE system will be solved
    # weeks = df.Week.to_numpy()

    # Total population, N.
    N = 100000
    # Initial number of infected and recovered individuals, I0 and R0.
    I0, R0 = 10, 0
    # Everyone else, S0, is susceptible to infection initially.
    S0 = N - I0 - R0
    J0 = I0
    # Contact rate, beta, and mean recovery rate, gamma (in 1/weeks).
    # The basic reproduction number R zero is beta/gamma.
    gamma = 1/6  # rate should be in weeks now
    # A grid of time points (weeks 7 to 77, matching d['Week'])
    times = np.arange(7, 84, 7)

    # The SIR model differential equations (J tracks cumulative incidence).
    def deriv(y, times, N, beta, gamma):
        S, I, R, J = y
        dS = ((-beta * S * I) / N)
        dI = ((beta * S * I) / N) - (gamma * I)
        dR = (gamma * I)
        dJ = ((beta * S * I) / N)  # incidence
        return dS, dI, dR, dJ

    # Integrate the SIR equations over the time grid with initial
    # conditions (S0, I0, R0, J0).
    solve = odeint(deriv, (S0, I0, R0, J0), times, args=(N, beta, gamma))
    S, I, R, J = solve.T
    return I / N

def residual(x, df):
    # Total population, N.
    N = 100000
    incidence = df.incidence.to_numpy() / N
    return np.sum((peak_infections(x, df) - incidence) ** 2)
x0 = 0.5
res = minimize(residual, x0, args=(df,), method="Nelder-Mead", options={'fatol': 1e-04}).x
print(res)
best = leastsq(residual, x0, args=(df,))
print(best)  # tried this using leastsq too
results = minimize_scalar(residual, (0.4, 0.5), args=(df,))
print(results)
results['fun']
As you can see, I have used minimize, minimize_scalar and even leastsq. They all return values such as 0.723. Where have I gone wrong? Is my objective function, return np.sum((peak_infections(x, df) - incidence) ** 2), at least correct?
Edit: I tried taking np.max(I/N) in the peak_infections(beta, df) function instead of returning the whole series, but this does not give a correct estimate either. Concretely, the comparison I was aiming for was roughly the sketch below.
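(residual_peak is just an illustrative name; it reuses peak_infections and df from above.)

# Rough sketch of the "fit the peak only" attempt (illustrative name):
# compare the maximum of the modelled incidence fraction with the largest
# observed value instead of the full weekly series.
def residual_peak(x, df):
    N = 100000
    observed_peak = df.incidence.max() / N
    modelled_peak = np.max(peak_infections(x, df))  # max of I/N over the time grid
    return (modelled_peak - observed_peak) ** 2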
CodePudding user response:
Assuming that the model is OK, the problem is that you assume beta should be close to 0.42 when it should not be. A simple visual test, plotting the measurements against the modelled data, shows that 0.72 yields much better results than 0.42. I added the following lines:
import matplotlib.pyplot as plt
plt.plot(d['Week'], df.incidence.to_numpy()/100000, label="Real data")
plt.plot(d['Week'], peak_infections(.72, df), label="Model with 0.72")
plt.plot(d['Week'], peak_infections(.42, df), label="Model with .42")
plt.legend()
which yielded the following figure:
It is clear that 0.72 is a better estimate of beta than 0.42, even without plotting the residuals.
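If you want a numerical check as well, you can evaluate the sum-of-squares objective from your question at both candidate values; the value at 0.72 should come out clearly smaller, consistent with what the optimizers returned:

# Quick sanity check using the residual function defined in the question:
# the beta with the smaller objective value fits the weekly incidence better.
print(residual(0.42, df))
print(residual(0.72, df))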
As a side note, be careful, because your code is easy to break. You define the same values in several places: N is defined inside both peak_infections and residual, and the days of the week appear both as d['Week'] and as times inside peak_infections. It is easy to change one of these values in one place and forget to change it in the other. Moreover, I see no real use of pandas here, since you do not take advantage of any of its features on top of numpy.
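For example, one possible refactor (just a sketch, with the model itself unchanged, reusing the imports and d from above) is to define each shared quantity once and pass it explicitly to the functions that need it:

# Sketch of a refactor: every shared constant is defined exactly once.
N = 100000                    # total population
TIMES = np.arange(7, 84, 7)   # weekly time grid, identical to d['Week']
GAMMA = 1/6                   # recovery rate, per week

def peak_infections(beta, N, times, gamma):
    # Same SIR model as in the question, with the shared constants passed in.
    I0, R0 = 10, 0
    S0 = N - I0 - R0

    def deriv(y, t, N, beta, gamma):
        S, I, R, J = y
        dS = -beta * S * I / N
        dI = beta * S * I / N - gamma * I
        dR = gamma * I
        dJ = beta * S * I / N
        return dS, dI, dR, dJ

    S, I, R, J = odeint(deriv, (S0, I0, R0, I0), times, args=(N, beta, gamma)).T
    return I / N

def residual(beta, incidence, N, times, gamma):
    return np.sum((peak_infections(beta, N, times, gamma) - incidence) ** 2)

incidence = np.array(d['incidence']) / N
res = minimize(residual, 0.5, args=(incidence, N, TIMES, GAMMA),
               method="Nelder-Mead").x
print(res)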