I'm trying to estimate the parameter a, b, c, and s by using optim in R. Here is my code.
age <- c(0,30,60,90)
Dx <- c(49294.57, 2975.1, 11456.38, 2977.08)
Ex <- c(1572608.38, 1531956.05, 650404.58, 9728.47)
log_lik <- function(par,x,y,z){
a <- par[1]
b <- par[2]
c <- par[3]
s <- par[4]
mu <- (a*exp(b*x))/(1 s * (a)/(b) * (exp(b*x)-1)) c
lambda <- mu * z
lnL <- sum(y*log(lambda) - log(factorial(y)) - lambda)
-lnL
}
optim(c(1,1,1,1),log_lik, x = age, y = Dx, z = Ex)
But, I get an error
Error in optim(c(1, 1, 1, 1), log_lik, x = age, y = Dx, z = Ex) :
function cannot be evaluated at initial parameters
I have tried several initial values, but still get the same error. Can you solve this problem? Or maybe there is another code to solve this problem?
Thank you.
CodePudding user response:
The problem comes from calculating the factorial of a large number then taking its log. The factorial number is too high for R to recognise as a finite number, but its log is not. In this situation, we can get the same result as log(factorial(y))
by using the lgamma
function.
This is not a hack; the factorial
function in R is just a thin wrapper for the gamma
function:
factorial
#> function (x)
#> gamma(x 1)
So we can get a function that does the same as log(factorial(y))
without the need to actually go through the step of calculating extremely high numbers then taking their log, like this:
log_factorial <- function(x) lgamma(x 1)
Which we can see gives us the correct results:
log(factorial(21))
#> [1] 45.38014
log_factorial(21)
#> [1] 45.38014
But allows us to input higher numbers without generating infinities.
log(factorial(200))
#> [1] Inf
log_factorial(200)
#> [1] 863.232
So we can change your code slightly to:
log_lik <- function(par,x,y,z){
a <- par[1]
b <- par[2]
c <- par[3]
s <- par[4]
mu <- (a*exp(b*x))/(1 s * (a)/(b) * (exp(b*x)-1)) c
lambda <- mu * z
lnL <- sum(y*log(lambda) - lgamma(y 1) - lambda)
-lnL
}
And now we get:
optim(c(1,1,1,1), log_lik, x = age, y = Dx, z = Ex)
#> $par
#> [1] 0.6114036 1.1267546 -0.5800334 1.9163744
#>
#> $value
#> [1] 15828.8
#>
#> $counts
#> function gradient
#> 161 NA
#>
#> $convergence
#> [1] 0
$message
NULL
CodePudding user response:
The optimization cannot be done because you have very large values, which results in infinite or NA values. One option could be to rescale your variables, for ex if your variable is naturally in the range around 1 million, divide all the values by 1 million. For ex.
age=age/1e2
Dx=Dx/1e5
Ex=Ex/1e6
now the optimization works and returns
$par
[1] 1.418161 37.235806 -1.104942 31.443860
$value
[1] 1.421373
$counts
function gradient
479 NA
$convergence
[1] 0
$message
NULL
Warning messages:
1: In log(lambda) : NaNs produced
2: In log(lambda) : NaNs produced
3: In log(lambda) : NaNs produced
4: In log(lambda) : NaNs produced
5: In log(lambda) : NaNs produced
6: In log(lambda) : NaNs produced
7: In log(lambda) : NaNs produced
8: In log(lambda) : NaNs produced
9: In log(lambda) : NaNs produced
there are still issues with the log(lambda)
part, since it can happen that lambda is negative, which is an issue. You may have to use constrained optimization to solve this.
CodePudding user response:
Note, that the value of lambda that maximises
lnL <- sum(y*log(lambda) - log(factorial(y)) - lambda)
is the same value that maximises
lnL_2 <- sum(y*log(lambda) - lambda)
so you can optimise lnL_2 instead of lnL. See, e.g, this answer over at Math Stackexchange for derivation.