Error in optim R - cannot be evaluated at initial parameters-CodePudding

I'm trying to estimate the parameter a, b, c, and s by using optim in R. Here is my code.

age <- c(0,30,60,90)
Dx <- c(49294.57, 2975.1, 11456.38, 2977.08)
Ex <- c(1572608.38, 1531956.05, 650404.58, 9728.47)

log_lik <- function(par,x,y,z){
  a <- par[1]
  b <- par[2]
  c <- par[3]
  s <- par[4]
  mu <- (a*exp(b*x))/(1 s * (a)/(b) * (exp(b*x)-1))   c
  lambda <- mu * z
  
  lnL <- sum(y*log(lambda) - log(factorial(y)) - lambda)
  -lnL
}

optim(c(1,1,1,1),log_lik, x = age, y = Dx, z = Ex)

But, I get an error

Error in optim(c(1, 1, 1, 1), log_lik, x = age, y = Dx, z = Ex) : 
  function cannot be evaluated at initial parameters

I have tried several initial values, but still get the same error. Can you solve this problem? Or maybe there is another code to solve this problem?

Thank you.

CodePudding user response：

The problem comes from calculating the factorial of a large number then taking its log. The factorial number is too high for R to recognise as a finite number, but its log is not. In this situation, we can get the same result as log(factorial(y)) by using the lgamma function.

This is not a hack; the factorial function in R is just a thin wrapper for the gamma function:

factorial
#> function (x) 
#> gamma(x   1)

So we can get a function that does the same as log(factorial(y)) without the need to actually go through the step of calculating extremely high numbers then taking their log, like this:

log_factorial <- function(x) lgamma(x   1)

Which we can see gives us the correct results:

log(factorial(21))
#> [1] 45.38014

log_factorial(21)
#> [1] 45.38014

But allows us to input higher numbers without generating infinities.

log(factorial(200))
#> [1] Inf

log_factorial(200)
#> [1] 863.232

So we can change your code slightly to:

log_lik <- function(par,x,y,z){
  a <- par[1]
  b <- par[2]
  c <- par[3]
  s <- par[4]
  mu <- (a*exp(b*x))/(1 s * (a)/(b) * (exp(b*x)-1))   c
  lambda <- mu * z
  
  lnL <- sum(y*log(lambda) - lgamma(y   1) - lambda)
  -lnL
}

And now we get:

optim(c(1,1,1,1), log_lik, x = age, y = Dx, z = Ex)
#> $par
#> [1]  0.6114036  1.1267546 -0.5800334  1.9163744
#> 
#> $value
#> [1] 15828.8
#> 
#> $counts
#> function gradient 
#>      161       NA 
#> 
#> $convergence
#> [1] 0

$message
NULL

CodePudding user response：

The optimization cannot be done because you have very large values, which results in infinite or NA values. One option could be to rescale your variables, for ex if your variable is naturally in the range around 1 million, divide all the values by 1 million. For ex.

age=age/1e2
Dx=Dx/1e5
Ex=Ex/1e6

now the optimization works and returns

$par
[1]  1.418161 37.235806 -1.104942 31.443860

$value
[1] 1.421373

$counts
function gradient 
     479       NA 

$convergence
[1] 0

$message
NULL

Warning messages:
1: In log(lambda) : NaNs produced
2: In log(lambda) : NaNs produced
3: In log(lambda) : NaNs produced
4: In log(lambda) : NaNs produced
5: In log(lambda) : NaNs produced
6: In log(lambda) : NaNs produced
7: In log(lambda) : NaNs produced
8: In log(lambda) : NaNs produced
9: In log(lambda) : NaNs produced

there are still issues with the log(lambda) part, since it can happen that lambda is negative, which is an issue. You may have to use constrained optimization to solve this.

CodePudding user response：

Note, that the value of lambda that maximises

  lnL <- sum(y*log(lambda) - log(factorial(y)) - lambda)

is the same value that maximises

  lnL_2 <- sum(y*log(lambda) - lambda)

so you can optimise lnL_2 instead of lnL. See, e.g, this answer over at Math Stackexchange for derivation.