R optimise log-likelihood

I have a function that takes in lambda and a data sample, and finds the corresponding log-likelihood value for each data point:

data <- rpois(n=25, lambda=4)
generateLogLikelihood <- function(lambda, y){
  return(dpois(y, lambda, log=TRUE))
}

LogLikelihood = generateLogLikelihood (4,data)
LogLikelihood

I'm looking for a solution that meets this requirement:

I need to use the optimise function to return the value of lambda which maximises the log-likelihood for the given data sample, but I'm getting stuck (this is my first attempt at using optimise). I need to adapt the function to only take 'data' as an input.

Where I'm getting stuck: I'm not sure how or when to apply optimise. Given the requirement that my function takes only one input (the data, now called newdata), do I need to call optimise outside the function? But my function still requires lambda values, so I'm not sure how to do this.

My current code, which represents 2 separate parts that I don't know how to combine (or may be entirely wrong), is below:

newdata <- c(23,16,18,14,19,20,12,15,15,21)
newlambdas <- seq(min(newdata),max(newdata),0.5)

generateLogLikelihoodNew <- function(y){
  return(dpois(y, lambda, log=TRUE))
}

LogLikelihood = optimise(generateLogLikelihoodNew,newdata,lower = min(newlambdas), upper = max(newlambdas), maximum = TRUE)
LogLikelihood

CodePudding user response：

If you only want to check which of the provided lambdas gives the best fit, you can do:

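# Negative log-likelihood of newdata for a given lambda
generateLogLikelihoodNew <- function(lambda){
  -sum(dpois(newdata, lambda, log=TRUE))
}
# Grid value with the smallest negative log-likelihood
newlambdas[which.min(sapply(newlambdas, generateLogLikelihoodNew))]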

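If, however, you want to find the maximising value of lambda directly, you do not need to provide a sequence of lambda values; give optimise a search interval instead. Because optimise minimises by default, the objective here is the negative log-likelihood: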

optimise(
  function(x){-sum(dpois(newdata,x,log=TRUE))},
  c(0,100)
)

$minimum
[1] 17.3

$objective
[1] 26.53437
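
As a sanity check, the maximum-likelihood estimate of a Poisson rate is known to be the sample mean, so the numerical result can be compared against mean(newdata). The following is a minimal sketch, assuming the newdata vector from the question; the names negLL and fit are illustrative and not part of the original answer.

```r
newdata <- c(23, 16, 18, 14, 19, 20, 12, 15, 15, 21)

# Negative log-likelihood of the whole sample as a function of lambda
negLL <- function(lambda) -sum(dpois(newdata, lambda, log = TRUE))

# optimise() minimises by default, so minimising negLL maximises the likelihood
fit <- optimise(negLL, interval = c(0, 100))

# The numerical optimum should agree with the sample mean
# up to optimise()'s default tolerance
mean(newdata)
all.equal(fit$minimum, mean(newdata), tolerance = 1e-3)
```

If the two values disagree noticeably, the search interval or the objective is probably wrong.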

CodePudding user response：

There are several problems here:

1. The log likelihood function defined in the question is only valid for a scalar y value. The log likelihood for a vector y is the sum of the log likelihoods of the individual y values. Add sum to the definition.
2. The default for optimize is to minimize, but to use the log likelihood as the objective we need to maximize, so specify maximum = TRUE as an argument to optimize (or else pass the negative log likelihood function).
3. y needs to be passed to the log likelihood function. This can be done by specifying it as a named argument to optimize, which forwards it to the function.
4. Although it is not wrong to specify lower and upper as done in the question, it is a bit shorter to pass range(newdata) to the interval argument of optimize.
5. Although a long name such as generateLogLikelihood is not wrong, it makes the code hard to read and can make lines run off the end. The word generate adds nothing, so I would choose a better name. Scientific code is often read alongside its mathematical formula. Suppose that in this case the formula used ll or LL. ll is a bit hard to read, since a lower-case L and the digit one look nearly the same, so we could use LL or, if you really want to write it out, shorten it to logLikelihood. Furthermore, the variable named LogLikelihood in the code is not the log likelihood. It is a list of two components: the value of lambda at the optimum and the value of the objective there. Clearly there is some discretion in choosing names, and your opinion may differ from mine, but I found such long variable names awkward to deal with.

Thus we have:

LL <- function(lambda, y) sum(dpois(y, lambda, log = TRUE))
optimize(LL, range(newdata), y = newdata, maximum = TRUE)
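
The question also asks for a function that takes only the data as input. One way to get there is to wrap the call above in a function. This is a sketch under assumptions: the wrapper name fitLambda is hypothetical, and the sample is assumed to contain at least two distinct values so that range(y) is a proper interval.

```r
# Poisson log-likelihood of the whole sample y at a given lambda
LL <- function(lambda, y) sum(dpois(y, lambda, log = TRUE))

# Hypothetical wrapper: takes only the data and returns the lambda
# that maximises the log-likelihood over the range of the data
fitLambda <- function(y) {
  res <- optimize(LL, interval = range(y), y = y, maximum = TRUE)
  res$maximum
}

newdata <- c(23, 16, 18, 14, 19, 20, 12, 15, 15, 21)
fitLambda(newdata)
```

With maximum = TRUE, optimize returns a list with components maximum and objective, so the wrapper extracts maximum to return just the estimated lambda.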