Home > OS >  Avoiding duplication in R
Avoiding duplication in R

Time:06-25

I am trying to fit a variety of (truncated) probability distributions to the same very thin set of quantiles. I can do it but it seems to require lots of duplication of the same code. Is there a neater way?

I am using this code by Nadarajah and Kotz to generate the pdf of the truncated distributions:

qtrunc <- function(p, spec, a = -Inf, b = Inf, ...)
{
  tt <- p
  G <- get(paste("p", spec, sep = ""), mode = "function")
  Gin <- get(paste("q", spec, sep = ""), mode = "function")
  tt <- Gin(G(a, ...)   p*(G(b, ...) - G(a, ...)), ...)
  return(tt)
}

where spec can be the name of any untruncated distribution for which code in R exists, and the ... argument is used to provide the names of the parameters of that untruncated distribution.

To achieve the best fit I need to measure the distance between the given quantiles and those calculated using arbitrary values of the parameters of the distribution. In the case of the gamma distribution, for example, the code is as follows:

spec <- "gamma"
fit_gamma <- function(x, l = 0,   h = 20, t1 = 5, t2 = 13){
  ct1 <- qtrunc(p = 1/3, spec, a = l, b = h, shape = x[1],rate = x[2])
  ct2 <- qtrunc(p = 2/3, spec, a = l, b = h, shape = x[1],rate = x[2])
  dist <- vector(mode = "numeric", length = 2) 
  dist[1] <- (t1 - ct1)^2
  dist[2] <- (t2- ct2)^2
  return(sqrt(sum(dist)))
}  

where l is the lower truncation, h is the higher and I am given the two tertiles t1 and t2.

Finally, I seek the best fit using optim, thus:

gamma_fit <- optim(par = c(2, 4), 
                fn = fit_gamma, 
                l = l, 
                h = h,
                t1 = t1,
                t2 = t2,
                method = "L-BFGS-B",
                lower = c(1.01, 1.4)

Now suppose I want to do the same thing but fitting a normal distribution instead. The names of the parameters of the normal distribution that I am using in R are mean and sd.

I can achieve what I want but only by writing a whole new function fit_normal that is extremely similar to my fit_gamma function but with the new parameter names used in the definition of ct1 and ct2.

The problem of duplication of code becomes very severe because I wish to try fitting a large number of different distributions to my data.

What I want to know is whether there is a way of writing a generic fit_spec as it were so that the parameter names do not have to be written out by me.

CodePudding user response:

Use x as a named list to create a list of arguments to pass into qtrunc() using do.call().

fit_distro <- function(x, spec, l = 0, h = 20, t1 = 5, t2 = 13){
  args <- c(x, list(spec = spec, a = l, b = h))
  
  ct1 <- do.call(qtrunc, args = c(list(p = 1/3), args))
  ct2 <- do.call(qtrunc, args = c(list(p = 2/3), args))
  dist <- vector(mode = "numeric", length = 2) 
  dist[1] <- (t1 - ct1)^2
  dist[2] <- (t2 - ct2)^2
  return(sqrt(sum(dist)))
}  

This is called as follows, which is the same as your original function.

fit_distro(list(shape = 2, rate = 3), "gamma")
# [1] 13.07425

fit_gamma(c(2, 3))
# [1] 13.07425

This will work with other distributions, for however many parameters they have.

fit_distro(list(mean = 10, sd = 3), "norm")
# [1] 4.08379

fit_distro(list(shape1 = 2, shape2 = 3, ncp = 10), "beta")
# [1] 12.98371
  • Related