How to call multiple distribution functions from different vectors into a function in R

Time:08-24

Let me talk you through my workflow:

General idea

Based on data in a data frame, select the appropriate distribution functions and combine them in all possible ways to get the mean of each combined distribution.

Starting position

  • I have a large data frame df. It contains several variables (var1, var2 and var3 in this example), each holding the data used to select the appropriate distribution function.
  • I have several distribution functions per variable:
var1_distr1 <- pdqr::as_d(function(x)dnorm(x, mean = 3, sd = 1))
var1_distr2 <- pdqr::as_d(function(x)dnorm(x, mean = 6, sd = 1))
var1_distr3 <- pdqr::as_d(function(x)dnorm(x, mean = 2, sd = 2))

var2_distr1 <- pdqr::as_d(function(x)dnorm(x, mean = 5, sd = 3))
var2_distr2 <- pdqr::as_d(function(x)dnorm(x, mean = 3, sd = 1))
var2_distr3 <- pdqr::as_d(function(x)dnorm(x, mean = 4, sd = 2))

var3_distr1 <- pdqr::as_d(function(x)dnorm(x, mean = 4, sd = 1))
var3_distr2 <- pdqr::as_d(function(x)dnorm(x, mean = 5, sd = 1))
var3_distr3 <- pdqr::as_d(function(x)dnorm(x, mean = 7, sd = 2))

Select the right distribution

Using if_else (from dplyr) on each variable, I store the name of the appropriate distribution per case in a new column. The if_else looks like this for var1 and has the same form for all variables:

df$distr_var1 <- if_else(df$info < 0, "var1_distr1",
                         if_else(df$info > 0 & df$info < 100, "var1_distr2", "var1_distr3"))

This results in the following df:

df <- data.frame(distr_var1 = c("var1_distr1", "var1_distr3", "var1_distr1", "var1_distr2", "var1_distr2", "var1_distr1", "var1_distr3"),
                 distr_var2 = c("var2_distr2", "var2_distr1", "var2_distr2", "var2_distr1", "var2_distr3", "var2_distr3", "var2_distr1"),
                 distr_var3 = c("var3_distr2", "var3_distr3", "var3_distr1", "var3_distr1", "var3_distr2", "var3_distr3", "var3_distr1"))

Combine distribution functions

To combine distribution functions in a new proportional distribution function I have created this function based on this question:

foo <- function(...){
  #set x values
  x <- seq(1, 10, by = 1)
  #create y values
  y <- 1L
  for (fun in list(...)) y <- y * fun(x)
  #create new PDF
  p <- data.frame(x,y)
  pdqr::new_d(p, type = "continuous")
}

And I have stored the PDFs in a list:

PDFS <- list(var1_distr1 = var1_distr1, var1_distr2 = var1_distr2, var1_distr3 = var1_distr3,
             var2_distr1 = var2_distr1, var2_distr2 = var2_distr2, var2_distr3 = var2_distr3,
             var3_distr1 = var3_distr1, var3_distr2 = var3_distr2, var3_distr3 = var3_distr3)

I would like to use the function foo on the df to generate proportional distributions for all combinations of the distributions given in the df. So, for each case, I need the following combinations: var1_var2, var1_var3, var2_var3 and var1_var2_var3.
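The four combinations listed above can be enumerated rather than typed out. A minimal sketch in base R, assuming the name columns are called distr_var1, distr_var2 and distr_var3 as in the example df (the object names vars and combos are illustrative only):

```r
# Columns that hold the selected distribution names (as in the example df)
vars <- c("distr_var1", "distr_var2", "distr_var3")

# All subsets of size 2 and 3; each element is a character vector of column names
combos <- unlist(
  lapply(2:length(vars), function(k) combn(vars, k, simplify = FALSE)),
  recursive = FALSE
)

# Label each combination, e.g. "var1_var2"
names(combos) <- vapply(
  combos,
  function(cl) paste(sub("^distr_", "", cl), collapse = "_"),
  character(1)
)
```

Each element of combos can then be used to pick the columns whose distributions should be passed to foo together.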

Calculate mean over distributions

If I want to calculate a mean over the distributions individually, I can do this:

means <- sapply(PDFS, pdqr::summ_mean)
df$mean_var1 <- means[df$distr_var1]

Or:

df$mean_var2 <- sapply(mget(df$distr_var2), pdqr::summ_mean)
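The two lookups above differ in where they find the functions: indexing a named list uses only the list, while mget searches the calling environment for objects with those names. A small base R sketch with hypothetical stand-in functions (f1, f2, funs and keys are not part of the original setup):

```r
# Hypothetical stand-ins for the distribution functions
f1 <- function(x) x + 1
f2 <- function(x) x * 2

funs <- list(f1 = f1, f2 = f2)   # named list, like PDFS
keys <- c("f2", "f1", "f2")      # names stored in a column, like df$distr_var1

# Lookup through the named list: depends only on 'funs'
by_list <- sapply(funs[keys], function(f) f(10))

# Lookup through the environment: depends on f1 and f2 existing where mget() is called
by_mget <- sapply(mget(keys), function(f) f(10))
```

The list version keeps working inside functions or other environments where the individual objects are not visible, which may make it the more robust choice.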

Both approaches work fine. For the combinations var1_var2, var1_var3, var2_var3 and var1_var2_var3, however, I have not found a suitable approach. This is what I tried:

df$var1_var2_mean <- sapply(foo(mget(mapply(PDFS, sapply, df$distr_var1, df$distr_var2))), pdqr::summ_mean)

I tried to overcome not calling functions by using a list, but things seem to get too complicated / nested to work nicely...

Question

How can I select the appropriate distributions given in distr_var1, distr_var2 and distr_var3, combine them using foo, and calculate the mean using pdqr::summ_mean?

I welcome any comments, including on the workflow in general.

CodePudding user response:

A foreach loop works for me:

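library(foreach)  # provides foreach() and %do%

df$var1_var2_mean <- foreach(i = seq_len(nrow(df)), .combine = c) %do% {
  # Look up the distribution functions named in row i
  A <- get(as.character(df$distr_var1[i]))
  B <- get(as.character(df$distr_var2[i]))
  pdqr::summ_mean(foo(A, B))
}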

I need to repeat this for each combination, but at least I got it working...
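To avoid repeating the loop for every combination, the row-wise lookup can be wrapped in a helper that takes the column names as an argument. The following is only a sketch, assuming the PDFS list, foo and the example df from the question (rebuilt here so the block runs on its own); combo_mean is a hypothetical helper name, not part of pdqr:

```r
library(pdqr)

# Named list of distribution functions, as in the question
PDFS <- list(
  var1_distr1 = as_d(function(x) dnorm(x, mean = 3, sd = 1)),
  var1_distr2 = as_d(function(x) dnorm(x, mean = 6, sd = 1)),
  var1_distr3 = as_d(function(x) dnorm(x, mean = 2, sd = 2)),
  var2_distr1 = as_d(function(x) dnorm(x, mean = 5, sd = 3)),
  var2_distr2 = as_d(function(x) dnorm(x, mean = 3, sd = 1)),
  var2_distr3 = as_d(function(x) dnorm(x, mean = 4, sd = 2)),
  var3_distr1 = as_d(function(x) dnorm(x, mean = 4, sd = 1)),
  var3_distr2 = as_d(function(x) dnorm(x, mean = 5, sd = 1)),
  var3_distr3 = as_d(function(x) dnorm(x, mean = 7, sd = 2))
)

# foo from the question: pointwise product of densities on a fixed grid
foo <- function(...) {
  x <- seq(1, 10, by = 1)
  y <- 1L
  for (fun in list(...)) y <- y * fun(x)
  new_d(data.frame(x, y), type = "continuous")
}

# Example df from the question
df <- data.frame(
  distr_var1 = c("var1_distr1", "var1_distr3", "var1_distr1", "var1_distr2", "var1_distr2", "var1_distr1", "var1_distr3"),
  distr_var2 = c("var2_distr2", "var2_distr1", "var2_distr2", "var2_distr1", "var2_distr3", "var2_distr3", "var2_distr1"),
  distr_var3 = c("var3_distr2", "var3_distr3", "var3_distr1", "var3_distr1", "var3_distr2", "var3_distr3", "var3_distr1")
)

# Hypothetical helper: for each row, fetch the named distributions from 'pdfs',
# combine them with foo(), and return the mean of the combined distribution
combo_mean <- function(df, cols, pdfs) {
  nm <- as.matrix(df[cols])  # character matrix of distribution names
  vapply(seq_len(nrow(nm)), function(i) {
    summ_mean(do.call(foo, unname(pdfs[nm[i, ]])))
  }, numeric(1))
}

df$var1_var2_mean      <- combo_mean(df, c("distr_var1", "distr_var2"), PDFS)
df$var1_var2_var3_mean <- combo_mean(df, c("distr_var1", "distr_var2", "distr_var3"), PDFS)
```

Because do.call passes however many functions the row selects, the same helper covers both the pairwise and the three-way combinations. It also looks names up in the list rather than in the environment, so get is not needed.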
