Pass a vector of distribution functions to calculate a mean per case in R

Time:08-18

I have several probability distribution functions defined using the pdqr package. Let's say they are A, B and C:

library(pdqr)

A <- as_d(function(x) dnorm(x, mean = 3, sd = 1))
B <- as_d(function(x) dnorm(x, mean = 6, sd = 1))
C <- as_d(function(x) dnorm(x, mean = 2, sd = 2))

I have a large data.frame with a character column, distr, that names the appropriate PDF for each case. Let's say:

df <- data.frame(distr = c("A", "C", "A", "B", "B", "A", "C"))

I would like to get the mean of each case's PDF. For a single PDF, such as A, this works:

> pdqr::summ_mean(A)
[1] 3

Now I would like to compute the mean for each case based on the PDF named in distr, which means passing the corresponding PDF to pdqr::summ_mean(). I have tried the following, with the resulting errors:

> df$distr_mean <- summ_mean(df$distr)
Error: `f` is not pdqr-function. It should be function.
> 
> df$distr_mean <- summ_mean(invoke_map(df$distr))
Error in A() : argument "x" is missing, with no default
> 
> df$distr_mean <- df %>%
    pull(distr) %>%
    summ_mean()
Error: `f` is not pdqr-function. It should be function.

So either it doesn't recognize that a pdqr-function is being passed, or it needs an x-value, which doesn't make sense: I want the mean over the entire distribution, not at a single x (passing a range like c(1:10) doesn't work either). Furthermore, as I understand it, apply or do.call pass only one function, while I want to pass several different functions given in a vector.
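The first and third errors share a cause: df$distr is a character vector of names, not a vector of functions, so summ_mean() receives the string "A" rather than the pdqr-function A. (invoke_map(), from purrr, does look the functions up by name, but then calls each one with no arguments, hence the missing "x".) A minimal sketch of the missing step, assuming only base R's get() to turn a stored name into the object it refers to:

```r
library(pdqr)

A <- as_d(function(x) dnorm(x, mean = 3, sd = 1))

# stringsAsFactors = FALSE only matters for R < 4.0, where strings become factors
df <- data.frame(distr = c("A", "A"), stringsAsFactors = FALSE)

# The column holds strings, not functions
is.character(df$distr)    # TRUE
is.function(df$distr[1])  # FALSE: this is just the string "A"

# get() returns the object whose name is stored in the string
f <- get(df$distr[1])
is.function(f)            # TRUE: this is the pdqr-function A
summ_mean(f)              # close to 3
```

The answers below apply this same name-to-object lookup to every row at once.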

How to proceed?

Answer:

One way to do this is to pass the distr column to mget, which returns all the corresponding functions in a list (repeated names are fine). Then feed that list to summ_mean using sapply:

sapply(mget(df$distr), pdqr::summ_mean)
#> A C A B B A C 
#> 3 2 3 6 6 3 2 

Inside mutate, though, you'll need to tell mget in which environment to find the functions:

df %>% 
  mutate(distr_mean = sapply(mget(distr, envir = .GlobalEnv), pdqr::summ_mean))
#>   distr distr_mean
#> 1     A          3
#> 2     C          2
#> 3     A          3
#> 4     B          6
#> 5     B          6
#> 6     A          3
#> 7     C          2
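A variation on this answer, sketched below, swaps sapply() for vapply() with numeric(1), so the result is guaranteed to be a numeric vector with one value per row; it errors instead of silently returning a list if a lookup goes wrong. It also captures the defining environment in a variable (lookup_env, a hypothetical name) instead of hard-coding .GlobalEnv, which assumes the code runs in the same environment where A, B and C were created. stringsAsFactors = FALSE is set because mget() expects a character vector of names, and data.frame() converts strings to factors by default before R 4.0.

```r
library(pdqr)
library(dplyr)

A <- as_d(function(x) dnorm(x, mean = 3, sd = 1))
B <- as_d(function(x) dnorm(x, mean = 6, sd = 1))
C <- as_d(function(x) dnorm(x, mean = 2, sd = 2))

# Hypothetical helper variable: the environment where A, B and C live
lookup_env <- environment()

df <- data.frame(distr = c("A", "C", "A", "B", "B", "A", "C"),
                 stringsAsFactors = FALSE)

df <- df %>%
  mutate(distr_mean = vapply(
    mget(distr, envir = lookup_env),  # one pdqr-function per row, duplicates allowed
    summ_mean,
    numeric(1)                        # each result must be a single number
  ))
df
```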

Answer:

This may be easier to manage if you store your functions in a named list rather than in the top-level environment. From there, it's relatively easy to use sapply or lapply to calculate the mean of each function once and then look up the results for each row of df by name:

library(pdqr)

df <- data.frame(distr = c("A", "C", "A", "B", "B", "A", "C"))

pdfs <- list(
  A = as_d(function(x) dnorm(x, mean = 3, sd = 1)),
  B = as_d(function(x) dnorm(x, mean = 6, sd = 1)),
  C = as_d(function(x) dnorm(x, mean = 2, sd = 2))
)

means <- sapply(pdfs, summ_mean)   # one mean per distribution
df$distr_mean <- means[df$distr]   # look up each row's mean by name

  distr distr_mean
1     A          3
2     C          2
3     A          3
4     B          6
5     B          6
6     A          3
7     C          2

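Or in one line (the \(x) shorthand requires R 4.1 or later, and sapply is used because lapply would return a list rather than a numeric vector):

df$distr_mean <- sapply(df$distr, \(x) pdqr::summ_mean(pdfs[[x]]))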
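If you prefer the per-row style, vapply() with numeric(1) returns a plain numeric vector with one value per row and fails loudly if any lookup does not yield a single number, whereas lapply() always returns a list. The sketch below reuses the named list pdfs from this answer and uses function(x) so it also runs on R versions before 4.1. Note that it calls summ_mean() once per row, so for a large data.frame the two-step means[df$distr] version, which computes each distinct mean only once, will usually be faster.

```r
library(pdqr)

pdfs <- list(
  A = as_d(function(x) dnorm(x, mean = 3, sd = 1)),
  B = as_d(function(x) dnorm(x, mean = 6, sd = 1)),
  C = as_d(function(x) dnorm(x, mean = 2, sd = 2))
)
df <- data.frame(distr = c("A", "C", "A", "B", "B", "A", "C"),
                 stringsAsFactors = FALSE)

# One summ_mean() call per row; pdfs[[x]] fetches the pdqr-function by name
df$distr_mean <- vapply(df$distr,
                        function(x) summ_mean(pdfs[[x]]),
                        numeric(1),
                        USE.NAMES = FALSE)
df
```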