Compute stats for several columns at the same time using sapply

Time:04-26

I have a dataframe as follows:

# A tibble: 6 x 4
   Placebo    High  Medium      Low
     <dbl>   <dbl>   <dbl>    <dbl>
1  0.0400  -0.04    0.0100  0.0100 
2  0.04     0      -0.0100  0.04   
3  0.0200  -0.1    -0.05   -0.0200 
4  0.03    -0.0200  0.03   -0.00700
5 -0.00500 -0.0100  0.0200  0.0100 
6  0.0300  -0.0100 NA      NA  

You can get Cohen's d for two of the columns using the cohen.d() function from the effsize package:

df <- data.frame(Placebo = c(0.0400, 0.04, 0.0200, 0.03, -0.00500, 0.0300),
                 Low = c(-0.04, 0, -0.1, -0.0200,  -0.0100, -0.0100),
                 Medium = c(0.0100, -0.0100, -0.05, 0.03,  0.0200, NA ),
                 High = c(0.0100, 0.04, -0.0200, -0.00700, 0.0100, NA))

library(effsize)
cohen.d(as.vector(na.omit(df$Placebo)), as.vector(na.omit(df$High)))

Interestingly enough, I'm getting the following error with this code:

Error in data[, group] : incorrect number of dimensions

However, I would like to write a function that computes Cohen's d between one of the columns and each of the others.

To get Cohen's d for every column against Placebo, I would use something like:

sapply(df, function(i) cohen.d(pull(df, as.vector(na.omit(!!Placebo))), as.vector(na.omit(i))))

But I'm not sure this would work anyway.

Edit: I don't want to drop the full row, as Cohen's d can be computed for vectors of different lengths. Ideally, I would like to get the statistic with the NAs removed for each column independently.

CodePudding user response:

It may be better to remove the NAs for each column separately, by building a logical index that keeps only the rows where both 'Placebo' and that column are non-missing:

library(dplyr)
library(effsize)
df %>%
  summarise(across(Low:High, ~ list({
    # keep only rows where both Placebo and the current column are non-missing
    i1 <- complete.cases(Placebo) & complete.cases(.x)
    cohen.d(Placebo[i1], .x[i1])
  })))

Or, if we want to use lapply/sapply, loop over the columns other than Placebo:

lapply(df[-1], function(x) {
  # na.omit() drops every row where either Placebo or x is NA
  x1 <- na.omit(cbind(df$Placebo, x))
  cohen.d(x1[, 1], x1[, 2])
})

Output:

$Low

Cohen's d

d estimate: 1.947312 (large)
95 percent confidence interval:
    lower     upper 
0.3854929 3.5091319 


$Medium

Cohen's d

d estimate: 0.9622504 (large)
95 percent confidence interval:
     lower      upper 
-0.5782851  2.5027860 


$High

Cohen's d

d estimate: 0.8884639 (large)
95 percent confidence interval:
     lower      upper 
-0.6402419  2.4171697 
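
Note that both versions above drop a row whenever either Placebo or the other column is NA, so the sixth Placebo value is discarded for Medium and High. If the goal is to remove the NAs from each column independently, as the edit in the question asks, a minimal sketch with sapply might look like the following. The helper name cohen_d_vs and its ref argument are hypothetical (they are not part of effsize), and the sketch assumes the df defined in the question and that only the point estimate is wanted:

```r
library(effsize)

df <- data.frame(Placebo = c(0.0400, 0.04, 0.0200, 0.03, -0.00500, 0.0300),
                 Low     = c(-0.04, 0, -0.1, -0.0200, -0.0100, -0.0100),
                 Medium  = c(0.0100, -0.0100, -0.05, 0.03, 0.0200, NA),
                 High    = c(0.0100, 0.04, -0.0200, -0.00700, 0.0100, NA))

# Hypothetical helper (not part of effsize): Cohen's d of one reference
# column against every other column, dropping NAs per column independently.
cohen_d_vs <- function(data, ref = "Placebo") {
  # non-missing values of the reference column, reused for every comparison
  ref_vals <- data[[ref]][!is.na(data[[ref]])]
  others <- setdiff(names(data), ref)
  sapply(data[others], function(x) {
    # the two vectors may have different lengths, which cohen.d() accepts
    unname(cohen.d(ref_vals, x[!is.na(x)])$estimate)
  })
}

d_est <- cohen_d_vs(df)
d_est
```

This returns a named numeric vector with one estimate per column. For Low, which has no missing values, the estimate should match the 1.947312 shown above; for Medium and High it can differ from the output above, because all six Placebo values are kept.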