I have a dataframe df with the following observations:

a <- c("A", "A", "A", "A", "B", "B","B", "B")
b <- c(11, 9, 4, 1, NA, 2,3,4)
c <- c(2,3, NA, NA, 25, 4, NA, 2)
d <- c(4,5, 3, NA, NA, 2,NA,NA)

df <- data.frame(a, b,c,d)
df
df <- data.frame(df)
colnames(df) <- c("Letter", "num1", "num2", "num3")
df

Now I would like to compare the first column with each of the three other columns using the cohen.d function from the effsize package, e.g. cohen.d(df$num1, df$Letter) or cohen.d(df$num2, df$Letter). However, before doing that, I need to remove the NA values of the numeric column used in each calculation. The idea that comes to mind is to run a for loop over the columns num1, num2, and num3, pairing each with Letter. How can I use a for loop for the calculations in this case?

User response:

This type of problem generally comes down to reshaping the data: it is in wide format, but the analysis needs it in long format. See this post on how to reshape the data from wide to long format.

The following code reshapes the data, pipes the result to na.omit, then applies a split/lapply/combine step and puts the results in a data.frame.

a <- c("A", "A", "A", "A", "B", "B","B", "B")
b <- c(11, 9, 4, 1, NA, 2,3,4)
c <- c(2,3, NA, NA, 25, 4, NA, 2)
d <- c(4,5, 3, NA, NA, 2,NA,NA)

df <- data.frame(a, b,c,d)
colnames(df) <- c("Letter", "num1", "num2", "num3")

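# Auxiliary function: run cohen.d on one long-format subset and
# flatten the result into a plain list, one element per output column
faux <- function(x){
  e <- effsize::cohen.d(value ~ Letter, data = x)
  e2 <- unclass(e)
  c(e2[1:4],   # method, name, estimate, sd
    lower = unname(e2$conf.int[1]), 
    upper = unname(e2$conf.int[2]), 
    e2[6:8])   # var, conf.level, magnitude
}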

long <- reshape2::melt(df, id.vars = "Letter") |> na.omit()
res <- lapply(split(long, long$variable), faux)
do.call(rbind.data.frame, res)
#>         method name   estimate        sd     lower    upper       var conf.level magnitude
#> num1 Cohen's d    d  0.9031263  3.598611 -1.155897 2.962150 0.6415931       0.95     large
#> num2 Cohen's d    d -0.7524094 10.410998 -3.754631 2.249812 0.8899453       0.95    medium
#> num3 Cohen's d    d         NA        NA        NA       NA        NA       0.95      <NA>

Created on 2022-07-28 by the reprex package (v2.0.1)
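
As a cross-check on the estimate column, the pooled-SD form of Cohen's d can be written out by hand in base R. This is only a sketch: cohens_d_pooled is a hypothetical helper, not part of effsize, and it assumes exactly two groups and the usual pooled-variance formula. It also suggests why the num3 row is NA: after NA removal, group B keeps a single num3 value, and the variance of a single value is undefined.

```r
df <- data.frame(
  Letter = c("A", "A", "A", "A", "B", "B", "B", "B"),
  num1 = c(11, 9, 4, 1, NA, 2, 3, 4),
  num2 = c(2, 3, NA, NA, 25, 4, NA, 2),
  num3 = c(4, 5, 3, NA, NA, 2, NA, NA)
)

# Hypothetical helper (not from effsize): Cohen's d with a pooled SD,
# assuming exactly two groups in g
cohens_d_pooled <- function(x, g) {
  ok <- !is.na(x)                       # drop the NAs of this column only
  parts <- split(x[ok], factor(g[ok]))  # one numeric vector per group
  n <- lengths(parts)
  m <- vapply(parts, mean, numeric(1))
  v <- vapply(parts, var, numeric(1))   # NA when a group has one value
  sd_pooled <- sqrt(sum((n - 1) * v) / (sum(n) - 2))
  unname((m[1] - m[2]) / sd_pooled)
}

d_manual <- sapply(df[c("num1", "num2", "num3")], cohens_d_pooled, g = df$Letter)
round(d_manual, 4)
#>    num1    num2    num3 
#>  0.9031 -0.7524      NA
```

The hand-computed values agree with the estimates reported by cohen.d in the table above.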

Edit

To run the code above as a for loop, assign the result of split to a variable, explicitly create a results list, and call faux (the auxiliary function) in the loop.

sp <- split(long, long$variable)
res <- vector("list", length = length(sp))
for(i in seq_along(sp)) {
  res[[i]] <- faux(sp[[i]])
}
do.call(rbind.data.frame, res)
#>      method name   estimate        sd     lower    upper       var conf.level magnitude
#> 1 Cohen's d    d  0.9031263  3.598611 -1.155897 2.962150 0.6415931       0.95     large
#> 2 Cohen's d    d -0.7524094 10.410998 -3.754631 2.249812 0.8899453       0.95    medium
#> 3 Cohen's d    d         NA        NA        NA       NA        NA       0.95      <NA>

Created on 2022-07-28 by the reprex package (v2.0.1)
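
If you would rather skip the reshaping step, a for loop can also run directly over the wide columns, removing the NA values of one column at a time. The following is a minimal sketch of that idea, not a replacement for the code above; num_cols and keep are names made up for the illustration, and the list is named up front so the results stay labelled by column.

```r
library(effsize)

df <- data.frame(
  Letter = c("A", "A", "A", "A", "B", "B", "B", "B"),
  num1 = c(11, 9, 4, 1, NA, 2, 3, 4),
  num2 = c(2, 3, NA, NA, 25, 4, NA, 2),
  num3 = c(4, 5, 3, NA, NA, 2, NA, NA)
)

num_cols <- c("num1", "num2", "num3")
res <- vector("list", length(num_cols))
names(res) <- num_cols               # label each result with its column name

for (col in num_cols) {
  keep <- !is.na(df[[col]])          # NA removal for this column only
  res[[col]] <- cohen.d(df[[col]][keep], factor(df$Letter[keep]))
}

# Pull one estimate per column out of the list of results
sapply(res, function(e) unname(e$estimate))
```

Each element of res is a full cohen.d result, so res$num1 prints the estimate together with its confidence interval.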

User response:

With your data, I get NA for all of the Cohen's d estimates and confidence intervals.

However, here is a way to get all the results at once in a list.

First, let's filter out NA values

library(dplyr)  # provides %>% and filter()
df <- df %>% filter(!is.na(b) & !is.na(c) & !is.na(d))

Then, run the loop

mycols <- letters[2:4]  # "b", "c", "d"
lapply(mycols, function(x) effsize::cohen.d(df[, x], df$a))

[[1]]

Cohen's d

d estimate: NA (NA)
95 percent confidence interval:
lower upper 
   NA    NA 


[[2]]

Cohen's d

d estimate: NA (NA)
95 percent confidence interval:
lower upper 
   NA    NA 


[[3]]

Cohen's d

d estimate: NA (NA)
95 percent confidence interval:
lower upper 
   NA    NA

This lapply call is nothing other than an (implicit) loop that returns its results in a list.
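
The all-NA output most likely comes from the filtering step rather than from the loop. A quick base R check (a sketch, using the original column names a, b, c, d) counts how many rows per group survive when every row with any NA is dropped, compared with dropping NAs one column at a time. The names complete and counts are made up for the illustration.

```r
df <- data.frame(
  a = c("A", "A", "A", "A", "B", "B", "B", "B"),
  b = c(11, 9, 4, 1, NA, 2, 3, 4),
  c = c(2, 3, NA, NA, 25, 4, NA, 2),
  d = c(4, 5, 3, NA, NA, 2, NA, NA)
)

# Rows with no NA in any of b, c, d (same effect as the filter() call)
complete <- df[complete.cases(df[c("b", "c", "d")]), ]
table(complete$a)
# A = 2 rows, B = 1 row

# Group sizes when NAs are removed per column instead
counts <- sapply(df[c("b", "c", "d")], function(x) table(df$a[!is.na(x)]))
counts
# rows A and B, columns b, c, d:
#   b: A = 4, B = 3;  c: A = 2, B = 3;  d: A = 3, B = 1
```

With a single observation left in group B, its variance is undefined, which would explain the NA estimates. Removing NAs per column keeps enough data for b and c.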

User response:

First, to remove the NA values you can use tidyr::drop_na(), which removes every row that contains an NA value in any column. Then the easiest loop is over the names of the columns you are interested in: create a vector of those names and use purrr::map to iterate over it.

df <- data.frame(
  Letter = c("A", "A", "A", "A", "B", "B","B", "B"),
  num1 = c(11, 9, 4, 1, NA, 2,3,4),
  num2 = c(2,3, NA, NA, 25, 4, NA, 2),
  num3 = c(4,5, 3, NA, NA, 2,NA,NA)) |>
  tidyr::drop_na() 

purrr::map(c('num1', 'num2', 'num3'),
           ~ effsize::cohen.d(df[[.x]], df$Letter))
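
Note that drop_na() without column arguments leaves only three rows of this data (two for A, one for B). If the goal is to remove NAs separately for each calculation, one possible variant is to drop them inside the map call, one column at a time. This is a sketch under that assumption; df_raw and cols are names introduced for the example, and tidyselect::all_of is used to pass the column name as a string.

```r
df_raw <- data.frame(
  Letter = c("A", "A", "A", "A", "B", "B", "B", "B"),
  num1 = c(11, 9, 4, 1, NA, 2, 3, 4),
  num2 = c(2, 3, NA, NA, 25, 4, NA, 2),
  num3 = c(4, 5, 3, NA, NA, 2, NA, NA)
)  # deliberately not piped into drop_na()

cols <- c("num1", "num2", "num3")

# set_names() makes map() return a list named by column
res <- purrr::map(purrr::set_names(cols), function(col) {
  sub <- tidyr::drop_na(df_raw, tidyselect::all_of(col))  # NAs of this column only
  effsize::cohen.d(sub[[col]], factor(sub$Letter))
})

names(res)
#> [1] "num1" "num2" "num3"
```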