I have a data frame df with the following observations:
a <- c("A", "A", "A", "A", "B", "B","B", "B")
b <- c(11, 9, 4, 1, NA, 2,3,4)
c <- c(2,3, NA, NA, 25, 4, NA, 2)
d <- c(4,5, 3, NA, NA, 2,NA,NA)
df <- data.frame(a, b,c,d)
df
df <- data.frame(df)
colnames(df) <- c("Letter", "num1", "num2", "num3")
df
Now I would like to compute an effect size for each of the three numeric columns against the first column, using the cohen.d function from the effsize package, e.g. cohen.d(df$num1, df$Letter) or cohen.d(df$num2, df$Letter). However, before doing that, I need to remove the NA values of each numeric column separately for each calculation. My idea is to run a for loop over the columns num1, num2, and num3, pairing each with Letter. How can I use a for loop for the calculations in this case?
CodePudding user response:
This type of problem generally comes down to reshaping the data: the data should be in long format, but it is in wide format. See this post on how to reshape data from wide to long format.
The following code reshapes the data, pipes it to na.omit, then uses split/lapply/combine and puts the results in a data.frame.
a <- c("A", "A", "A", "A", "B", "B","B", "B")
b <- c(11, 9, 4, 1, NA, 2,3,4)
c <- c(2,3, NA, NA, 25, 4, NA, 2)
d <- c(4,5, 3, NA, NA, 2,NA,NA)
df <- data.frame(a, b,c,d)
colnames(df) <- c("Letter", "num1", "num2", "num3")
faux <- function(x){
e <- effsize::cohen.d(value ~ Letter, data = x)
e2 <- unclass(e)
c(e2[1:4],
lower = unname(e2$conf.int[1]),
upper = unname(e2$conf.int[2]),
e2[6:8])
}
long <- reshape2::melt(df, id.vars = "Letter") |> na.omit()
res <- lapply(split(long, long$variable), faux)
do.call(rbind.data.frame, res)
#> method name estimate sd lower upper var conf.level magnitude
#> num1 Cohen's d d 0.9031263 3.598611 -1.155897 2.962150 0.6415931 0.95 large
#> num2 Cohen's d d -0.7524094 10.410998 -3.754631 2.249812 0.8899453 0.95 medium
#> num3 Cohen's d d NA NA NA NA NA 0.95 <NA>
Created on 2022-07-28 by the reprex package (v2.0.1)
Edit
To run the code above as a for loop, assign the result of split, explicitly create a results list, and call faux (the auxiliary function) in the loop.
sp <- split(long, long$variable)
res <- vector("list", length = length(sp))
for(i in seq_along(sp)) {
res[[i]] <- faux(sp[[i]])
}
do.call(rbind.data.frame, res)
#> method name estimate sd lower upper var conf.level magnitude
#> 1 Cohen's d d 0.9031263 3.598611 -1.155897 2.962150 0.6415931 0.95 large
#> 2 Cohen's d d -0.7524094 10.410998 -3.754631 2.249812 0.8899453 0.95 medium
#> 3 Cohen's d d NA NA NA NA NA 0.95 <NA>
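For readers who want the loop to run directly over the wide columns, as the question describes, here is a minimal base R sketch. It drops the NA values of one column at a time and computes the effect size by hand with the textbook pooled-standard-deviation formula. The helper pooled_d is hypothetical (it is not part of effsize); it is only meant to show where the per-column NA removal goes.

```r
# Data from the question
df <- data.frame(
  Letter = c("A", "A", "A", "A", "B", "B", "B", "B"),
  num1 = c(11, 9, 4, 1, NA, 2, 3, 4),
  num2 = c(2, 3, NA, NA, 25, 4, NA, 2),
  num3 = c(4, 5, 3, NA, NA, 2, NA, NA)
)

# Hypothetical helper: Cohen's d with a pooled standard deviation
pooled_d <- function(x, g) {
  g <- factor(g)
  x1 <- x[g == levels(g)[1]]
  x2 <- x[g == levels(g)[2]]
  n1 <- length(x1)
  n2 <- length(x2)
  # var() of a single value is NA, so d is NA when a group has one observation
  s_pooled <- sqrt(((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2))
  (mean(x1) - mean(x2)) / s_pooled
}

cols <- c("num1", "num2", "num3")
d_est <- setNames(rep(NA_real_, length(cols)), cols)
for (col in cols) {
  keep <- !is.na(df[[col]])  # drop NAs for this column only
  d_est[col] <- pooled_d(df[[col]][keep], df$Letter[keep])
}
d_est
```

The estimates agree with the estimate column in the table above, including the NA for num3, where group B has a single non-missing value. Inside the loop, the call to pooled_d could presumably be swapped for a call to cohen.d on the same filtered vectors.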
CodePudding user response:
With your data, I get NA for all of the Cohen's d estimates and CIs. However, below is a way to get all the results at once in a list.
First, let's filter out NA values
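library(dplyr)  # provides %>% and filter(); df here still has the original column names a, b, c, d
df <- df %>% filter(!is.na(b) & !is.na(c) & !is.na(d))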
Then, run the loop
mycols <- letters[2:4]  # column names "b", "c", "d"
lapply(mycols, function(x) effsize::cohen.d(df[, x], df$a))
[[1]]
Cohen's d
d estimate: NA (NA)
95 percent confidence interval:
lower upper
NA NA
[[2]]
Cohen's d
d estimate: NA (NA)
95 percent confidence interval:
lower upper
NA NA
[[3]]
Cohen's d
d estimate: NA (NA)
95 percent confidence interval:
lower upper
NA NA
This lapply function is nothing more than an (implicit) loop that returns its results in a list.
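The NA results in this answer appear to follow from filtering on all three columns at once. A small base R sketch, using the original column names a, b, c, d and complete.cases as a stand-in for the filter call, shows how few rows survive:

```r
# Data from the question, before the columns were renamed
df <- data.frame(
  a = c("A", "A", "A", "A", "B", "B", "B", "B"),
  b = c(11, 9, 4, 1, NA, 2, 3, 4),
  c = c(2, 3, NA, NA, 25, 4, NA, 2),
  d = c(4, 5, 3, NA, NA, 2, NA, NA)
)

# Keep only rows with no NA in any column (same effect as the joint filter)
complete <- df[complete.cases(df), ]

# Observations left per group
n_per_group <- table(complete$a)
n_per_group

# A standard deviation cannot be computed from a single value
sd_b <- sd(complete$b[complete$a == "B"])
sd_b
```

Only three rows remain, two in group A and one in group B, so the standard deviation of group B is NA and any effect size built on it is NA as well. Removing NA values one column at a time keeps more observations.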
CodePudding user response:
First, to remove the NA values you can use tidyr::drop_na(); this will remove any row with an NA value. Then the easiest loop is over the names of the columns you are interested in. So just create a vector of these and use purrr::map to iterate over each.
df <- data.frame(
Letter = c("A", "A", "A", "A", "B", "B","B", "B"),
num1 = c(11, 9, 4, 1, NA, 2,3,4),
num2 = c(2,3, NA, NA, 25, 4, NA, 2),
num3 = c(4,5, 3, NA, NA, 2,NA,NA)) |>
tidyr::drop_na()
purrr::map(c('num1', 'num2', 'num3'),
~ effsize::cohen.d(df[[.x]], df$Letter))
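Because drop_na() without arguments removes a row if any column is missing, it may discard more data than the question asks for. As a hedged alternative, the following base R sketch builds one NA-free subset per numeric column and names each list element after its column. The names cols and subsets are illustrative only.

```r
# Data from the question
df <- data.frame(
  Letter = c("A", "A", "A", "A", "B", "B", "B", "B"),
  num1 = c(11, 9, 4, 1, NA, 2, 3, 4),
  num2 = c(2, 3, NA, NA, 25, 4, NA, 2),
  num3 = c(4, 5, 3, NA, NA, 2, NA, NA)
)

cols <- c("num1", "num2", "num3")

# One subset per column: Letter plus that column, with only its own NAs removed
subsets <- lapply(setNames(cols, cols), function(col) {
  df[!is.na(df[[col]]), c("Letter", col)]
})

# Rows kept per column
rows_kept <- sapply(subsets, nrow)
rows_kept
```

Each element of subsets can then be passed to the effect-size function of your choice, and the list names make it clear which column each result belongs to.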