Using a loop for repetitive analyses and outputs in R

I have a dataset that contains numerous items that were measured using a pre- and posttest instrument. Here is an example dataset:

Question    Score   Test
  QA         5       Pre
  QA         2       Pre
  QA         3       Post
  QA         7       Post
  QA         3       Post
  QB         2       Pre
  QB         1       Pre
  QB         4       Pre
  QC         7       Pre
  QC         3       Pre
  QC         2       Post
  QC         3       Post
  QC         6       Post

I want to compute Cohen's d on this data and create an object in my environment for each result, such as:

Effectsize1<-effectsize::cohens_d(df$Score[df$Question== "QA"]~ df$Test[df$Question== "QA"], data = df)

Instead of writing out this code for each item, I tried to do this with a loop:

questions<-as.data.frame(unique(df$Questions))
er<-NULL

i for (1:rnow(questions)){
 er$i<-effectsize::cohens_d(df$Score[df$Question== i] ~ df$Test[df$Question== i] data = df)
print(er$i)
}

I am not sure if I am close, or far off. Any help is much appreciated. Thanks so much!

CodePudding user response：

If d is your data:

library(data.table)
setDT(d)[, effectsize::cohens_d(Score~Test), Question]

Output:

   Question   Cohens_d    CI     CI_low  CI_high
     <char>      <num> <num>      <num>    <num>
1:       QA  0.3706247  0.95 -1.4690832 2.153623
2:       QB  1.9611614  0.95 -0.7822302 4.510194
3:       QC -0.5656854  0.95 -2.3643683 1.316452

Input:

d = data.table::fread("Question    Score   Test
  QA         5       Pre
  QA         2       Pre
  QA         3       Post
  QA         7       Post
  QA         3       Post
  QB         2       Pre
  QB         1       Pre
  QB         4       Post
  QB         9       Post
  QC         7       Pre
  QC         3       Pre
  QC         2       Post
  QC         3       Post
  QC         6       Post")

CodePudding user response：

You don't need a loop; you can do it all with tidyverse functions:

library(dplyr)
library(tidyr)
dat <- tibble::tribble(
  ~Question,    ~Score,   ~Test,
"QA",         5,       "Pre",
"QA",         2,       "Pre",
"QA",         3,       "Post",
"QA",         7,       "Post",
"QA",         3,       "Post",
"QB",         2,       "Pre",
"QB",         1,       "Pre",
"QB",         4,       "Pre",
"QB",         3,       "Post",
"QB",         5,       "Post",
"QC",         7,       "Pre",
"QC",         3,       "Pre",
"QC",         2,       "Post",
"QC",         3,       "Post",
"QC",         6,       "Post")


dat %>% 
  group_by(Question) %>%
  summarise(d = effectsize::cohens_d(Score ~ Test)) %>% 
  unnest(d)
#> Warning: 'y' is numeric but has only 2 unique values.
#> If this is a grouping variable, convert it to a factor.

#> Warning: 'y' is numeric but has only 2 unique values.
#> If this is a grouping variable, convert it to a factor.
#> # A tibble: 3 × 5
#>   Question Cohens_d    CI CI_low CI_high
#>   <chr>       <dbl> <dbl>  <dbl>   <dbl>
#> 1 QA          0.371  0.95 -1.47     2.15
#> 2 QB          1.12   0.95 -0.933    3.03
#> 3 QC         -0.566  0.95 -2.36     1.32

Created on 2022-07-14 by the reprex package (v2.0.1)

If you wanted to do the loop instead, you could do it this way:

questions<-data.frame(q = unique(dat$Question))
er<-vector(mode="list", length=nrow(questions))
names(er) <- questions$q
for(i in questions$q){
  er[[i]]<-effectsize::cohens_d(dat$Score[dat$Question== i] ~ dat$Test[dat$Question== i])
}
#> Warning: 'y' is numeric but has only 2 unique values.
#> If this is a grouping variable, convert it to a factor.

#> Warning: 'y' is numeric but has only 2 unique values.
#> If this is a grouping variable, convert it to a factor.
er
#> $QA
#> Cohen's d |        95% CI
#> -------------------------
#> 0.37      | [-1.47, 2.15]
#> 
#> - Estimated using pooled SD.
#> $QB
#> Cohen's d |        95% CI
#> -------------------------
#> 1.12      | [-0.93, 3.03]
#> 
#> - Estimated using pooled SD.
#> $QC
#> Cohen's d |        95% CI
#> -------------------------
#> -0.57     | [-2.36, 1.32]
#> 
#> - Estimated using pooled SD.

Here, the loop counter i stands in for the question names, i.e., it is a string and can be used like any other string in R. We initialize the er object as a list with the right number of elements and then name the elements according to the questions. Inside the loop, i takes the values "QA", "QB" and "QC" in turn, so er[[i]] fills the element with the matching name.
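
One detail worth spelling out is why the loop above assigns with er[[i]] rather than er$i as in the question. The following is a minimal base-R sketch; the stored strings are made-up placeholders, not effect sizes, and the names bad and good are only for illustration:

```r
# Question labels mirroring the example data
questions <- c("QA", "QB", "QC")

# `$` takes the name literally, so every pass overwrites one element called "i"
bad <- list()
for (i in questions) {
  bad$i <- paste("result for", i)
}
names(bad)    # a single element named "i"

# `[[` evaluates i, so each question fills its own named element
good <- vector(mode = "list", length = length(questions))
names(good) <- questions
for (i in questions) {
  good[[i]] <- paste("result for", i)
}
names(good)   # one element per question
```

With `$`, only the result of the last iteration survives; with `[[`, the list ends up with one entry per question.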

CodePudding user response：

The correct syntax for a for loop is:

for (i in 1:nrow(questions)) {

# code here

}
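
A fuller sketch of an index-based loop, assuming the example data from the tidyverse answer above. To keep it runnable without extra packages, pooled_d() is a hypothetical base-R helper standing in for effectsize::cohens_d(); it returns only the point estimate (Post minus Pre, pooled SD) and no confidence interval:

```r
# Example data copied from the answer above
dat <- data.frame(
  Question = rep(c("QA", "QB", "QC"), each = 5),
  Score    = c(5, 2, 3, 7, 3,  2, 1, 4, 3, 5,  7, 3, 2, 3, 6),
  Test     = c("Pre", "Pre", "Post", "Post", "Post",
               "Pre", "Pre", "Pre", "Post", "Post",
               "Pre", "Pre", "Post", "Post", "Post")
)

# Hypothetical helper (not part of effectsize): Cohen's d with pooled SD
pooled_d <- function(score, test) {
  post <- score[test == "Post"]
  pre  <- score[test == "Pre"]
  pooled_var <- ((length(post) - 1) * var(post) + (length(pre) - 1) * var(pre)) /
    (length(post) + length(pre) - 2)
  (mean(post) - mean(pre)) / sqrt(pooled_var)
}

questions <- data.frame(q = unique(dat$Question))
er <- vector(mode = "list", length = nrow(questions))
names(er) <- questions$q

# seq_len() is a safer spelling of 1:nrow(): it yields an empty sequence for 0 rows
for (i in seq_len(nrow(questions))) {
  rows <- dat$Question == questions$q[i]   # look the question name up by index
  er[[i]] <- pooled_d(dat$Score[rows], dat$Test[rows])
}

round(unlist(er), 2)
#>    QA    QB    QC 
#>  0.37  1.12 -0.57
```

Here i is a number, so the question name has to be looked up with questions$q[i] before subsetting; the point estimates agree with the Cohens_d column shown in the answer above.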