Home > Net >  Loop to make a basic table for many variables by condition
Loop to make a basic table for many variables by condition

Time:04-21

I am running an experiment where participants are randomly assigned to one of two conditions, and then I collect data on several variables. Here is an example of my code:

df <- data.frame(condition =c(1,1,1,1,1,-1,-1,-1,-1,-1),
              var1 = c(6,6,4,7,5,6,6,6,4,7),
              var2 = c(3,4,3,6,7,1,2,1,2,5),
              var3 = c(2,2,6,6,7,1,7,7,3,1),
              var4 = c(6,4,3,6,4,1,3,3,4,4))

df$condition = factor(df$condition, levels = c(-1,1),labels = c("Digital","Physical"))

For each variable (var1, var2, etc.) I would like a little table with the count, mean, and standard deviation. This code creates the kind of table that I want:

 group_by(df, df$condition) %>% 
  summarise(
    count = n(),
    mean = mean(var1),
    sd = sd(var1))

But because I have many variables, I would like to use some kind of loop (or "lapply"?) to create all these tables at once. It would also be great if each table could show the name of the variable. Thanks!

CodePudding user response:

You can just use summarise on all the variables, i.e.

library(dplyr)

group_by(df, condition) %>% 
     summarise(across(everything(), ~ c(count = n(), mean = mean(.), sd = sd(.))))

`summarise()` has grouped output by 'condition'. You can override using the `.groups` argument.
# A tibble: 6 x 5
# Groups:   condition [2]
  condition  var1  var2  var3  var4
  <fct>     <dbl> <dbl> <dbl> <dbl>
1 Digital    5     5     5     5   
2 Digital    5.8   2.2   3.8   3   
3 Digital    1.10  1.64  3.03  1.22
4 Physical   5     5     5     5   
5 Physical   5.6   4.6   4.6   4.6 
6 Physical   1.14  1.82  2.41  1.34

You can control the output structure by changing object in the formula, i.e.

group_by(df, condition) %>% 
     summarise(across(everything(), ~ data.frame(count = n(), mean = mean(.), sd = sd(.))))
# A tibble: 2 x 5
  condition var1$count $mean   $sd var2$count $mean   $sd var3$count $mean   $sd var4$count $mean   $sd
  <fct>          <int> <dbl> <dbl>      <int> <dbl> <dbl>      <int> <dbl> <dbl>      <int> <dbl> <dbl>
1 Digital            5   5.8  1.10          5   2.2  1.64          5   3.8  3.03          5   3    1.22
2 Physical           5   5.6  1.14          5   4.6  1.82          5   4.6  2.41          5   4.6  1.34

CodePudding user response:

We could still do it my summarise using a list:

library(dplyr)

df %>% 
  group_by(condition) %>% 
  summarise(across(starts_with("var"), .f = list(n = ~n(),
                                         mean = mean,
                                         sd = sd), na.rm = TRUE))
    
  condition var1_n var1_mean var1_sd var2_n var2_mean var2_sd var3_n var3_mean var3_sd var4_n var4_mean var4_sd
      <dbl>  <int>     <dbl>   <dbl>  <int>     <dbl>   <dbl>  <int>     <dbl>   <dbl>  <int>     <dbl>   <dbl>
1        -1      5       5.8    1.10      5       2.2    1.64      5       3.8    3.03      5       3      1.22
2         1      5       5.6    1.14      5       4.6    1.82      5       4.6    2.41      5       4.6    1.34

CodePudding user response:

df <- data.frame(condition =c(1,1,1,1,1,-1,-1,-1,-1,-1),
                 var1 = c(6,6,4,7,5,6,6,6,4,7),
                 var2 = c(3,4,3,6,7,1,2,1,2,5),
                 var3 = c(2,2,6,6,7,1,7,7,3,1),
                 var4 = c(6,4,3,6,4,1,3,3,4,4))

df$condition = factor(df$condition, levels = c(-1,1),labels = c("Digital","Physical"))





for (var in names(df)[2:length(names(df))]){

  
  tab <- group_by(df, condition) %>% 
    select(c("condition", var)) %>% 
    dplyr::rename(v = var) %>% 
    summarise(
      count = n(),
      mean = mean(v),
      sd = sd(v) 
      )
  print(var)
  print(tab)
}

gives


[1] "var1"
# A tibble: 2 × 4
  condition count  mean    sd
  <fct>     <int> <dbl> <dbl>
1 Digital       5   5.8  1.10
2 Physical      5   5.6  1.14
[1] "var2"
# A tibble: 2 × 4
  condition count  mean    sd
  <fct>     <int> <dbl> <dbl>
1 Digital       5   2.2  1.64
2 Physical      5   4.6  1.82
[1] "var3"
# A tibble: 2 × 4
  condition count  mean    sd
  <fct>     <int> <dbl> <dbl>
1 Digital       5   3.8  3.03
2 Physical      5   4.6  2.41
[1] "var4"
# A tibble: 2 × 4
  condition count  mean    sd
  <fct>     <int> <dbl> <dbl>
1 Digital       5   3    1.22
2 Physical      5   4.6  1.34
> 

  •  Tags:  
  • r
  • Related