I'm trying to write some code that iterates through a list of categorical variables and runs summary statistics on each, but I'm having trouble getting the variable to be recognized inside the for loop. It groups by the loop variable's own name ('var' in the following example) rather than by the column it refers to. A simple example is below.
library(dplyr)
cat_vars <- c('hair_color', 'skin_color', 'eye_color')
for (var in cat_vars) {
  starwars %>%
    group_by(var) %>%
    summarise(n())
}
Thanks for the help!
CodePudding user response:
To refer to a column whose name is stored as a character string, use the .data pronoun, like this: .data[[var]].
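A minimal sketch of the difference, run outside the loop. Here var is set by hand to stand in for the loop variable from the question, and the result names by_string and by_column are made up for illustration.

```r
library(dplyr)

var <- "hair_color"

# group_by(var) evaluates var as an ordinary R object, so every row gets the
# same constant string and the data collapses into a single group.
by_string <- starwars %>%
  group_by(var) %>%
  summarise(n = n())

# .data[[var]] looks up the column whose name is stored in var.
by_column <- starwars %>%
  group_by(.data[[var]]) %>%
  summarise(n = n())

nrow(by_string)  # a single group
nrow(by_column)  # one row per distinct hair colour
```

The first result has a grouping column literally called var, which matches the behaviour described in the question.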
As written, your for loop won't modify your data or produce any output, because R does not auto-print values inside a for loop the way it does at the top level. What do you want it to do? To print the results of each iteration, add print():
for (var in cat_vars) {
  starwars %>%
    group_by(.data[[var]]) %>%
    summarise(n()) %>%
    print()
}
#> # A tibble: 13 x 2
#> hair_color `n()`
#> <chr> <int>
#> 1 auburn 1
#> 2 auburn, grey 1
#> 3 auburn, white 1
#> 4 black 13
#> 5 blond 3
#> 6 blonde 1
#> 7 brown 18
#> 8 brown, grey 1
#> 9 grey 1
#> 10 none 37
#> 11 unknown 1
#> 12 white 4
#> 13 <NA> 5
#>
#> # A tibble: 31 x 2
#> skin_color `n()`
#> <chr> <int>
#> 1 blue 2
#> 2 blue, grey 2
#> 3 brown 4
#> 4 brown mottle 1
#> 5 brown, white 1
#> 6 dark 6
#> 7 fair 17
#> 8 fair, green, yellow 1
#> 9 gold 1
#> 10 green 6
#> # ... with 21 more rows
#>
#> # A tibble: 15 x 2
#> eye_color `n()`
#> <chr> <int>
#> 1 black 10
#> 2 blue 19
#> 3 blue-gray 1
#> 4 brown 21
#> 5 dark 1
#> 6 gold 1
#> 7 green, yellow 1
#> 8 hazel 3
#> 9 orange 8
#> 10 pink 1
#> 11 red 5
#> 12 red, blue 1
#> 13 unknown 3
#> 14 white 1
#> 15 yellow 11
Created on 2022-03-17 by the reprex package (v2.0.1)
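If the goal is to keep the summaries rather than only print them, one possible approach is to collect them in a named list. This is a sketch under that assumption, not something the question asked for; the list name summaries is hypothetical.

```r
library(dplyr)

cat_vars <- c("hair_color", "skin_color", "eye_color")

# Build one summary tibble per variable. lapply() returns the results,
# so nothing is lost the way it is in a bare for loop.
summaries <- lapply(cat_vars, function(var) {
  starwars %>%
    group_by(.data[[var]]) %>%
    summarise(n = n())
})

# Name each list element after the variable it summarises.
names(summaries) <- cat_vars

summaries$eye_color  # inspect a single result
```

Naming the count column with n = n() avoids the backticked `n()` column name seen in the printed output above.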