I have three columns in a dataframe: age, gender and income.
I want to loop through these columns and create plots based on the data in them.
I know that in Stata you can loop through variables and run commands with them. However, the code below does not seem to work. Is there an equivalent way to do this in R?
groups <- c(df$age, df$gender, df$income)
for (i in groups){
  df %>% group_by(i) %>%
    summarise(n = n()) %>%
    mutate(prop = n/sum(n)) %>%
    ggplot(aes(y = prop, x = i)) +
    geom_col()
}
CodePudding user response:
You can use lapply:
library(ggplot2)

df <- data.frame(age = sample(c("26-30", "31-35", "36-40", "41-45"), 20, replace = TRUE),
                 gender = sample(c("M", "F"), 20, replace = TRUE),
                 income = sample(c("High", "Medium", "Low"), 20, replace = TRUE),
                 prop = runif(20))

# One plot per grouping column; returns a named list of ggplot objects
lapply(df[, 1:3], function(x) ggplot(data = df, aes(y = prop, x = x)) + geom_col())
CodePudding user response:
You can also use the tidyverse. Loop through a vector of grouping variable names with map. On every iteration, turn the variable name into a symbol and unquote it with !!sym(variable) so that group_by (and aes) treat it as a column rather than a string. The rest of the code is pretty much the same as what you used.
library(dplyr)
library(purrr)
library(ggplot2)

groups <- c('age', 'gender', 'income')

# Returns a list with one bar chart per grouping variable
map(groups, ~
      df %>% group_by(!!sym(.x)) %>%
      summarise(n = n()) %>%
      mutate(prop = n/sum(n)) %>%
      ggplot(aes(y = prop, x = !!sym(.x))) +
      geom_col()
)
If you want to use a for loop:
groups <- c('age', 'gender', 'income')

for (i in groups){
  p <- df %>% group_by(!!sym(i)) %>%
    summarise(n = n()) %>%
    mutate(prop = n/sum(n)) %>%
    ggplot(aes(y = prop, x = !!sym(i))) +
    geom_col()
  print(p)  # plots are not auto-printed inside a for loop
}
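
As a variation on the same idea, here is a minimal sketch that wraps the per-variable steps in a function and refers to the column by name with across(all_of()) and the .data pronoun instead of !!sym(). The helper name plot_prop and the sample data are made up for illustration; I am assuming all three columns are categorical, as in the first answer.

```r
library(dplyr)
library(ggplot2)

# Hypothetical example data with the same columns as in the question
set.seed(1)
df <- data.frame(
  age    = sample(c("26-30", "31-35", "36-40", "41-45"), 20, replace = TRUE),
  gender = sample(c("M", "F"), 20, replace = TRUE),
  income = sample(c("High", "Medium", "Low"), 20, replace = TRUE)
)

# plot_prop() is an illustrative helper, not part of any package.
# `var` is a column name given as a string.
plot_prop <- function(data, var) {
  data %>%
    group_by(across(all_of(var))) %>%        # group by the named column
    summarise(n = n(), .groups = "drop") %>% # count rows per level
    mutate(prop = n / sum(n)) %>%            # share of each level
    ggplot(aes(x = .data[[var]], y = prop)) +
    geom_col() +
    labs(x = var)
}

groups <- c("age", "gender", "income")

# Named list of plots, one per grouping variable
plots <- lapply(setNames(groups, groups), plot_prop, data = df)
```

Printing plots$age (or print(plots[["income"]]) inside a loop) should then draw the corresponding chart. Keeping the plots in a named list makes it easy to pick out one of them later.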