Plot the means of multiple columns

Time:10-29

I want to show different barplots for the years and gender with the mean values of the variables Q1 to Q5, which should look like a density.

I have data that looks like this:

data <- data.frame(userid = c(1,1,1,2,2,2,3,3,3),
                  year = c(2013,2014,2015,2013,2014,2015,2013,2014,2015),
                  gender = c(1,1,1,0,0,0,0,0,0),
                  Q1 = c(3,2,3,1,0,1,2,1,0),
                  Q2 = c(4,3,4,2,0,2,1,4,3),
                  Q3 = c(1,2,1,3,5,4,5,4,5),
                  Q4 = c(1,2,1,2,4,3,2,2,1),
                  Q5 = c(1,1,1,2,1,0,0,0,1))

My solution was to filter() for year and gender first, then use summarise() to get a vector of the means and pass it to barplot():

data %>% filter(gender==1,year==2013) %>% select(-userid,-gender,-year) %>% summarise_all(mean) %>%
  as.numeric() %>%
  barplot()

Instead of doing this for every combination of year and gender, is there a more elegant way, using ggplot and facet_wrap()?

CodePudding user response:

I may have misunderstood how you want the plot arranged, but if you want to show the mean score per question for each year and gender group, you could use facets like this:

library(tidyverse)

data %>%
  pivot_longer(starts_with("Q")) %>%
  group_by(year, gender, name) %>%
  summarize(value = mean(value)) %>%
  ggplot(aes(name, value)) +
  geom_col(fill = 'deepskyblue4') +
  facet_grid(year ~ gender) +
  labs(x = 'Question', y = 'Average score') +
  theme_minimal(base_size = 16)
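
Since the question asked about facet_wrap() specifically, here is a minimal sketch of the same idea with facet_wrap() instead of facet_grid(). The names `means` and `p` are only illustrative, and using label_both for the panel titles is my assumption about what is readable, not part of the answer above.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Same example data as in the question
data <- data.frame(userid = c(1,1,1,2,2,2,3,3,3),
                   year = c(2013,2014,2015,2013,2014,2015,2013,2014,2015),
                   gender = c(1,1,1,0,0,0,0,0,0),
                   Q1 = c(3,2,3,1,0,1,2,1,0),
                   Q2 = c(4,3,4,2,0,2,1,4,3),
                   Q3 = c(1,2,1,3,5,4,5,4,5),
                   Q4 = c(1,2,1,2,4,3,2,2,1),
                   Q5 = c(1,1,1,2,1,0,0,0,1))

# Mean of each question per year and gender: one row per bar
means <- data %>%
  pivot_longer(starts_with("Q")) %>%
  group_by(year, gender, name) %>%
  summarise(value = mean(value), .groups = "drop")

# One panel per year/gender combination; label_both shows
# the variable name next to its value in each panel title
p <- ggplot(means, aes(name, value)) +
  geom_col() +
  facet_wrap(~ year + gender, ncol = 2, labeller = label_both) +
  labs(x = "Question", y = "Average score")
p
```

Keeping the summarised table in its own object also makes it easy to inspect the means before plotting them.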


CodePudding user response:

Maybe you want something like this, with facet_wrap and geom_col, where the mean is calculated using rowMeans:

library(dplyr)
library(ggplot2)
data %>% 
  mutate(mean = rowMeans(select(., starts_with("Q")), na.rm = TRUE)) %>%
  ggplot(aes(x = year, y = mean, fill = factor(gender))) +
  geom_col() +
  labs(x = 'Year', y = 'Mean Q1 to Q5', fill = 'Gender') +
  theme_bw() +
  facet_wrap(~userid)
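
To see what rowMeans contributes here, this small base R sketch computes the same per-row value on its own. Note that it averages across Q1 to Q5 within each row (one user in one year), not within each question; `q_cols` and `row_means` are illustrative names.

```r
# Same example data as in the question
data <- data.frame(userid = c(1,1,1,2,2,2,3,3,3),
                   year = c(2013,2014,2015,2013,2014,2015,2013,2014,2015),
                   gender = c(1,1,1,0,0,0,0,0,0),
                   Q1 = c(3,2,3,1,0,1,2,1,0),
                   Q2 = c(4,3,4,2,0,2,1,4,3),
                   Q3 = c(1,2,1,3,5,4,5,4,5),
                   Q4 = c(1,2,1,2,4,3,2,2,1),
                   Q5 = c(1,1,1,2,1,0,0,0,1))

# Columns whose names start with "Q"
q_cols <- grep("^Q", names(data), value = TRUE)

# One mean per row, taken across the five question columns
row_means <- rowMeans(data[q_cols], na.rm = TRUE)

# Show the result next to the identifying columns
cbind(data[c("userid", "year", "gender")], mean = row_means)
```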

Created on 2022-10-28 with

CodePudding user response:

Use aggregate(), then barplot().

par(mfrow=c(1, 4))
sapply(unique(data$year), \(x) {
  as.matrix(aggregate(cbind(Q1, Q2, Q3, Q4, Q5) ~ gender, data[data$year == x, ], FUN=mean)[-1]) |>
    barplot(beside=TRUE, col=c(2, 4), main=x)
})
plot.new()
legend('left', legend=c('m', 'f'), col=c(2, 4), cex=1.2, pch=15, bty='n')


CodePudding user response:

This approach does not require you to calculate the mean first; stat_summary() handles that when you specify fun = "mean".

library(tidyverse)

data <- data.frame(userid = c(1,1,1,2,2,2,3,3,3),
                   year = c(2013,2014,2015,2013,2014,2015,2013,2014,2015),
                   gender = c(1,1,1,0,0,0,0,0,0),
                   Q1 = c(3,2,3,1,0,1,2,1,0),
                   Q2 = c(4,3,4,2,0,2,1,4,3),
                   Q3 = c(1,2,1,3,5,4,5,4,5),
                   Q4 = c(1,2,1,2,4,3,2,2,1),
                   Q5 = c(1,1,1,2,1,0,0,0,1))

data %>% 
  select(starts_with("Q")) %>% # Selects each column that starts with "Q"
  pivot_longer(cols = everything()) %>% # Pivot to long format
  ggplot(aes(x = name, y = value, fill = name)) +
  stat_summary(geom = "bar", fun = "mean") + # Geom and function can be changed easily
  theme_classic() +
  labs(x = "Q", y = "Mean value", fill = NULL)

Created on 2022-10-28 by the reprex package (v2.0.1)
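
The code above pools all years and both genders into one set of bars. If separate panels per year and gender are wanted, as in the question, one possible sketch is to keep those two columns while pivoting and add facet_grid(). stat_summary() then computes the mean within each panel. The names `long` and `p`, and the use of label_both, are my own choices here.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Same example data as in the question
data <- data.frame(userid = c(1,1,1,2,2,2,3,3,3),
                   year = c(2013,2014,2015,2013,2014,2015,2013,2014,2015),
                   gender = c(1,1,1,0,0,0,0,0,0),
                   Q1 = c(3,2,3,1,0,1,2,1,0),
                   Q2 = c(4,3,4,2,0,2,1,4,3),
                   Q3 = c(1,2,1,3,5,4,5,4,5),
                   Q4 = c(1,2,1,2,4,3,2,2,1),
                   Q5 = c(1,1,1,2,1,0,0,0,1))

# Pivot only the Q columns so year and gender stay available for faceting
long <- data %>%
  pivot_longer(cols = starts_with("Q"))

# stat_summary() computes the mean per question within each panel
p <- ggplot(long, aes(x = name, y = value, fill = name)) +
  stat_summary(geom = "bar", fun = "mean") +
  facet_grid(year ~ gender, labeller = label_both) +
  theme_classic() +
  labs(x = "Q", y = "Mean value", fill = NULL)
p
```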
