Barplot of dummy categorical variables

Time:02-22

My data looks like this:

library(tidyverse)
#> Warning: package 'readr' was built under R version 4.1.2

set.seed(123)

dat <- tibble(
  mcq_option_1 = sample(c(0,1), replace = TRUE, size = 100),
  mcq_option_2 = sample(c(0,1), replace = TRUE, size = 100),
  mcq_option_3 = sample(c(0,1), replace = TRUE, size = 100),
  mcq_option_4 = sample(c(0,1), replace = TRUE, size = 100),
  mcq_option_5 = sample(c(0,1), replace = TRUE, size = 100)
)

Created on 2022-02-21 by the reprex package (v2.0.1)

My goal is to create a barplot of the times each mcq option was selected (counts of 1s in each column). What is the quickest data manipulation technique to achieve this? Should I pivot longer first and then create a new count variable using mutate? Or can I quickly do this inside of the ggplot function?

Is there a shorter/better solution than this?

new <- dat %>% pivot_longer(
  cols = everything(),
  names_to = 'options',
  values_to = 'active'
)

new %>% filter(active == 1) %>% group_by(options) %>% summarise(n = n())

CodePudding user response:

Not sure if you're looking for better execution speed or simply shorter code. You can at least get the code a bit shorter by first summing across columns, then pivoting.

dat %>% 
  summarise(across(everything(), sum)) %>% 
  pivot_longer(everything()) %>% 
  ggplot(aes(name, value)) +
  geom_col()
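
If shorter code is the main goal, the column sums can also be computed directly before plotting. This is only a sketch of that idea, not a benchmark: it assumes every column of dat is a 0/1 indicator, and the names counts, options and n are made up for illustration.

```r
library(tidyverse)

set.seed(123)

# Same simulated data as in the question
dat <- tibble(
  mcq_option_1 = sample(c(0,1), replace = TRUE, size = 100),
  mcq_option_2 = sample(c(0,1), replace = TRUE, size = 100),
  mcq_option_3 = sample(c(0,1), replace = TRUE, size = 100),
  mcq_option_4 = sample(c(0,1), replace = TRUE, size = 100),
  mcq_option_5 = sample(c(0,1), replace = TRUE, size = 100)
)

# colSums() returns a named numeric vector (one sum per column);
# enframe() turns it into a two-column tibble.
# 'options' and 'n' are illustrative column names.
counts <- enframe(colSums(dat), name = "options", value = "n")

# One bar per option, bar height = number of 1s in that column
p <- ggplot(counts, aes(options, n)) +
  geom_col()
```

Because the columns only contain 0s and 1s, the sum of each column is the same as the count of 1s, so no filtering step is needed.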

CodePudding user response:

You don't need to summarise() after pivoting: geom_col() uses stat_identity() and, by default, position = "stack", so all the values that share an x position are stacked into a single bar whose height is their sum. You don't need to filter either, because the 0 values add nothing to the bar height.

library(tidyverse)

set.seed(123)

dat <- tibble(
  mcq_option_1 = sample(c(0,1), replace = TRUE, size = 100),
  mcq_option_2 = sample(c(0,1), replace = TRUE, size = 100),
  mcq_option_3 = sample(c(0,1), replace = TRUE, size = 100),
  mcq_option_4 = sample(c(0,1), replace = TRUE, size = 100),
  mcq_option_5 = sample(c(0,1), replace = TRUE, size = 100)
)

dat %>%
  pivot_longer(cols = everything(),
               names_to = 'options',
               values_to = 'active') %>%
  ggplot(aes(options, active)) +
  geom_col()
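
To check that the stacked bars really end up at the column totals, one option is to inspect the data ggplot2 computes for the layer. This is a rough sketch rather than part of the answer above: it relies on layer_data() exposing a ymax column for stacked bars, and the names p, ld and bar_tops are made up for illustration.

```r
library(tidyverse)

set.seed(123)

# Same simulated data as above
dat <- tibble(
  mcq_option_1 = sample(c(0,1), replace = TRUE, size = 100),
  mcq_option_2 = sample(c(0,1), replace = TRUE, size = 100),
  mcq_option_3 = sample(c(0,1), replace = TRUE, size = 100),
  mcq_option_4 = sample(c(0,1), replace = TRUE, size = 100),
  mcq_option_5 = sample(c(0,1), replace = TRUE, size = 100)
)

p <- dat %>%
  pivot_longer(cols = everything(),
               names_to = 'options',
               values_to = 'active') %>%
  ggplot(aes(options, active)) +
  geom_col()

# layer_data() returns the data frame ggplot2 builds for the first layer,
# one row per stacked segment, with ymin/ymax after stacking
ld <- layer_data(p)

# The top of each bar is the largest ymax within each x position
bar_tops <- tapply(ld$ymax, as.integer(ld$x), max)

# Compare the bar tops with the plain column sums
all.equal(unname(as.numeric(bar_tops)), unname(colSums(dat)))
```

Each row with active == 0 becomes a zero-height segment, which is why the bar height matches the count of 1s without a filter step.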

Created on 2022-02-21 by the reprex package (v2.0.1)
