Graph 2 categorical variables using value counts

Time:08-09

I have a dataframe with a column for test prep course completion and a column for low-income. Both of these are categorical.

I want to graph the count of students from low-income families who completed the course vs. those who did not. My current process seems too cumbersome.

My process is below.

Original Data

| low_income | test |
|---|---|
| yes | completed |
| yes | none |
| no | completed |
| yes | none |

etc...

STEP 1: Create a frequency table

      completed none
no            3    1
yes           5    3

STEP 2: Manually create a new dataframe (this is the part that I am concerned about)

low_income  test       count
no          completed      3
no          none           1
yes         completed      5
yes         none           3

Then, finally, I graph that.

Here is my full code:

suppressPackageStartupMessages(library(ggplot2))

# Sample data for dataframe
low_income <- c("yes","yes", "no","yes","yes","yes", "no","yes","yes","yes", "no","no")
test <- c("completed", "none","completed", "none","completed", "completed","completed", "completed", "none","completed", "none","completed")

df <- data.frame(low_income, test)

# STEP 1: Create a frequency table to get the counts
table1 <- table(df$low_income, df$test)

# STEP 2: Use cross tabs to manually create a new dataframe <-- I feel like I'm going wrong here
low_income <- c("no","no", "yes","yes")
test <- c("completed", "none","completed", "none")
count <- c(3, 1, 5,3)

df_2 <- data.frame(low_income, test,count)

# STEP 3: Finally graphing
ggplot(df_2, aes(factor(low_income), count, fill = test)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_brewer(palette = "Set1")

CodePudding user response:

Here is the suggestion by @Jahi Zamy, slightly modified:

library(tidyverse)

df %>% 
  dplyr::count(low_income, test) %>% 
  ggplot(aes(x = low_income, y = n, fill = test)) +
  geom_col(position = position_dodge()) +
  scale_fill_brewer(palette = "Set1")
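
If you would rather not load the tidyverse, here is a minimal base R sketch of the same idea. It assumes the sample data from the question; the name `df_counts` is just illustrative. Converting the frequency table with `as.data.frame()` produces the long-format table that STEP 2 built by hand, and `geom_bar()` can also count the rows itself, so the counting step can be skipped entirely if only the plot is needed.

```r
# Sample data copied from the question
low_income <- c("yes", "yes", "no", "yes", "yes", "yes",
                "no", "yes", "yes", "yes", "no", "no")
test <- c("completed", "none", "completed", "none", "completed", "completed",
          "completed", "completed", "none", "completed", "none", "completed")
df <- data.frame(low_income, test)

# Cross-tabulate both columns, then reshape the table to long format.
# responseName sets the name of the count column (the default is "Freq").
df_counts <- as.data.frame(table(df), responseName = "count")
print(df_counts)

# Optional plot: geom_bar() uses stat = "count" by default, so it tallies
# the rows of the raw data frame without any pre-computed counts.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = low_income, fill = test)) +
    geom_bar(position = "dodge") +
    scale_fill_brewer(palette = "Set1")
  print(p)
}
```

`df_counts` has one row per combination of `low_income` and `test`, so it can be passed to `geom_col()` in the same way as the output of `dplyr::count()` (with `y = count` instead of `y = n`).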
