I have a dataframe with one column for test prep course completion and one column for low-income status. Both are categorical.
I want to graph the count of students from low-income families who completed the course vs. those who did not. My current process seems too cumbersome.
My process is below.
Original Data
low_income | test |
---|---|
yes | completed |
yes | none |
no | completed |
yes | none |
etc...
STEP 1: Create a frequency table
low_income | completed | none |
---|---|---|
no | 3 | 1 |
yes | 5 | 3 |
STEP 2: Manually create a new dataframe (this is the part that I am concerned about)
low_income | test | count |
---|---|---|
no | completed | 3 |
no | none | 1 |
yes | completed | 5 |
yes | none | 3 |
Then finally graph that.
Here is my full code:
suppressPackageStartupMessages(library(ggplot2))
# Sample data for dataframe
low_income <- c("yes","yes", "no","yes","yes","yes", "no","yes","yes","yes", "no","no")
test <- c("completed", "none","completed", "none","completed", "completed","completed", "completed", "none","completed", "none","completed")
df <- data.frame(low_income, test)
# STEP 1: Create a frequency table to get the counts
table1 <- table(df$low_income, df$test)
# STEP 2: Use cross tabs to manually create a new dataframe <-- I feel like I'm going wrong here
low_income <- c("no","no", "yes","yes")
test <- c("completed", "none","completed", "none")
count <- c(3, 1, 5,3)
df_2 <- data.frame(low_income, test,count)
# STEP 3: Finally graphing
ggplot(df_2, aes(factor(low_income), count, fill = test)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_brewer(palette = "Set1")
CodePudding user response:
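Here is the suggestion by @Jahi Zamy, slightly modified. `count()` builds the long-format counts (in a column named `n`), so the manual STEP 2 is not needed: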
library(tidyverse)
df %>%
  # one row per low_income/test combination, with the count in column n
  dplyr::count(low_income, test) %>%
  ggplot(aes(x = low_income, y = n, fill = test)) +
  geom_col(position = position_dodge()) +
  scale_fill_brewer(palette = "Set1")
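If you would rather not load dplyr, here is a rough sketch of two alternatives using only base R and ggplot2. It assumes the sample data from the question. The names `df_counts`, `p1` and `p2` are made up for illustration. Option 1 turns the frequency table into a long-format dataframe with `as.data.frame()`. Option 2 skips the counting step and lets `geom_bar()` count the rows itself.

```r
library(ggplot2)

# Sample data copied from the question
low_income <- c("yes","yes", "no","yes","yes","yes", "no","yes","yes","yes", "no","no")
test <- c("completed", "none","completed", "none","completed", "completed",
          "completed", "completed", "none","completed", "none","completed")
df <- data.frame(low_income, test)

# Option 1: convert the frequency table to long format instead of retyping it.
# Naming the arguments to table() gives the resulting columns their names;
# responseName renames the default "Freq" column.
df_counts <- as.data.frame(
  table(low_income = df$low_income, test = df$test),
  responseName = "count"
)

p1 <- ggplot(df_counts, aes(x = low_income, y = count, fill = test)) +
  geom_col(position = "dodge") +
  scale_fill_brewer(palette = "Set1")

# Option 2: plot the raw data; geom_bar() counts rows per group by default
p2 <- ggplot(df, aes(x = low_income, fill = test)) +
  geom_bar(position = "dodge") +
  scale_fill_brewer(palette = "Set1")

print(p1)
print(p2)
```

Both plots should show the same bars as the `count()` version above. Option 2 is the shortest, but Option 1 may be handy if you also want to keep the counts as a dataframe.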