This question is about prepping data for ggplot
I have a long dataframe of survey questions and option responses (the real one is 76,931 rows) of the form:
df <- data.frame(
Question = c( "Qone", "Qone", "Qone", "Qtwo", "Qtwo", "Qthree", "Qthree"),
Response = as.factor( c( "Opt1", "Opt2","Opt1","Opt1","Opt2","Opt1","Opt2"))
)
and I want to create a summary table of the form
Question | Response | Count | Total_of_Q | Percent_C_of_Total |
---|---|---|---|---|
Qone | Opt1 | 2 | 3 | 66.6 |
Qone | Opt2 | 1 | 3 | 33.3 |
Qtwo | Opt1 | 1 | 2 | 50 |
Qtwo | Opt2 | 1 | 2 | 50 |
Qthree | Opt1 | 1 | 2 | 50 |
Qthree | Opt2 | 1 | 2 | 50 |
using
library(tidyverse)
If I write
df %>%
group_by(Question) %>%
summarise(Total_of_Q = n()) -> totals
totals
Totals is
Question | Total_of_Q |
---|---|
Qone | 3 |
Qtwo | 2 |
Qthree | 2 |
and if I write
df %>%
group_by(Question) %>%
count(Response) -> Count
Count is
Question | Response | Count |
---|---|---|
Qone | Opt1 | 2 |
Qone | Opt2 | 1 |
Qtwo | Opt1 | 1 |
Qtwo | Opt2 | 1 |
Qthree | Opt1 | 1 |
Qthree | Opt2 | 1 |
But at this point I have two data frames that are pretty hard to bind together (because of different row lengths) and I can't help thinking there must be better ways to generate the whole table all at once.
Any hints or suggestions gratefully received.
CodePudding user response:
library(dplyr)
df %>%
# shortcut for group_by(Question, Response) %>% summarize(Count = n())`
count(Question, Response, name = "Count") %>% # 1: get detailed counts
group_by(Question) %>%
mutate(total = sum(Count), # 2: get counts by queston
pct = Count / total) %>% # 1 divided by 2
ungroup()