R dplyr long dataframe factor level summarisation to provide counts, group totals and percentages in-CodePudding

This question is about prepping data for ggplot

I have a long dataframe of survey questions and option responses (the real one is 76,931 rows) of the form:

df <- data.frame( 
Question = c( "Qone", "Qone", "Qone", "Qtwo", "Qtwo", "Qthree", "Qthree"),
Response = as.factor( c( "Opt1", "Opt2","Opt1","Opt1","Opt2","Opt1","Opt2"))
)

and I want to create a summary table of the form

Question	Response	Count	Total_of_Q	Percent_C_of_Total
Qone	Opt1	2	3	66.6
Qone	Opt2	1	3	33.3
Qtwo	Opt1	1	2	50
Qtwo	Opt2	1	2	50
Qthree	Opt1	1	2	50
Qthree	Opt2	1	2	50

using

library(tidyverse)

If I write

df %>%
   group_by(Question) %>%
   summarise(Total_of_Q = n()) -> totals
totals

Totals is

Question	Total_of_Q
Qone	3
Qtwo	2
Qthree	2

and if I write

df %>% 
    group_by(Question) %>%
    count(Response) -> Count

Count is

Question	Response	Count
Qone	Opt1	2
Qone	Opt2	1
Qtwo	Opt1	1
Qtwo	Opt2	1
Qthree	Opt1	1
Qthree	Opt2	1

But at this point I have two data frames that are pretty hard to bind together (because of different row lengths) and I can't help thinking there must be better ways to generate the whole table all at once.

Any hints or suggestions gratefully received.

CodePudding user response：

library(dplyr)
df %>%
  # shortcut for group_by(Question, Response) %>% summarize(Count = n())` 
  count(Question, Response, name = "Count") %>%   # 1: get detailed counts
  group_by(Question) %>% 
  mutate(total = sum(Count),                      # 2: get counts by queston
         pct = Count / total) %>%                 # 1 divided by 2
  ungroup()