Home > other >  R dplyr long dataframe factor level summarisation to provide counts, group totals and percentages in
R dplyr long dataframe factor level summarisation to provide counts, group totals and percentages in

Time:09-01

This question is about prepping data for ggplot

I have a long dataframe of survey questions and option responses (the real one is 76,931 rows) of the form:

df <- data.frame( 
Question = c( "Qone", "Qone", "Qone", "Qtwo", "Qtwo", "Qthree", "Qthree"),
Response = as.factor( c( "Opt1", "Opt2","Opt1","Opt1","Opt2","Opt1","Opt2"))
)

and I want to create a summary table of the form

Question Response Count Total_of_Q Percent_C_of_Total
Qone Opt1 2 3 66.6
Qone Opt2 1 3 33.3
Qtwo Opt1 1 2 50
Qtwo Opt2 1 2 50
Qthree Opt1 1 2 50
Qthree Opt2 1 2 50

using

library(tidyverse)

If I write

df %>%
   group_by(Question) %>%
   summarise(Total_of_Q = n()) -> totals
totals

Totals is

Question Total_of_Q
Qone 3
Qtwo 2
Qthree 2

and if I write

df %>% 
    group_by(Question) %>%
    count(Response) -> Count

Count is

Question Response Count
Qone Opt1 2
Qone Opt2 1
Qtwo Opt1 1
Qtwo Opt2 1
Qthree Opt1 1
Qthree Opt2 1

But at this point I have two data frames that are pretty hard to bind together (because of different row lengths) and I can't help thinking there must be better ways to generate the whole table all at once.

Any hints or suggestions gratefully received.

CodePudding user response:

library(dplyr)
df %>%
  # shortcut for group_by(Question, Response) %>% summarize(Count = n())` 
  count(Question, Response, name = "Count") %>%   # 1: get detailed counts
  group_by(Question) %>% 
  mutate(total = sum(Count),                      # 2: get counts by queston
         pct = Count / total) %>%                 # 1 divided by 2
  ungroup()
  • Related