This question relates to summarising survey results
The case is where there are multiple variants of the same question so
df <- data.frame(
survey_question_subquestionA = c("Less","More","Less","More","Missing","More",Less")
survey_question_subquestionB = c("Less","More","Less","More","Missing","More",Less")
survey_question_subquestionC = c("Less","More","Less","More","Missing","More",Less")
survey_weights = c( 1.02, 1.05, 0.97, 0.92, 0.94, 1.03, 0.98 )
The objective is to end up with a table
More | Less | Missing | |
---|---|---|---|
subquestionA | n | n | n |
subquestionB | n | n | n |
subquestionC | n | n | n |
Total | n | n | n |
qn <- "Question"
wt <- "Survey_Weights"
tvdf |> filter(Question == qn) |> select(Variable) |> unlist() |> unname() -> CVL
In real life I extract the variable names from another dataframe containing meta information called tvdf and the data frame itself has lots of other questions and surveys in it.
This works for unweighted counts
df |> select(any_of(CVL)) |> pivot_longer(cols = contains(qn), names_to = "Variable", values_to = "Values") |> group_by(Variable) |> count(Values) |> pivot_wider(names_from = Values, values_from = n) -> table
the problem is
count(Values, wt = get(wt))
doesnt work because they were excluded at select stage. If I include them they end up in my table.
Any suggestions welcome. I'm open to data.table solutions too.
CodePudding user response:
Is this what you're looking for?
For the addition of the "Total" row, I used the janitor
package which has a handy function adorn_totals()
.
library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 4.2.1
#> Warning: package 'tibble' was built under R version 4.2.1
#> Warning: package 'tidyr' was built under R version 4.2.1
#> Warning: package 'readr' was built under R version 4.2.1
#> Warning: package 'purrr' was built under R version 4.2.1
#> Warning: package 'dplyr' was built under R version 4.2.1
#> Warning: package 'stringr' was built under R version 4.2.1
#> Warning: package 'forcats' was built under R version 4.2.1
df <- tibble(
survey_question_subquestionA = c("Less",
"More",
"Less",
"More",
"Missing",
"More",
"Less"),
survey_question_subquestionB = c("Less", "More", "Less", "More", "Missing", "More", "Less"),
survey_question_subquestionC = c("Less",
"More",
"Less",
"More",
"Missing",
"More",
"Less"),
survey_weights = c(1.02, 1.05, 0.97, 0.92, 0.94, 1.03, 0.98)
)
df |>
pivot_longer(cols = 1:3,
names_to = "question") |>
count(question, value, wt = survey_weights) |>
pivot_wider(names_from = value,
values_from = n) |>
janitor::adorn_totals()
#> question Less Missing More
#> survey_question_subquestionA 2.97 0.94 3
#> survey_question_subquestionB 2.97 0.94 3
#> survey_question_subquestionC 2.97 0.94 3
#> Total 8.91 2.82 9
Created on 2022-11-04 with reprex v2.0.2