I have a wide df unique at survey respondent level, and with corresponding columns for each question in the survey and those questions' weights. For example, I have something like this:
x <- tibble::tribble(
~uniqueID, ~type, ~name_of_question_bin, ~name_of_question_weight, ~name_of_second_question_bin, ~name_of_second_question_weight,
12345L, "AA", 1L, 0.3, 0L, 0.5,
67891L, "AA", 0L, 0.7, 1L, 0.9,
23456L, "BB", 1L, 0.95, 1L, 0.6,
78910L, "BB", 0L, 0.98, 1L, 0.1
)
I would like to pivot_longer
this data so I have one column for questions
and another for weights
. For example, the output i want would be something like this:
tibble::tribble(
~type, ~questions, ~scores, ~weights,
"AA", "name_of_question_bin", 1L, 0.3,
"AA", "name_of_question_bin", 0L, 0.7,
"BB", "name_of_question_bin", 1L, 0.95,
"BB", "name_of_question_bin", 0L, 0.98,
"AA", "name_of_second_question_bin", 0L, 0.5,
"AA", "name_of_second_question_bin", 1L, 0.9,
"BB", "name_of_second_question_bin", 1L, 0.6,
"BB", "name_of_second_question_bin", 1L, 0.1
)
I've tried something like this:
x %>%
dplyr::select(ends_with("bin"), type) %>%
pivot_longer(!type, names_to = "question", values_to = "scores")
which gets me what I want just for the question values, but I want a second column for the weights. I thought about repeating pivot_wider, once for scores and another for weights, and then joining them together; but a) i'm not sure if the output would be in the same order, and b) it strikes me there's probably a way to do this all in one pivot_wider call (or at least without joining).
I'd appreciate any pointers from folks! Thank you
CodePudding user response:
Something like this? The column names/values of questions
are slightly different than your expected output, but adjusting would be easy if needed. With the way your data is structured, the intended approach doesn't quite result in the structure you're expecting.
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
x <- tibble::tribble(
~uniqueID, ~type, ~name_of_question_bin, ~name_of_question_weight, ~name_of_second_question_bin, ~name_of_second_question_weight,
12345L, "AA", 1L, 0.3, 0L, 0.5,
67891L, "AA", 0L, 0.7, 1L, 0.9,
23456L, "BB", 1L, 0.95, 1L, 0.6,
78910L, "BB", 0L, 0.98, 1L, 0.1
)
x |>
select(-uniqueID) |>
pivot_longer(
starts_with("name_of"),
names_pattern = c("(name_of_question|name_of_second_question)_(weight|bin)"),
names_to = c("questions", ".value")
) |>
arrange(questions)
#> # A tibble: 8 × 4
#> type questions bin weight
#> <chr> <chr> <int> <dbl>
#> 1 AA name_of_question 1 0.3
#> 2 AA name_of_question 0 0.7
#> 3 BB name_of_question 1 0.95
#> 4 BB name_of_question 0 0.98
#> 5 AA name_of_second_question 0 0.5
#> 6 AA name_of_second_question 1 0.9
#> 7 BB name_of_second_question 1 0.6
#> 8 BB name_of_second_question 1 0.1
Created on 2022-10-18 with reprex v2.0.2
If you're wondering what's going on here, from the pivot_longer()
docs for names_to
:
".value"
indicates that the corresponding component of the column name defines the name of the output column containing the cell values, overridingvalues_to
entirely.
CodePudding user response:
I think this does what you want:
x %>%
pivot_longer(ends_with("bin"),
names_to = "question",
values_to = "scores") %>%
pivot_longer(ends_with("weight"),
names_to = "quest_weight",
values_to = "weights") %>%
mutate(question_case = str_detect(question, "second"),
weight_case = str_detect(quest_weight, "second"),
case_match = question_case == weight_case) %>%
filter(case_match) %>%
select(-contains("case"), -quest_weight)
The mutate is probably overly verbose, but basically sets up the easy filter line to ensure that we only keep rows where both question and weight correspond to name_of_question
or name_of_second_question
.