I have a dataset that has this structure (lots of questions asked to survey respondents) that I want to reshape from wide to long:
library(tidyverse)
df_wide <-
tribble(
~resp_id, ~question_1_info, ~question_1_answer, ~question_2_info, ~question_2_answer,
1, "What is your eye color?", 1, "What is your hair color?", 2,
2, "Are you over 6 ft tall?", 1, "", NA,
3, "What is your hair color?", 0, "Are you under 40?", 1
)
This is the output I would like:
df_long <-
tribble(
~resp_id, ~question_number, ~question_text, ~question_answer,
1, 1, "What is your eye color?", 1,
1, 2, "What is your hair color?", 2,
2, 1, "Are you over 6 ft tall?", 1,
2, 2, "", NA,
3, 1, "What is your hair color?", 0,
3, 2, "Are you under 40?", 1
)
I'm having problems getting the multiple classes of the columns to work together. Here's what I've tried:
df_wide %>%
pivot_longer(
cols = !resp_id,
names_to = c("question_number"),
names_prefix = "question_",
values_to = c("question_text", "question_answer")
)
I can't get the right configuration of names_to
or names_prefix
and values_to
.
CodePudding user response:
We could use names_pattern
after rearranging the substring in column names
library(dplyr)
library(tidyr)
library(stringr)
df_wide %>%
# rename the columns by rearranging the digits at the end
# "_(\\d )(_.*)" - captures the digits (\\d ) after the _
# and the rest of the characters (_.*)
# replace with the backreference (\\2, \\1) of captured groups rearranged
rename_with(~ str_replace(., "_(\\d )(_.*)", "\\2_\\1"), -resp_id) %>%
pivot_longer(cols = -resp_id, names_to = c( ".value", "question_number"),
names_pattern = "(.*)_(\\d $)")
-output
# A tibble: 6 × 4
resp_id question_number question_info question_answer
<dbl> <chr> <chr> <dbl>
1 1 1 "What is your eye color?" 1
2 1 2 "What is your hair color?" 2
3 2 1 "Are you over 6 ft tall?" 1
4 2 2 "" NA
5 3 1 "What is your hair color?" 0
6 3 2 "Are you under 40?" 1