I am analyzing a very large survey in which I want to combine four parts of the survey, through several combinations of 4 questions. Below I have created a small example. A little background: a respondent either answered q2, q5, q8 or q9, because they only filled in 1 of 4 parts of the survey based on their answer in q1 (not shown here).Therefore, only one of the four columns contains an answer (1 or 2), while the others contain NAs. q2, q5, q8, q9 are similar questions that have the same answer options, which is why I want to combine them to make my dataset less wide and make it easier to further analyze the data.
q2_1 <- c(NA, NA, NA, NA, NA, NA, rep(c(1:2), 1))
q5_1 <- c(NA, NA, NA, NA, rep(c(1:2), 1), NA, NA)
q8_1 <- c(NA, NA, rep(c(1:2), 1), NA, NA, NA, NA)
q9_1 <- c(rep(c(1:2), 1), NA, NA, NA, NA, NA, NA)
q2_2 <- c(NA, NA, NA, NA, NA, NA, rep(c(1:2), 1))
q5_2 <- c(NA, NA, NA, NA, rep(c(1:2), 1), NA, NA)
q8_2 <- c(NA, NA, rep(c(1:2), 1), NA, NA, NA, NA)
q9_2 <- c(rep(c(1:2), 1), NA, NA, NA, NA, NA, NA)
df <- data.frame(q2_1, q5_1, q8_1, q9_1, q2_2, q5_2, q8_2, q9_2)
df
# running df shows:
q2_1 q5_1 q8_1 q9_1 q2_2 q5_2 q8_2 q9_2
1 NA NA NA 1 NA NA NA 1
2 NA NA NA 2 NA NA NA 2
3 NA NA 1 NA NA NA 1 NA
4 NA NA 2 NA NA NA 2 NA
5 NA 1 NA NA NA 1 NA NA
6 NA 2 NA NA NA 2 NA NA
7 1 NA NA NA 1 NA NA NA
8 2 NA NA NA 2 NA NA NA
My desired end result would be a dataframe with only columns for questions starting with q2_ (so, in the example that would be q2_1 and q2_2; in reality there's about 20 for this question), but with the NAs replaced for the answer options from the corresponding q5_, q8_, and q_9.
# desired end result
q2_1 q2_2
1 1 1
2 1 2
3 1 1
4 2 2
5 1 1
6 2 2
7 1 1
8 2 2
For single questions, i've done this using the code below, but this is very manual and because q2, q5, q8, and q9 both go up to _20, I'm looking for a way to automate this more.
# example single question
library(tidyverse)
df <- df %>%
mutate(q2_1 = case_when(!is.na(q2_1) ~ q2_1,
!is.na(q5_1) ~ q5_1,
!is.na(q8_1) ~ q8_1,
!is.na(q9_1) ~ q9_1))
I hope I explained myself well enough and looking forward for some directions!
CodePudding user response:
Here's one way, using coalesce
:
df %>%
mutate(q2_1 = do.call(coalesce, across(ends_with('_1'))),
q2_2 = do.call(coalesce, across(ends_with('_2')))) %>%
select(q2_1, q2_2)
#> q2_1 q2_2
#> 1 1 1
#> 2 2 2
#> 3 1 1
#> 4 2 2
#> 5 1 1
#> 6 2 2
#> 7 1 1
#> 8 2 2
CodePudding user response:
q2_1 <- c(NA, NA, NA, NA, NA, NA, rep(c(1:2), 1))
q5_1 <- c(NA, NA, NA, NA, rep(c(1:2), 1), NA, NA)
q8_1 <- c(NA, NA, rep(c(1:2), 1), NA, NA, NA, NA)
q9_1 <- c(rep(c(1:2), 1), NA, NA, NA, NA, NA, NA)
q2_2 <- c(NA, NA, NA, NA, NA, NA, rep(c(1:2), 1))
q5_2 <- c(NA, NA, NA, NA, rep(c(1:2), 1), NA, NA)
q8_2 <- c(NA, NA, rep(c(1:2), 1), NA, NA, NA, NA)
q9_2 <- c(rep(c(1:2), 1), NA, NA, NA, NA, NA, NA)
df <- data.frame(q2_1, q5_1, q8_1, q9_1, q2_2, q5_2, q8_2, q9_2)
df
#> q2_1 q5_1 q8_1 q9_1 q2_2 q5_2 q8_2 q9_2
#> 1 NA NA NA 1 NA NA NA 1
#> 2 NA NA NA 2 NA NA NA 2
#> 3 NA NA 1 NA NA NA 1 NA
#> 4 NA NA 2 NA NA NA 2 NA
#> 5 NA 1 NA NA NA 1 NA NA
#> 6 NA 2 NA NA NA 2 NA NA
#> 7 1 NA NA NA 1 NA NA NA
#> 8 2 NA NA NA 2 NA NA NA
library(tidyverse)
suffix <- str_c("_", 1:2)
map_dfc(.x = suffix,
.f = ~ transmute(df, !!str_c("q2", .x) := rowSums(across(ends_with(.x
)), na.rm = T)))
#> q2_1 q2_2
#> 1 1 1
#> 2 2 2
#> 3 1 1
#> 4 2 2
#> 5 1 1
#> 6 2 2
#> 7 1 1
#> 8 2 2
Created on 2022-04-04 by the reprex package (v2.0.1)