I am working with a Qualtrics survey where blocks of questions repeat themselves based on previous questions using a function in the survey build called "loop and merge". I'm trying to pull out like questions and then use rbind so that each question only shows up once in a column. I have a basic example below, however in my actual data, the repeats happen 36 times.
example data frame:
dat <- data.frame("1_q1" = 1:4,
"1_q2" = c("a", "b", "c", "d"),
"2_q1" = 22:25,
"2_q2" = c("i", "j", "k", "l"),
"3_q1" = 90:93,
"3_q2" = c("p", "q", "r", "s"), check.names = FALSE)
note that the "1_" at the start of "1_q1" is the county's reference number
What I could do but that is inefficient, especially since my actual data repeats these questions 36 times:
dat_1 <- dat %>%
select(1:2) %>%
rename(q = 1:2) %>%
mutate("county" = 1)
dat_2 <- dat %>%
select(3:4) %>%
rename(q = 1:2) %>%
mutate("county" = 2)
dat_3 <- dat %>%
select(5:6) %>%
rename(q = 1:2)%>%
mutate("county" = 3)
dat_final <- rbind(dat_1, dat_2, dat_3)
the "dat_final" data frame is what I'd like the data to look like, but also have formatted again here:
dat_clean <- data.frame("q1" = c(1:4, 22:25, 90:93),
"q2" = c("a", "b", "c", "d",
"i", "j", "k", "l",
"p", "q", "r", "s"),
"county" = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3))
I'm horrible with loops but sick of being inefficient with my code.
CodePudding user response:
A solution using dplyr, purrr, stringr
- This solution is not affected by columns orders, number of q
columns. It just use the perfix as base for processing data.
library(dplyr)
library(purrr)
library(stringr)
dat <- data.frame("1_q1" = 1:4,
"1_q2" = c("a", "b", "c", "d"),
"2_q1" = 22:25,
"2_q2" = c("i", "j", "k", "l"),
"3_q1" = 90:93,
"3_q2" = c("p", "q", "r", "s"), check.names = FALSE)
# Here is the indexes of county that want to extract from df
county_index <- c("1", "2", "3")
# Function that take index as input and will extract data from `dat` df
edit_df <- function(index) {
dat %>%
# select column start with index prefix
select(matches(paste0(index, "_"))) %>%
# remove the index prefix from string
rename_all(~ str_replace(., regex("^\\d _", ignore_case = TRUE), "")) %>%
# add county column with the input inex
mutate("county" = as.numeric(index))
}
Result using purrr::map_dfr
# map the county index that want to extract from original df and edit_df function
dat_clean <- map_dfr(.x = county_index, .f = edit_df)
dat_clean
#> q1 q2 county
#> 1 1 a 1
#> 2 2 b 1
#> 3 3 c 1
#> 4 4 d 1
#> 5 22 i 2
#> 6 23 j 2
#> 7 24 k 2
#> 8 25 l 2
#> 9 90 p 3
#> 10 91 q 3
#> 11 92 r 3
#> 12 93 s 3
Created on 2022-05-25 by the reprex package (v2.0.1)
CodePudding user response:
You can do the following, which uses a seq from 1 to ncol(dat)
, by 2:
do.call(
rbind,
lapply(seq(1,ncol(dat),2), \(i) {
dat %>% select(c(i,i 1)) %>% rename_all(~c("q1","q2")) %>% mutate(county=(i 1)/2)
})
)
Output:
q1 q2 county
1 1 a 1
2 2 b 1
3 3 c 1
4 4 d 1
5 22 i 2
6 23 j 2
7 24 k 2
8 25 l 2
9 90 p 3
10 91 q 3
11 92 r 3
12 93 s 3
Another approach, with data.table
library(data.table)
setDT(dat)
rbindlist(lapply(seq(1,ncol(dat),2), \(i) {
setnames(dat[,i:(i 1)],c("q1","q2"))
}), use.names=F,idcol = "county")
Output:
county q1 q2
1: 1 1 a
2: 1 2 b
3: 1 3 c
4: 1 4 d
5: 2 22 i
6: 2 23 j
7: 2 24 k
8: 2 25 l
9: 3 90 p
10: 3 91 q
11: 3 92 r
12: 3 93 s