I currently have survey data where a set of Likert-type questions appears twice in the dataset and the set of questions a participant answered depends on an initial response to a binary "check" question. My goal is to merge the sets of duplicate questions. The data looks something like this:
Check | Q1 | Q2 | Q3 | Q1.1 | Q2.1 | Q3.1 |
---|---|---|---|---|---|---|
1 | 5 | 5 | 4 | |||
1 | 2 | 5 | 3 | |||
2 | 4 | 6 | 3 | |||
2 | 4 | 2 | 1 |
...where Q1.1 is a duplicate of Q1, and so on for Q2 and Q3
And I'd like my final output to look like this:
Check | Q1 | Q2 | Q3 |
---|---|---|---|
1 | 5 | 5 | 4 |
1 | 2 | 5 | 3 |
2 | 4 | 6 | 3 |
2 | 4 | 2 | 1 |
I've been testing out a variety of ideas using things like for-loops, sapply, paste, and cbind. I've run into walls with each of them, particularly because I need to somehow match questions (ex. Q1 gets Q1.1's value when check==2) and run this over a set of multiple columns in one dataset.
Any help on this would be greatly appreciated!
CodePudding user response:
If the missing elements are NA
, pivot_longer
can be used
library(tidyr)
pivot_longer(df1, cols = -Check, names_pattern = "^(Q\\d ).*",
names_to = ".value", values_drop_na = TRUE)
-output
# A tibble: 4 × 4
Check Q1 Q2 Q3
<int> <int> <int> <int>
1 1 5 5 4
2 1 2 5 3
3 2 4 6 3
4 2 4 2 1
data
df1 <- structure(list(Check = c(1L, 1L, 2L, 2L), Q1 = c(5L, 2L, NA,
NA), Q2 = c(5L, 5L, NA, NA), Q3 = c(4L, 3L, NA, NA), Q1.1 = c(NA,
NA, 4L, 4L), Q2.1 = c(NA, NA, 6L, 2L), Q3.1 = c(NA, NA, 3L, 1L
)), class = "data.frame", row.names = c(NA, -4L))
CodePudding user response:
In the check column do the numbers represent individuals? Does each individual have 2 rows of data? Or is this example table all for a single individual that will have 4 rows of data?
If all 4 rows are for 1 person I would structure the table like this if its not already.
Subject Check Q1 Q2 Q3 Q1_1 Q2_1 Q3_1
1 1
1 1
1 2
1 2
There are endless ways of doing this. Based on my knowledge I would subset the dataset into 2 tables for each subject, one for check - 1 and one for check = 2, and then just use rbind to stack them on top of each other. That is what poppiytt did when they created an example dataset but they didn't add a column for subject.
data1 <- (data, check == 1, select = c(subject, check, Q1, Q2, Q3))
data2 <- (data, check == 2, select = c(subject, check, Q1_1, Q2_1, Q3_1))
data3 <- rbind(data1, data2)
I'm sure there is a more efficient way to do this but this will work.