Home > Net >  R Merge duplicate columns that have different values in one dataframe
R Merge duplicate columns that have different values in one dataframe

Time:10-15

I currently have survey data where a set of Likert-type questions appears twice in the dataset and the set of questions a participant answered depends on an initial response to a binary "check" question. My goal is to merge the sets of duplicate questions. The data looks something like this:

Check Q1 Q2 Q3 Q1.1 Q2.1 Q3.1
1 5 5 4
1 2 5 3
2 4 6 3
2 4 2 1

...where Q1.1 is a duplicate of Q1, and so on for Q2 and Q3

And I'd like my final output to look like this:

Check Q1 Q2 Q3
1 5 5 4
1 2 5 3
2 4 6 3
2 4 2 1

I've been testing out a variety of ideas using things like for-loops, sapply, paste, and cbind. I've run into walls with each of them, particularly because I need to somehow match questions (ex. Q1 gets Q1.1's value when check==2) and run this over a set of multiple columns in one dataset.

Any help on this would be greatly appreciated!

CodePudding user response:

If the missing elements are NA, pivot_longer can be used

library(tidyr)
pivot_longer(df1, cols = -Check, names_pattern = "^(Q\\d ).*", 
      names_to = ".value", values_drop_na = TRUE)

-output

# A tibble: 4 × 4
  Check    Q1    Q2    Q3
  <int> <int> <int> <int>
1     1     5     5     4
2     1     2     5     3
3     2     4     6     3
4     2     4     2     1

data

df1 <- structure(list(Check = c(1L, 1L, 2L, 2L), Q1 = c(5L, 2L, NA, 
NA), Q2 = c(5L, 5L, NA, NA), Q3 = c(4L, 3L, NA, NA), Q1.1 = c(NA, 
NA, 4L, 4L), Q2.1 = c(NA, NA, 6L, 2L), Q3.1 = c(NA, NA, 3L, 1L
)), class = "data.frame", row.names = c(NA, -4L))

CodePudding user response:

In the check column do the numbers represent individuals? Does each individual have 2 rows of data? Or is this example table all for a single individual that will have 4 rows of data?

If all 4 rows are for 1 person I would structure the table like this if its not already.

Subject Check Q1 Q2 Q3 Q1_1 Q2_1 Q3_1
1        1
1        1
1        2
1        2

There are endless ways of doing this. Based on my knowledge I would subset the dataset into 2 tables for each subject, one for check - 1 and one for check = 2, and then just use rbind to stack them on top of each other. That is what poppiytt did when they created an example dataset but they didn't add a column for subject.

data1 <- (data, check == 1, select = c(subject, check, Q1, Q2, Q3))
data2 <- (data, check == 2, select = c(subject, check, Q1_1, Q2_1, Q3_1))
data3 <- rbind(data1, data2)

I'm sure there is a more efficient way to do this but this will work.

  • Related