Home > OS >  R - How can I combine values across multiple columns in a pairwise matching fashion, possibly using
R - How can I combine values across multiple columns in a pairwise matching fashion, possibly using

Time:06-11

I have a dataset taken from a large survey that consists of ~2k participants (rows) and some 160 variables (columns). The relevant columns include the following:

  • participant IDS
  • cl.exp which is a "yes"/"no" response
  • cl.yes.Q1 up to cl.yes.Q7 which have values if the participant answered "yes" to cl.exp, and NA if they answered "no" to cl.exp
  • cl.no.Q1 up to cl.no.Q7 which have values if the participant answered "no" to cl.exp, and NA if they answered "yes" to cl.exp

The questions for both cl.exp "yes" and "no" are synonymous, with the exception that no.Q6 and no.Q7 are the inverse of yes.Q6 and yes.Q7; i.e., no.Q7 is synonymous with yes.Q6 and no.Q6 is synonymous with yes.Q7.

The first few rows could be as follows:

ID cl.exp cl.yes.Q1 cl.yes.Q2 cl.yes.Q3 cl.yes.Q4 cl.yes.Q5 cl.yes.Q6 cl.yes.Q7 cl.no.Q1 cl.no.Q2 cl.no.Q3 cl.no.Q4 cl.no.Q5 cl.no.Q6 cl.no.Q7
1 No NA NA NA NA NA NA NA 2 6 3 4 3 7 4
2 No NA NA NA NA NA NA NA 3 6 6 6 5 7 3
3 Yes 2 5 6 6 4 2 7 NA NA NA NA NA NA NA
4 Yes 7 1 5 6 7 2 5 NA NA NA NA NA NA NA

You'll notice that participants either have values in cl.yes.Q1-7 or cl.no.Q1-7, but never both. I'd like to do either one of the following: a) move the values from cl.no.Q1-7 into the respective columns of cl.yes.Q1-7, or b) create new columns that combine the appropriate columns from cl.yes and cl.no, i.e., cl.yes.Q1 and cl.no.Q1, cl.yes.Q2 and cl.no.Q2, and so on.

I solve the no.Q6 and no.Q7 reverse issue by using the following code:

df[15:16] <- df[16:15]

I then do the following:

df.yes <- df %>% 
             select(contains("cl.yes"), id, cl.exp)  %>%
             drop_na()

df.no <- df %>% 
             select(contains("cl.no"), id, cl.exp)  %>%
             drop_na()

names(df.yes) <- gsub("cl.yes.", "cl.", names(df.yes))
names(df.no) <- gsub("cl.no.", "cl.", names(df.no))
df.cl <- merge(df.yes, df.no, all = TRUE)

This gives me a new dataframe that has the merged columns. However, I believe there must be a simpler/cleaner/more elegant solution than this, particularly with the ability to keep the data in the original dataframe. I tried some iterations with mutate and coalesce and could never succeed. If anyone has a one or two line code that basically does the same thing that I did here, I would greatly appreciate your insight. Thank you!

CodePudding user response:

How about something like this:

library(tidyverse)
dat <- tibble::tribble(~ID, ~cl.exp, ~cl.yes.Q1, ~cl.yes.Q2, ~cl.yes.Q3, ~cl.yes.Q4, ~cl.yes.Q5, ~cl.yes.Q6, ~cl.yes.Q7, ~cl.no.Q1, ~cl.no.Q2, ~cl.no.Q3, ~cl.no.Q4, ~cl.no.Q5, ~cl.no.Q6, ~cl.no.Q7, 
1, "No",NA,NA,NA,NA,NA,NA,NA,2,6,3,4,3,7,4,
2, "Yes",3,6,6,6,5,7,3, NA,NA,NA,NA,NA,NA,NA)

bind_cols(dat %>% select(c(ID, cl.exp)), 
          coalesce(dat %>% 
                     select(contains("no")) %>% 
                     setNames(gsub("\\.no", "", names(.))), 
                   dat %>% 
                     select(contains("yes"))%>% 
                     setNames(gsub("\\.yes", "", names(.)))))
#> # A tibble: 2 × 9
#>      ID cl.exp cl.Q1 cl.Q2 cl.Q3 cl.Q4 cl.Q5 cl.Q6 cl.Q7
#>   <dbl> <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1 No         2     6     3     4     3     7     4
#> 2     2 Yes        3     6     6     6     5     7     3

Created on 2022-06-10 by the reprex package (v2.0.1)

CodePudding user response:

I think you need this one:

library(dplyr)
library(tidyr)

df %>% 
  pivot_longer(cols = -c(ID, cl.exp), 
               names_to = c('.value', 'name'), 
               names_pattern = '(.*)(\\d )') %>% 
  mutate(cl.yes.Q = coalesce(cl.yes.Q, cl.no.Q), .keep="unused") %>% 
  pivot_wider(names_from = name, values_from = cl.yes.Q)
     ID cl.exp   `1`   `2`   `3`   `4`   `5`   `6`   `7`
  <int> <chr>  <int> <int> <int> <int> <int> <int> <int>
1     1 No         2     6     3     4     3     7     4
2     2 No         3     6     6     6     5     7     3
3     3 Yes        2     5     6     6     4     2     7
4     4 Yes        7     1     5     6     7     2     5

CodePudding user response:

You should be able to use mutate and coalesce like so (after swapping no.Q6 and no.Q7 as you did above):

library(dplyr)
result <- df %>% mutate(cl.Q1 = coalesce(cl.yes.Q1, cl.no.Q1), 
                        cl.Q2 = coalesce(cl.yes.Q2, cl.no.Q2), 
                        cl.Q3 = coalesce(cl.yes.Q3, cl.no.Q3), 
                        cl.Q4 = coalesce(cl.yes.Q4, cl.no.Q4), 
                        cl.Q5 = coalesce(cl.yes.Q5, cl.no.Q5), 
                        cl.Q6 = coalesce(cl.yes.Q6, cl.no.Q6), 
                        cl.Q7 = coalesce(cl.yes.Q7, cl.no.Q7)) %>%
                 select(-(cl.yes.Q1:cl.no.Q7))

We simply use mutate to create the cl.Q* columns coalescing the values from the cl.yes.Q* and cl.no.Q* columns, respectively. Then, we remove the original cl.yes.Q* and cl.no.Q* columns using select.

  • Related