Keep the NAs in if_else

I have a data frame like this:

ID diagnosis   A1   A2   A3
a       yes    A    A    B
b       yes    B    C    D
c        no <NA>    C <NA>
d        no    E    C    D
e       yes    D <NA>    B

Here A1, A2, and A3 are the questions in my test, and the letters are the answers the participants gave. I want to create a new column per question indicating whether each answer is correct: 1 if it is correct and 0 if it is not. Some questions have two correct answers. This is the dplyr code I used and the result I got:

mydf <- mydf %>%
  mutate(A1.1 = if_else(A1 %in% c("A"), 1, 0)) %>%
  mutate(A2.1 = if_else(A2 %in% c("A", "B"), 1, 0)) %>%
  mutate(A3.1 = if_else(A3 %in% c("A", "B"), 1, 0))

ID diagnosis   A1   A2   A3 A1.1 A2.1 A3.1
a       yes    A    A    B    1    1    1
b       yes    B    C    D    0    0    0
c        no <NA>    C <NA>    0    0    0
d        no    E    C    D    0    0    0
e       yes    D <NA>    B    0    0    1

As you can see, the NA values turned into 0, but I want to keep them as NA. So my first question is: how can I keep the NAs?

My second question is whether there is a shorter way to create these columns from the answer columns, because my real data has 30 questions :)

Thank you so much!

CodePudding user response：

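%in% never returns NA: it returns FALSE for elements that are NA. == returns NA when an element is NA, so we could use == instead and convert the logical result to 1/0 with a unary +

library(dplyr)
mydf %>%
   # +( ) coerces TRUE/FALSE/NA to 1/0/NA
   mutate(A1.1 = +(A1 == "A"), A2.1 = +(A2 == "A"|A2 == "B"),
     A3.1 = +(A3 == "A"|A3 == "B"))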

-output

 ID diagnosis   A1   A2   A3 A1.1 A2.1 A3.1
1  a       yes    A    A    B    1    1    1
2  b       yes    B    C    D    0    0    0
3  c        no <NA>    C <NA>   NA    0   NA
4  d        no    E    C    D    0    0    0
5  e       yes    D <NA>    B    0   NA    1
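
To see why the two operators behave differently, here is a minimal check on a small made-up vector (the names x, in_result, eq_result and as_score are only for illustration):

```r
# A toy answer vector with one missing value
x <- c("A", NA, "C")

# %in% is based on match(): an NA element simply fails to match, so it gives FALSE
in_result <- x %in% c("A", "B")

# == propagates the missing value, so the NA element stays NA
eq_result <- x == "A"

# Unary + turns the logical vector into integers and keeps the NA
as_score <- +eq_result

print(in_result)
print(eq_result)
print(as_score)
```

Because if_else() only ever sees TRUE or FALSE from %in%, it has no NA left to pass through, which is why the original code produced 0 for the missing answers.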

If more than one column uses the same comparison, use across to loop over those columns

 mydf %>%
  mutate(A1.1 = +(A1 == "A"), across(A2:A3,
      ~ +(.x == "A"|.x == "B"), .names = "{.col}.1"))

-output

  ID diagnosis   A1   A2   A3 A1.1 A2.1 A3.1
1  a       yes    A    A    B    1    1    1
2  b       yes    B    C    D    0    0    0
3  c        no <NA>    C <NA>   NA    0   NA
4  d        no    E    C    D    0    0    0
5  e       yes    D <NA>    B    0   NA    1
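
With 30 questions that each have their own correct answers, one possible extension is to keep an answer key in a named list and look it up inside across() with cur_column(). This is only a sketch: the key list and the scored name are assumptions for illustration, and it needs dplyr 1.0.0 or later.

```r
library(dplyr)

mydf <- data.frame(
  ID = c("a", "b", "c", "d", "e"),
  diagnosis = c("yes", "yes", "no", "no", "yes"),
  A1 = c("A", "B", NA, "E", "D"),
  A2 = c("A", "C", "C", "C", NA),
  A3 = c("B", "D", NA, "D", "B")
)

# Hypothetical answer key: one entry per question column
key <- list(A1 = "A", A2 = c("A", "B"), A3 = c("A", "B"))

scored <- mydf %>%
  mutate(across(
    all_of(names(key)),
    # keep NA where the answer is missing, otherwise score 1/0 against this column's key
    ~ if_else(is.na(.x), NA_integer_, as.integer(.x %in% key[[cur_column()]])),
    .names = "{.col}.1"
  ))

print(scored)
```

Adding a question then only means adding one entry to key; the mutate() call itself does not change.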

data

mydf <- structure(list(ID = c("a", "b", "c", "d", "e"), diagnosis = c("yes", 
"yes", "no", "no", "yes"), A1 = c("A", "B", NA, "E", "D"), A2 = c("A", 
"C", "C", "C", NA), A3 = c("B", "D", NA, "D", "B")),
 class = "data.frame", row.names = c(NA, 
-5L))

CodePudding user response：

I would do it column-wise:

mydf$A1.1 = ifelse(mydf$A1 == "A", 1, 0)
mydf$A2.1 = ifelse(mydf$A2 == "A" | mydf$A2 == "B", 1, 0)

This preserves the NAs: when the response is NA, the comparison is NA, and ifelse returns NA for that row. ...
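
A small base R check of the two rules this relies on, using made-up values (cond, scores, or_na_false and or_na_true are illustrative names only):

```r
# ifelse() returns NA wherever the test itself is NA
cond <- c(TRUE, NA, FALSE)
scores <- ifelse(cond, 1, 0)

# | uses three-valued logic: NA | FALSE is unknown, NA | TRUE is known to be TRUE
or_na_false <- NA | FALSE
or_na_true <- NA | TRUE

print(scores)
print(or_na_false)
print(or_na_true)
```

For a missing answer both comparisons in the A2 line are NA, so the combined test is NA and the score stays NA.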

CodePudding user response：

I would store correct answers in a list and then do it this way:

# list of correct answers
ans <- list(A1='A', 
            A2=c('A', 'B'), 
            A3=c('A', 'B'))
# check the answers
tmp <- sapply(names(ans), function(a) 
 as.numeric(ifelse(is.na(mydf[[a]]), NA, mydf[[a]] %in% ans[[a]])))
# change column names
colnames(tmp) <- paste0(colnames(tmp), '.1')
cbind(mydf, tmp)