I have a data frame like this:
ID diagnosis   A1   A2   A3
 a       yes    A    A    B
 b       yes    B    C    D
 c        no <NA>    C <NA>
 d        no    E    C    D
 e       yes    D <NA>    B
Here A1, A2, and A3 are the questions in my test, and the letters below them are the answers the participants gave. I want to create a new column for each question indicating whether the answer is correct: 1 if it is correct and 0 if it is not. Some questions have two correct answers. This is the dplyr code I used and what I got:
library(dplyr)
mydf <- mydf %>%
  mutate(A1.1 = if_else(A1 %in% c("A"), 1, 0)) %>%
  mutate(A2.1 = if_else(A2 %in% c("A", "B"), 1, 0)) %>%
  mutate(A3.1 = if_else(A3 %in% c("A", "B"), 1, 0))
ID diagnosis   A1   A2   A3 A1.1 A2.1 A3.1
 a       yes    A    A    B    1    1    1
 b       yes    B    C    D    0    0    0
 c        no <NA>    C <NA>    0    0    0
 d        no    E    C    D    0    0    0
 e       yes    D <NA>    B    0    0    1
As you can see, the NA values were turned into 0, but I want to keep them as NA. So my first question is: how can I keep the NAs?
My second question is whether you can think of a shorter way to create these columns from the answers in the other columns, because my real data has 30 questions :)
Thank you so much!
Answer:
%in% returns FALSE where there are NAs, whereas == returns NA, so we could use == instead:
library(dplyr)
mydf %>%
  mutate(A1.1 = as.integer(A1 == "A"),
         A2.1 = as.integer(A2 == "A" | A2 == "B"),
         A3.1 = as.integer(A3 == "A" | A3 == "B"))
Output:
  ID diagnosis   A1   A2   A3 A1.1 A2.1 A3.1
1  a       yes    A    A    B    1    1    1
2  b       yes    B    C    D    0    0    0
3  c        no <NA>    C <NA>   NA    0   NA
4  d        no    E    C    D    0    0    0
5  e       yes    D <NA>    B    0   NA    1
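To see the difference on its own, here is a small base R check on a made-up vector x (not from the question's data): %in% never returns NA, while == and | carry the NA through, and as.integer() keeps that NA while turning TRUE/FALSE into 1/0.

```r
# Made-up responses for one question; the third one is missing
x <- c("A", "C", NA)

# %in% treats NA as an ordinary value, so the missing answer becomes FALSE
in_result <- x %in% c("A", "B")
print(in_result)

# == gives NA for the missing answer, and NA | NA is still NA
eq_result <- x == "A" | x == "B"
print(eq_result)

# as.integer() maps TRUE/FALSE to 1/0 and leaves NA as NA
print(as.integer(eq_result))
```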
If several columns use the same comparison, use across() to loop over them:
mydf %>%
  mutate(A1.1 = as.integer(A1 == "A"),
         across(A2:A3, ~ as.integer(.x == "A" | .x == "B"), .names = "{.col}.1"))
Output:
  ID diagnosis   A1   A2   A3 A1.1 A2.1 A3.1
1  a       yes    A    A    B    1    1    1
2  b       yes    B    C    D    0    0    0
3  c        no <NA>    C <NA>   NA    0   NA
4  d        no    E    C    D    0    0    0
5  e       yes    D <NA>    B    0   NA    1
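When a question has three or more accepted answers, chaining == with | gets long. One possible alternative is a small helper that uses %in% and then puts the NAs back; keep_na_in() is a hypothetical name, not part of dplyr. It should work inside the same kind of call, e.g. across(A2:A3, ~ keep_na_in(.x, c("A", "B")), .names = "{.col}.1"). The sketch below only checks the helper itself in base R:

```r
# Hypothetical helper: 1 if the answer is in `correct`, 0 if not,
# NA if the answer itself is missing
keep_na_in <- function(x, correct) {
  out <- as.integer(x %in% correct)
  out[is.na(x)] <- NA   # %in% turned missing answers into 0; restore NA
  out
}

# Made-up responses for a question with three accepted answers
answers <- c("A", "D", NA, "C")
print(keep_na_in(answers, c("A", "B", "C")))
```

The design choice is to let %in% handle any number of accepted answers and fix up only the NA positions afterwards, instead of writing one == per accepted answer.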
Data:
mydf <- structure(list(ID = c("a", "b", "c", "d", "e"),
                       diagnosis = c("yes", "yes", "no", "no", "yes"),
                       A1 = c("A", "B", NA, "E", "D"),
                       A2 = c("A", "C", "C", "C", NA),
                       A3 = c("B", "D", NA, "D", "B")),
                  class = "data.frame", row.names = c(NA, -5L))
Answer:
I would do it column-wise:
mydf$A1.1 = ifelse(mydf$A1 == "A", 1, 0)
mydf$A2.1 = ifelse(mydf$A2 == "A" | mydf$A2 == "B", 1, 0)
This preserves NAs when the response is NA. ...
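The reason this works is that ifelse() returns NA wherever its test is NA. A quick base R check on a made-up vector resp (not from the question's data):

```r
# Made-up responses; the third participant skipped the question
resp <- c("A", "C", NA)

# The test evaluates to TRUE, FALSE, NA, so ifelse() returns 1, 0, NA
scored <- ifelse(resp == "A" | resp == "B", 1, 0)
print(scored)
```

Using %in% in the test instead of == would make the third value 0, because the test would be FALSE rather than NA.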
Answer:
I would store the correct answers in a list and then do it this way:
# list of correct answers
ans <- list(A1 = 'A',
            A2 = c('A', 'B'),
            A3 = c('A', 'B'))
# check the answers
tmp <- sapply(names(ans), function(a)
  as.numeric(ifelse(is.na(mydf[[a]]), NA, mydf[[a]] %in% ans[[a]])))
# change column names
colnames(tmp) <- paste0(colnames(tmp), '.1')
cbind(mydf, tmp)
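One caveat with sapply(): if mydf has only one row (for example after filtering), each function call returns a single value, sapply() simplifies the result to a named vector instead of a matrix, and colnames(tmp) <- ... stops with an error. A minimal sketch that builds the new columns with lapply() instead avoids that simplification; score_answers() is a hypothetical helper name, and the answer key is the same one used above:

```r
# Answer key: one entry per question, listing every accepted answer
ans <- list(A1 = "A", A2 = c("A", "B"), A3 = c("A", "B"))

# Hypothetical helper: adds one 1/0/NA column per question in `key`
score_answers <- function(df, key) {
  scored <- lapply(names(key), function(q) {
    out <- as.numeric(df[[q]] %in% key[[q]])
    out[is.na(df[[q]])] <- NA   # keep missing answers missing
    out
  })
  names(scored) <- paste0(names(key), ".1")
  # lapply() never simplifies, so this is a data frame even for one row
  cbind(df, as.data.frame(scored))
}

mydf <- data.frame(ID = c("a", "b", "c", "d", "e"),
                   diagnosis = c("yes", "yes", "no", "no", "yes"),
                   A1 = c("A", "B", NA, "E", "D"),
                   A2 = c("A", "C", "C", "C", NA),
                   A3 = c("B", "D", NA, "D", "B"))

print(score_answers(mydf, ans))
print(score_answers(mydf[1, ], ans))   # a single row still works
```

With 30 questions, only the ans list grows; the helper itself stays the same.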