Home > Blockchain >  add a new column based on conditions on multiple columns in R
add a new column based on conditions on multiple columns in R

Time:11-23

I have a data frame like this:

Q1 <- c("A",NA,"A",NA,NA,"C","D","A","B", NA)#the right answer is A
Q2 <- c("D",NA,"D","C",NA,NA,"A","A","A","A")#the right answer is D
Q3 <- c("B","B","C","A",NA,"A","B","D","E",NA)#the right answer is B
Q4 <- c("B",NA,"C","C","C","C","D","B",NA,"A")#the right answer is C
mydf <- data.frame(Q1,Q2,Q3,Q4)
mydf

These are the questions in my test and what I want to do is create a new column named "pass" and give a pass to the participants as long as they answered at least 1 question correctly.

I know how to do it based on only 1 column like this:

mydf_new <- mydf %>% 
  mutate(Pass = if_else(Q1 %in% c("A"),"yes","no"))
mydf_new
     Q1   Q2   Q3   Q4 Pass
1     A    D    B    B  yes
2  <NA> <NA>    B <NA>   no
3     A    D    C    C  yes
4  <NA>    C    A    C   no
5  <NA> <NA> <NA>    C   no
6     C <NA>    A    C   no
7     D    A    B    D   no
8     A    A    D    B  yes
9     B    A    E <NA>   no
10 <NA>    A <NA>    A   no

But I couldn't figure out how to include multiple columns in the code.

Thanks a lot!

CodePudding user response:

You could do it like this:

library(dplyr)

mydf <- mydf %>%
  mutate(Pass = case_when(
    Q1 == "A" | Q2 == "D" | Q3 == "B" | Q4 == "C" ~ "Yes", 
    TRUE ~ "No"
  ))

mydf
     Q1   Q2   Q3   Q4 Pass
1     A    D    B    B  Yes
2  <NA> <NA>    B <NA>  Yes
3     A    D    C    C  Yes
4  <NA>    C    A    C  Yes
5  <NA> <NA> <NA>    C  Yes
6     C <NA>    A    C  Yes
7     D    A    B    D  Yes
8     A    A    D    B  Yes
9     B    A    E <NA>   No
10 <NA>    A <NA>    A   No

CodePudding user response:

This will do:

answers <- c('A', 'D', 'B', 'C')
mydf$Pass <- ifelse(colSums(t(mydf)==answers, na.rm=T), 'yes', 'no')

mydf
#      Q1   Q2   Q3   Q4 Pass
# 1     A    D    B    B  yes
# 2  <NA> <NA>    B <NA>  yes
# 3     A    D    C    C  yes
# 4  <NA>    C    A    C  yes
# 5  <NA> <NA> <NA>    C  yes
# 6     C <NA>    A    C  yes
# 7     D    A    B    D  yes
# 8     A    A    D    B  yes
# 9     B    A    E <NA>   no
# 10 <NA>    A <NA>    A   no

CodePudding user response:

If you wanted to, you can create an answer key (vector). You can calculate the scores for each row, and then convert to a vector for a Pass column.

answer_key <- c("A", "D", "B", "C")

mydf$Score <- rowSums(mydf == answer_key[col(mydf)], na.rm = T)
mydf$Pass <- factor(mydf$Score > 0, labels = c("No", "Yes"))
mydf

Output

     Q1   Q2   Q3   Q4 Score Pass
1     A    D    B    B     3  Yes
2  <NA> <NA>    B <NA>     1  Yes
3     A    D    C    C     3  Yes
4  <NA>    C    A    C     1  Yes
5  <NA> <NA> <NA>    C     1  Yes
6     C <NA>    A    C     1  Yes
7     D    A    B    D     1  Yes
8     A    A    D    B     1  Yes
9     B    A    E <NA>     0   No
10 <NA>    A <NA>    A     0   No
  • Related