ID<-1:6
math<-c("YES","NO","YES","NO",NA,NA)
history<-c(NA,NA,"NO","NO","YES",NA)
dt<-data.frame(ID,math,history)
ID math history
1 1 YES <NA>
2 2 NO <NA>
3 3 YES NO
4 4 NO NO
5 5 <NA> YES
6 6 <NA> <NA>
I want to make an additional column ("pass") as follows
if a student ever had "yes" at least once: "YES" (No matter if another subject is missing data or not or so.)
if the student did not get "yes"
2-1) If both subjects are missing data : NA
2-2) if one of subjects is "NO" : "NO"
so, the columns would like this: (I can do this manually with this minimal example. but not with my real data)
> dt
ID math history pass
1 1 YES <NA> YES
2 2 NO <NA> NO
3 3 YES NO YES
4 4 NO NO NO
5 5 <NA> YES YES
6 6 <NA> <NA> <NA>
I tried to use
dt$pass <- ifelse(rowSums(dt[,-1]=="YES",na.rm=T)>0,"YES","NO")
this code, but it was tricky
because if I put na.rm=T
they consider NA is "NO"
(ID 6 student will be "NO")
if I put na.rm=F
, an only students that have both subject's data are considered.
In my data, I have really lots of columns, not only math and history..
CodePudding user response:
library(tidyverse)
ID <- 1:6
math <- c("YES", "NO", "YES", "NO", NA, NA)
history <- c(NA, NA, "NO", "NO", "YES", NA)
dt <- data.frame(ID, math, history)
dt |>
rowwise() |>
mutate(pass = case_when(
sum(c_across(-ID) == "YES", na.rm = TRUE) >= 1 ~ "YES",
sum(c_across(-ID) == "NO", na.rm = TRUE) >= 1 ~ "No",
TRUE ~ NA_character_
))
#> # A tibble: 6 × 4
#> # Rowwise:
#> ID math history pass
#> <int> <chr> <chr> <chr>
#> 1 1 YES <NA> YES
#> 2 2 NO <NA> No
#> 3 3 YES NO YES
#> 4 4 NO NO No
#> 5 5 <NA> YES YES
#> 6 6 <NA> <NA> <NA>
Created on 2022-06-10 by the reprex package (v2.0.1)
CodePudding user response:
Seems like the order is what matters:
dt$pass <- NA
dt$pass[dt$math == "NO" | dt$history=="NO"] <- "NO"
dt$pass[dt$math == "YES" | dt$history=="YES"] <- "YES"
> dt
ID math history pass
1 1 YES <NA> YES
2 2 NO <NA> NO
3 3 YES NO YES
4 4 NO NO NO
5 5 <NA> YES YES
6 6 <NA> <NA> <NA>
CodePudding user response:
Try this
library(dplyr)
dt %>% mutate(pass = case_when(math == "YES" | history == "YES" ~ "YES" ,
is.na(math) & is.na(history) ~ NA_character_ ,
TRUE ~ "NO"))
CodePudding user response:
A tidyverse solution that is robust with respect to the number (and names) of subjects:
library(tidyverse)
dt %>%
mutate(
pass=dt %>%
pivot_longer(-ID) %>%
group_by(ID) %>%
summarise(
anyYes=sum(value == "YES", na.rm=T),
anyNo=sum(value == "NO", na.rm=T)
) %>%
mutate(
pass=ifelse(
anyYes >= 1,
"YES",
ifelse(anyNo >= 1, "NO", NA)
)
) %>%
pull(pass)
)
ID math history pass
1 1 YES <NA> YES
2 2 NO <NA> NO
3 3 YES NO YES
4 4 NO NO NO
5 5 <NA> YES YES
6 6 <NA> <NA> <NA>
The key is the pivot
ing to long format.