I have a data frame:
df <- cbind.data.frame(
  ID = c("123", "604", "789", "193", "872"),
  r1 = c("HISPANIC", "WHITE", "ASIAN", "BLACK", "ASIAN"),
  r2 = c(NA, NA, "WHITE", "HISPANIC", "OTHER"),
  r3 = c(NA, NA, NA, "OTHER", "OTHER"))
   ID       r1       r2    r3
1 123 HISPANIC     <NA>  <NA>
2 604    WHITE     <NA>  <NA>
3 789    ASIAN    WHITE  <NA>
4 193    BLACK HISPANIC OTHER
5 872    ASIAN    OTHER OTHER
I'd like to create a new column (FINALRACE) that recategorizes and combines r1:r3. Any row that contains HISPANIC stays HISPANIC; if columns r2:r3 are both NA, return column r1; otherwise return OTHER.
I've tried:
df$FINALRACE <- ifelse(df == 'HISPANIC', 'HISPANIC',
                       ifelse(df$r2 == '', as.character(r1), 'OTHER'))
df <- df %>% mutate(FINALRACE = if_else(df == 'HISPANIC', 'HISPANIC',
                                        ifelse(df$r2 == '', as.character(r1), 'OTHER')))
Ultimately, I would like df to look like:
   ID       r1       r2    r3 FINALRACE
1 123 HISPANIC     <NA>  <NA>  HISPANIC
2 604    WHITE     <NA>  <NA>     WHITE
3 789    ASIAN    WHITE  <NA>     OTHER
4 193    BLACK HISPANIC OTHER  HISPANIC
5 872    ASIAN    OTHER OTHER     ASIAN
CodePudding user response:
Does this solve your problem?
library(dplyr)
df <- cbind.data.frame(
  ID = c("123", "604", "789", "193", "872"),
  r1 = c("HISPANIC", "WHITE", "ASIAN", "BLACK", "ASIAN"),
  r2 = c(NA, NA, "WHITE", "HISPANIC", "OTHER"),
  r3 = c(NA, NA, NA, "OTHER", "OTHER"))
#"Any row that contains HISPANIC remains hispanic,
# if column r2:r3 are NA then return column r1,
# and else return other"
df %>%
  mutate(FINALRACE = case_when(
    if_any(c(r1, r2, r3), ~ .x == "HISPANIC") ~ "HISPANIC",
    (is.na(r2) | r2 == "OTHER") & (is.na(r3) | r3 == "OTHER") ~ r1,
    TRUE ~ "OTHER"
  ))
#>    ID       r1       r2    r3 FINALRACE
#> 1 123 HISPANIC     <NA>  <NA>  HISPANIC
#> 2 604    WHITE     <NA>  <NA>     WHITE
#> 3 789    ASIAN    WHITE  <NA>     OTHER
#> 4 193    BLACK HISPANIC OTHER  HISPANIC
#> 5 872    ASIAN    OTHER OTHER     ASIAN
Created on 2022-10-19 by the reprex package (v2.0.1)
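As a side note on why the original attempts likely misbehaved, here is a small sketch (the two-column `demo` data frame is only an illustration, not part of the question). Comparing against `''` never matches a missing value, because `NA == ""` is itself `NA`, so `is.na()` is needed. Also, comparing a whole data frame to a string gives one result per cell rather than one per row.

```r
# Illustrative copy of r2 from the question
r2 <- c(NA, NA, "WHITE", "HISPANIC", "OTHER")

# Comparing with "" does not detect missing values: NA == "" is NA
empty_check <- r2 == ""      # NA NA FALSE FALSE FALSE

# is.na() is the reliable test for missing values
na_check <- is.na(r2)        # TRUE TRUE FALSE FALSE FALSE

# Comparing a whole data frame to a string returns a logical matrix
# (one cell per value), not a single TRUE/FALSE per row
demo <- data.frame(
  r1 = c("HISPANIC", "WHITE", "ASIAN", "BLACK", "ASIAN"),
  r2 = r2,
  stringsAsFactors = FALSE
)
hits <- demo == "HISPANIC"
dim(hits)                    # 5 2
```

That is why the answer above reduces the comparison to one value per row with `if_any()` and tests the other columns with `is.na()`.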
CodePudding user response:
Try this solution:
library(dplyr)
df %>%
  mutate(FINALRACE = case_when(
    rowSums(.[c("r1", "r2", "r3")] == "HISPANIC", na.rm = TRUE) > 0 ~ "HISPANIC",
    r2 %in% c(NA, "OTHER") & r3 %in% c(NA, "OTHER") ~ r1,
    TRUE ~ "OTHER"
  ))
   ID       r1       r2    r3 FINALRACE
1 123 HISPANIC     <NA>  <NA>  HISPANIC
2 604    WHITE     <NA>  <NA>     WHITE
3 789    ASIAN    WHITE  <NA>     OTHER
4 193    BLACK HISPANIC OTHER  HISPANIC
5 872    ASIAN    OTHER OTHER     ASIAN
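If you would rather avoid the dplyr dependency, the same rules can probably be written in base R. This is only a sketch under the same assumption the code above makes (that "OTHER" in r2 or r3 is treated like NA, which is what the expected output for ID 872 suggests); the helper names `race_cols`, `any_hispanic` and `only_r1` are made up for illustration.

```r
df <- data.frame(
  ID = c("123", "604", "789", "193", "872"),
  r1 = c("HISPANIC", "WHITE", "ASIAN", "BLACK", "ASIAN"),
  r2 = c(NA, NA, "WHITE", "HISPANIC", "OTHER"),
  r3 = c(NA, NA, NA, "OTHER", "OTHER"),
  stringsAsFactors = FALSE
)

race_cols <- c("r1", "r2", "r3")

# TRUE for rows where any race column equals "HISPANIC" (NAs are ignored)
any_hispanic <- unname(rowSums(df[race_cols] == "HISPANIC", na.rm = TRUE) > 0)

# TRUE for rows where r2 and r3 are each either NA or "OTHER"
only_r1 <- df$r2 %in% c(NA, "OTHER") & df$r3 %in% c(NA, "OTHER")

# HISPANIC takes priority, then fall back to r1, otherwise "OTHER"
df$FINALRACE <- ifelse(any_hispanic, "HISPANIC",
                       ifelse(only_r1, df$r1, "OTHER"))
```

Nested `ifelse()` is fine for three branches; with more categories, `case_when()` tends to stay more readable.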