Compare across columns return value in new column

Time:10-19

So I have a data frame:

df <- cbind.data.frame(
      ID = c("123", "604", "789", "193", "872"),
      r1 = c("HISPANIC", "WHITE", "ASIAN", "BLACK", "ASIAN"),
      r2 = c(NA, NA, "WHITE", "HISPANIC", "OTHER"),
      r3 = c(NA, NA, NA, "OTHER", "OTHER"))
  ID       r1       r2    r3
1 123 HISPANIC     <NA>  <NA>
2 604    WHITE     <NA>  <NA>
3 789    ASIAN    WHITE  <NA>
4 193    BLACK HISPANIC OTHER
5 872    ASIAN    OTHER OTHER

I'd like to create a new column (FINALRACE) that recategorizes and combines r1:r3. Any row that contains HISPANIC should be HISPANIC; if r2 and r3 are NA, return r1; otherwise return OTHER. An OTHER in r2 or r3 should count the same as NA (see row 5 in the expected output below).

I've tried:

df$FINALRACE <- ifelse(df == 'HISPANIC', 'HISPANIC',
                       ifelse(df$r2 == '', as.character(r1), 'OTHER'))

df <- df %>% mutate(FINALRACE = if_else(df == 'HISPANIC', 'HISPANIC',
                                        ifelse(df$r2 == '', as.character(r1), 'OTHER')))
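A quick check shows why both attempts misbehave. This is a sketch that rebuilds the same df with data.frame(..., stringsAsFactors = FALSE) so the columns are character in any R version; the variable name hits is made up for the illustration. Comparing the whole data frame gives one logical per cell rather than one per row, and missing entries are NA rather than empty strings, so == '' never matches them.

```r
df <- data.frame(
  ID = c("123", "604", "789", "193", "872"),
  r1 = c("HISPANIC", "WHITE", "ASIAN", "BLACK", "ASIAN"),
  r2 = c(NA, NA, "WHITE", "HISPANIC", "OTHER"),
  r3 = c(NA, NA, NA, "OTHER", "OTHER"),
  stringsAsFactors = FALSE
)

# Comparing the whole data frame yields a 5 x 4 logical matrix
# (one value per cell), not one value per row
hits <- df == "HISPANIC"
dim(hits)
#> [1] 5 4

# Missing values are NA, not "", so this comparison is NA for them
df$r2 == ""
#> [1]    NA    NA FALSE FALSE FALSE

# is.na() is the test that actually detects the missing entries
is.na(df$r2)
#> [1]  TRUE  TRUE FALSE FALSE FALSE

# Collapsing the cell-wise matrix to one TRUE/FALSE per row
as.vector(rowSums(hits, na.rm = TRUE) > 0)
#> [1]  TRUE FALSE FALSE  TRUE FALSE
```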

Ultimately, I'd like df to look like this:

  ID       r1       r2    r3    FINALRACE
1 123 HISPANIC     <NA>  <NA>   HISPANIC
2 604    WHITE     <NA>  <NA>   WHITE
3 789    ASIAN    WHITE  <NA>   OTHER
4 193    BLACK HISPANIC OTHER   HISPANIC
5 872    ASIAN    OTHER OTHER   ASIAN

Answer:

Does this solve your problem?

library(dplyr)

df <- cbind.data.frame(
  ID = c("123", "604", "789", "193", "872"),
  r1 = c("HISPANIC", "WHITE", "ASIAN", "BLACK", "ASIAN"),
  r2 = c(NA, NA, "WHITE", "HISPANIC", "OTHER"),
  r3 = c(NA, NA, NA, "OTHER", "OTHER"))

#"Any row that contains HISPANIC remains hispanic, 
# if column r2:r3 are NA then return column r1, 
# and else return other"

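df %>%
  mutate(FINALRACE = case_when(
    # HISPANIC in any of r1:r3 takes priority
    if_any(c(r1, r2, r3), ~.x == "HISPANIC") ~ "HISPANIC",
    # r2 and r3 each missing or OTHER (as in row 5): keep r1
    (is.na(r2) | r2 == "OTHER") & (is.na(r3) | r3 == "OTHER") ~ r1,
    # everything else
    TRUE ~ "OTHER"
  ))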
#>    ID       r1       r2    r3 FINALRACE
#> 1 123 HISPANIC     <NA>  <NA>  HISPANIC
#> 2 604    WHITE     <NA>  <NA>     WHITE
#> 3 789    ASIAN    WHITE  <NA>     OTHER
#> 4 193    BLACK HISPANIC OTHER  HISPANIC
#> 5 872    ASIAN    OTHER OTHER     ASIAN

Created on 2022-10-19 by the reprex package (v2.0.1)
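For anyone who would rather not load dplyr, here is a base R sketch of the same three rules (not taken from either answer). It recreates df, and the nested ifelse() calls are ordered so that the first matching rule wins, mirroring case_when()'s top-to-bottom matching. The helper is_hisp() and the variables any_hispanic and only_r1 are hypothetical names chosen for the illustration.

```r
df <- data.frame(
  ID = c("123", "604", "789", "193", "872"),
  r1 = c("HISPANIC", "WHITE", "ASIAN", "BLACK", "ASIAN"),
  r2 = c(NA, NA, "WHITE", "HISPANIC", "OTHER"),
  r3 = c(NA, NA, NA, "OTHER", "OTHER"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where x is HISPANIC, FALSE (not NA) where x is missing
is_hisp <- function(x) !is.na(x) & x == "HISPANIC"

# Rule 1: any of r1:r3 is HISPANIC
any_hispanic <- is_hisp(df$r1) | is_hisp(df$r2) | is_hisp(df$r3)

# Rule 2: r2 and r3 are each either missing or OTHER
only_r1 <- (is.na(df$r2) | df$r2 == "OTHER") & (is.na(df$r3) | df$r3 == "OTHER")

# Rule 3 is the fallback; the outer ifelse() is checked first
df$FINALRACE <- ifelse(any_hispanic, "HISPANIC",
                       ifelse(only_r1, df$r1, "OTHER"))
df$FINALRACE
#> [1] "HISPANIC" "WHITE"    "OTHER"    "HISPANIC" "ASIAN"
```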

Answer:

Try this solution:

library(dplyr)

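df %>%
  mutate(FINALRACE = case_when(
    # count HISPANIC cells per row, ignoring NAs; `.` is the piped-in df
    rowSums(.[c("r1", "r2", "r3")] == "HISPANIC", na.rm = TRUE) > 0 ~ "HISPANIC",
    # %in% matches NA as a value, so this is TRUE for NA or "OTHER"
    r2 %in% c(NA, "OTHER") & r3 %in% c(NA, "OTHER") ~ r1,
    TRUE ~ "OTHER"
  ))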

   ID       r1       r2    r3 FINALRACE
1 123 HISPANIC     <NA>  <NA>  HISPANIC
2 604    WHITE     <NA>  <NA>     WHITE
3 789    ASIAN    WHITE  <NA>     OTHER
4 193    BLACK HISPANIC OTHER  HISPANIC
5 872    ASIAN    OTHER OTHER     ASIAN
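The r2 %in% c(NA, "OTHER") test works because %in% is built on match(), which treats NA as a value it can match, whereas == propagates NA. A small illustration with a made-up vector x:

```r
x <- c(NA, "OTHER", "WHITE")

# == propagates missing values: NA == "OTHER" is NA, not FALSE
x == "OTHER"
#> [1]    NA  TRUE FALSE

# %in% gives a definite TRUE/FALSE for every element, matching NA to NA
x %in% c(NA, "OTHER")
#> [1]  TRUE  TRUE FALSE

# The same test written with is.na() and ==, as in the first answer
is.na(x) | x == "OTHER"
#> [1]  TRUE  TRUE FALSE
```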