Home > OS >  set column value to first value meeting conditions across columns
set column value to first value meeting conditions across columns

Time:12-29

I have a dataframe with columns that start with "dx". I want to create another column called primary. If the value in all of the columns that start with "dx" is "I629" or NA, I want the value of primary to be "Unspecified". Otherwise, I want the it to be the first non-"I629" value.

My desired output:

dx1  dx2  dx3 dx4 dx5 primary
I629 NA   NA  NA  NA  Unspecified
S065 NA   NA  NA  NA  S065
I629 S066 NA  NA  NA  S066
I629 I629 NA  NA  NA  Unspecified

CodePudding user response:

A tidyverse solution: create a helper dataframe where "I629" is replaced with NA across all Dx columns; use dplyr::coalesce() to take the first non-NA value (or "Unspecified" if all NA); and finally bind the new primary column to your original dataframe.

library(dplyr)
library(tidyr)

primary_dx <- dat %>% 
  mutate(
    across(starts_with("dx"), \(col) na_if(col, "I629")),
    primary = coalesce(!!!select(., starts_with("dx")), "Unspecified")
  ) %>% 
  select(primary)

bind_cols(dat, primary_dx)
# A tibble: 4 × 6
  dx1   dx2   dx3   dx4   dx5   primary    
  <chr> <chr> <lgl> <lgl> <lgl> <chr>      
1 I629  NA    NA    NA    NA    Unspecified
2 S065  NA    NA    NA    NA    S065       
3 I629  S066  NA    NA    NA    S066       
4 I629  I629  NA    NA    NA    Unspecified

CodePudding user response:

data <- data.frame(dx1 = c("I629", "S065", "I629", "I629"), dx2 = c(NA, 
NA, "S066", "I629"), dx3 = c(NA, NA, NA, NA), dx4 = c(NA, NA, 
NA, NA), dx5 = c(NA, NA, NA, NA))


data$primary <- apply(data, 1, \(x)
 if (all(is.na(x) | all(x=="I629" , na.rm = T) )) "Unspecified" 
 else x[which(x != "I629")[1]])

data
#>    dx1  dx2 dx3 dx4 dx5     primary
#> 1 I629 <NA>  NA  NA  NA Unspecified
#> 2 S065 <NA>  NA  NA  NA        S065
#> 3 I629 S066  NA  NA  NA        S066
#> 4 I629 I629  NA  NA  NA Unspecified

CodePudding user response:

You could try, which uses apply to look across rows to see if all 5 columns contain either NA or I629, then uses a simple ifelse to categorize the outcome:

df$primary <- apply(df, 1, function(x) ifelse(sum(x == "I629" | is.na(x)) == 5, "Unspecified", x[1]))

Output:

#    dx1  dx2 dx3 dx4 dx5     primary
# 1 I629 <NA>  NA  NA  NA Unspecified
# 2 S065 <NA>  NA  NA  NA        S065
# 3 I629 S066  NA  NA  NA        I629
# 4 I629 I629  NA  NA  NA Unspecified

Data

df <- read.table(text = "dx1   dx2   dx3   dx4   dx5
I629 NA    NA    NA    NA
S065 NA    NA    NA    NA
I629 S066  NA    NA    NA
I629 I629 NA   NA    NA", header = TRUE)
  • Related