I have a dataframe with columns that start with "dx". I want to create another column called primary
. If the value in all of the columns that start with "dx" is "I629"
or NA
, I want the value of primary
to be "Unspecified"
. Otherwise, I want the it to be the first non-"I629"
value.
My desired output:
dx1 dx2 dx3 dx4 dx5 primary
I629 NA NA NA NA Unspecified
S065 NA NA NA NA S065
I629 S066 NA NA NA S066
I629 I629 NA NA NA Unspecified
CodePudding user response:
A tidyverse solution: create a helper dataframe where "I629"
is replaced with NA
across all Dx
columns; use dplyr::coalesce()
to take the first non-NA
value (or "Unspecified"
if all NA
); and finally bind the new primary
column to your original dataframe.
library(dplyr)
library(tidyr)
primary_dx <- dat %>%
mutate(
across(starts_with("dx"), \(col) na_if(col, "I629")),
primary = coalesce(!!!select(., starts_with("dx")), "Unspecified")
) %>%
select(primary)
bind_cols(dat, primary_dx)
# A tibble: 4 × 6
dx1 dx2 dx3 dx4 dx5 primary
<chr> <chr> <lgl> <lgl> <lgl> <chr>
1 I629 NA NA NA NA Unspecified
2 S065 NA NA NA NA S065
3 I629 S066 NA NA NA S066
4 I629 I629 NA NA NA Unspecified
CodePudding user response:
data <- data.frame(dx1 = c("I629", "S065", "I629", "I629"), dx2 = c(NA,
NA, "S066", "I629"), dx3 = c(NA, NA, NA, NA), dx4 = c(NA, NA,
NA, NA), dx5 = c(NA, NA, NA, NA))
data$primary <- apply(data, 1, \(x)
if (all(is.na(x) | all(x=="I629" , na.rm = T) )) "Unspecified"
else x[which(x != "I629")[1]])
data
#> dx1 dx2 dx3 dx4 dx5 primary
#> 1 I629 <NA> NA NA NA Unspecified
#> 2 S065 <NA> NA NA NA S065
#> 3 I629 S066 NA NA NA S066
#> 4 I629 I629 NA NA NA Unspecified
CodePudding user response:
You could try, which uses apply
to look across rows to see if all 5 columns contain either NA
or I629
, then uses a simple ifelse
to categorize the outcome:
df$primary <- apply(df, 1, function(x) ifelse(sum(x == "I629" | is.na(x)) == 5, "Unspecified", x[1]))
Output:
# dx1 dx2 dx3 dx4 dx5 primary
# 1 I629 <NA> NA NA NA Unspecified
# 2 S065 <NA> NA NA NA S065
# 3 I629 S066 NA NA NA I629
# 4 I629 I629 NA NA NA Unspecified
Data
df <- read.table(text = "dx1 dx2 dx3 dx4 dx5
I629 NA NA NA NA
S065 NA NA NA NA
I629 S066 NA NA NA
I629 I629 NA NA NA", header = TRUE)