Is there an R function for conditional values across different columns?

Time:04-16

Suppose you have a dataframe that looks something like this:

df <- tibble(PatientID = c(1,2,3,4,5),
         Treat1 = c("R", "O", "C", "O", "C"),
         Treat2 = c("O", "R", "R", NA, "O"),
         Treat3 = c("C", NA, "O", NA, "R"),
         Treat4 = c("H", NA, "H", NA, "H"),
         Treat5 = c("H", NA, NA, NA, "H"))

Treat1:Treat5 are the different treatments that a patient has had. I'm looking to create a new variable "Chemo" with 1 for yes and 0 for no, based on whether a patient has had treatment "C".

I've been using if_else(), but my actual dataset has 10 different treatment variables and I would like to create such a column per treatment, so I wonder if I can do it without writing such long if statements. Is there an easier way to do this?

CodePudding user response:

Another option uses str_detect and any to determine whether "C" occurs in any of the Treat columns for each row. The + then converts the logical to an integer.

library(tidyverse)

df %>%
  rowwise() %>%
  mutate(Chemo = +any(str_detect(c_across(starts_with("Treat")), "C"), na.rm = TRUE)) %>%
  ungroup()

Output

  PatientID Treat1 Treat2 Treat3 Treat4 Treat5 Chemo
      <dbl> <chr>  <chr>  <chr>  <chr>  <chr>  <int>
1         1 R      O      C      H      H          1
2         2 O      R      NA     NA     NA         0
3         3 C      R      O      H      NA         1
4         4 O      NA     NA     NA     NA         0
5         5 C      O      R      H      H          1
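
One thing to keep in mind with str_detect is that it matches a pattern rather than comparing for equality, so "C" would also match any longer code that contains a "C". A small sketch of the difference follows; the code "CT" is hypothetical and does not appear in the question's data.

```r
library(stringr)

# Hypothetical treatment codes: "CT" is made up for illustration
treatments <- c("C", "CT", "O", NA)

loose  <- str_detect(treatments, "C")    # pattern match: also flags "CT", keeps NA
strict <- str_detect(treatments, "^C$")  # anchored pattern: only the exact code "C"
exact  <- treatments %in% "C"            # equality test: NA becomes FALSE

loose
strict
exact
```

With single-letter codes like those in the question, all three give the same answer for non-missing values, so this only matters if the real data has longer codes.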

CodePudding user response:

Use if_any to loop over the columns whose names start with 'Treat' and create a logical vector with %in%. if_any returns TRUE/FALSE depending on whether any of the selected columns has 'C' in a particular row; the logical is then converted to binary with + (or as.integer).

library(dplyr)
df <- df %>% 
   mutate(Chemo = +(if_any(starts_with("Treat"), ~ .x %in% "C")))

-output

df
# A tibble: 5 × 7
  PatientID Treat1 Treat2 Treat3 Treat4 Treat5 Chemo
      <dbl> <chr>  <chr>  <chr>  <chr>  <chr>  <int>
1         1 R      O      C      H      H          1
2         2 O      R      <NA>   <NA>   <NA>       0
3         3 C      R      O      H      <NA>       1
4         4 O      <NA>   <NA>   <NA>   <NA>       0
5         5 C      O      R      H      H          1

Or using base R with rowSums; the leading + again converts the logical to 0/1:

df$Chemo <- +(rowSums(df[startsWith(names(df), "Treat")] == "C", 
      na.rm = TRUE) > 0)
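
Since the question asks for one such column per treatment, the rowSums idea can be put in a loop over the codes. This is only a sketch: the column names "Radio" and "Hormone" and their pairing with "R" and "H" are assumptions for illustration, as only "Chemo" = "C" is given in the question. It uses as.integer instead of + so that the row names carried over from rowSums are dropped.

```r
# Same data as in the question, built as a plain data.frame
df <- data.frame(PatientID = c(1, 2, 3, 4, 5),
                 Treat1 = c("R", "O", "C", "O", "C"),
                 Treat2 = c("O", "R", "R", NA, "O"),
                 Treat3 = c("C", NA, "O", NA, "R"),
                 Treat4 = c("H", NA, "H", NA, "H"),
                 Treat5 = c("H", NA, NA, NA, "H"),
                 stringsAsFactors = FALSE)

# Store the treatment column names up front, so the flag columns
# added in the loop are never scanned themselves
treat_cols <- names(df)[startsWith(names(df), "Treat")]

# Hypothetical mapping from new column name to treatment code;
# only Chemo = "C" comes from the question
codes <- c(Chemo = "C", Radio = "R", Hormone = "H")

for (nm in names(codes)) {
  # TRUE where a cell equals the code, NA cells ignored, then 0/1 per row
  df[[nm]] <- as.integer(rowSums(df[treat_cols] == codes[[nm]], na.rm = TRUE) > 0)
}

df
```

Adding another treatment then only means adding one entry to codes.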

CodePudding user response:

An alternative dplyr way:

library(dplyr)

df %>% 
  mutate(across(starts_with("Treat"), ~case_when(.=="C" ~1,
                                                 TRUE ~0), .names = 'new_{col}')) %>%
  mutate(Chemo = rowSums(select(., starts_with("new")))) %>% 
  select(-starts_with("new"))

Output

  PatientID Treat1 Treat2 Treat3 Treat4 Treat5 Chemo
      <dbl> <chr>  <chr>  <chr>  <chr>  <chr>  <dbl>
1         1 R      O      C      H      H          1
2         2 O      R      NA     NA     NA         0
3         3 C      R      O      H      NA         1
4         4 O      NA     NA     NA     NA         0
5         5 C      O      R      H      H          1
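
Note that rowSums in this approach counts how many Treat columns equal "C", so a patient who received "C" more than once would get a value above 1 rather than a 0/1 flag. A minimal sketch of the difference, using a hypothetical sixth patient that is not in the question's data (n_C is just an illustrative column name):

```r
library(dplyr)
library(tibble)

# Hypothetical patient who received "C" twice
df2 <- tibble(PatientID = 6, Treat1 = "C", Treat2 = "O", Treat3 = "C")

counted <- df2 %>%
  mutate(n_C   = rowSums(across(starts_with("Treat"), ~ .x %in% "C")),  # number of "C" cells
         Chemo = as.integer(n_C > 0))                                    # collapse to 0/1

counted
```

If a strict yes/no column is needed, comparing the count with 0 as above should keep the result binary.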