Mutate new variable based on whether a set of strings is present in multiple columns in R

Time:01-13

I have clinical data with the medications participants are using, and I want to create new binary variables with medication categories (e.g., statin use). To do this I want to search for a set of strings (medication names) in multiple columns (medication1, medication2, etc.) to define the new variables.

Given the following code:

library(tidyverse)
ID <- sprintf("User % d", 1:4) 
med1 <- c("rosuvastatin", "ezetimibe", "insulin", "Lipitor")
med2 <- c("niacin", "insulin", "simvastatin", NA)
df <- data.frame(ID, med1, med2)

df <- df%>%
  mutate(use_statin = case_when(if_any(starts_with("med"), ~ str_detect(., pattern = "statin")) ~ 1))%>%
  mutate(use_statin = case_when(if_any(starts_with("med"), ~ str_detect(., pattern = "Lipitor")) ~ 1))
df$use_statin

I expected the use_statin column to contain "1 NA 1 1", but instead it contains "NA NA NA 1". It appears that the second mutate() call overwrites the result of the first.
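A minimal sketch of what the two mutate() calls effectively do. It uses hand-made logical vectors (hypothetical names statin_hit and lipitor_hit) in place of the if_any() results. case_when() with no "TRUE ~" fallback returns NA for every row whose condition is not TRUE. Assigning to the same column name then replaces the earlier result outright.

```r
library(dplyr)

# Hypothetical stand-ins for the two if_any() results in the question
statin_hit  <- c(TRUE, FALSE, TRUE, NA)     # "statin" found in rows 1 and 3
lipitor_hit <- c(FALSE, FALSE, FALSE, TRUE) # "Lipitor" found in row 4

# First mutate(): rows without a match get NA (no fallback condition)
first_mutate <- case_when(statin_hit ~ 1)   # 1 NA 1 NA

# Second mutate() builds a brand-new vector from the Lipitor match only,
# so the statin results in rows 1 and 3 are lost
second_mutate <- case_when(lipitor_hit ~ 1) # NA NA NA 1
```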

Answer:

We could use a single if_any() with a regex pattern that matches either string, joining them with | (OR), so that the second match does not override the first:

library(dplyr)
library(stringr)
df %>% 
 mutate(use_statin = case_when(if_any(starts_with("med"),
       ~ str_detect(.x, pattern = "statin|Lipitor"))~ 1))

Output:

        ID         med1        med2 use_statin
1 User  1 rosuvastatin      niacin          1
2 User  2    ezetimibe     insulin         NA
3 User  3      insulin simvastatin          1
4 User  4      Lipitor        <NA>          1
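Since the goal is several binary category variables, the single-pattern idea can be extended. This is a minimal sketch, assuming a hypothetical named list called categories that maps each new flag to the medication names defining it. The names are joined into one | pattern with paste() and matched case-insensitively via stringr's regex(). Missing entries are treated as "no match" with coalesce(), so each flag is 0/1 rather than 1/NA.

```r
library(dplyr)
library(stringr)

# Toy data from the question
ID <- sprintf("User % d", 1:4)
med1 <- c("rosuvastatin", "ezetimibe", "insulin", "Lipitor")
med2 <- c("niacin", "insulin", "simvastatin", NA)
df <- data.frame(ID, med1, med2)

# Hypothetical category definitions: flag name -> names (or name fragments)
categories <- list(
  use_statin  = c("statin", "Lipitor"),
  use_insulin = c("insulin")
)

for (flag in names(categories)) {
  # Collapse the names into one alternation pattern, ignoring case
  cat_pattern <- regex(paste(categories[[flag]], collapse = "|"),
                       ignore_case = TRUE)
  df <- df %>%
    mutate(!!flag := as.integer(if_any(
      starts_with("med"),
      # A missing medication entry counts as "no match", so flags are 0/1
      ~ coalesce(str_detect(.x, cat_pattern), FALSE)
    )))
}

df
```

Substring matching is broad. For example, "statin" would also match nystatin, an antifungal, so real data may need full drug names in each list rather than fragments.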

In the OP's code, the use_statin column is first created from the "statin" match. The second mutate() then overwrites it with the result of the "Lipitor" match alone. To keep both, combine the new match with the existing column using | (OR):

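df %>%
  mutate(use_statin = case_when(if_any(starts_with("med"),
    ~ str_detect(., pattern = "statin")) ~ 1)) %>%
  # OR the Lipitor match with the existing column instead of overwriting it;
  # `|` returns TRUE/NA, so as.numeric() converts the result back to 1/NA
  mutate(use_statin = as.numeric(case_when(if_any(starts_with("med"),
    ~ str_detect(., pattern = "Lipitor")) ~ 1) | use_statin))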

Output:

       ID         med1        med2 use_statin
1 User  1 rosuvastatin      niacin          1
2 User  2    ezetimibe     insulin         NA
3 User  3      insulin simvastatin          1
4 User  4      Lipitor        <NA>          1
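The second version works because of how R's | treats missing values. If either side is TRUE, the result is TRUE even when the other side is NA. Any non-zero number counts as TRUE. The result of | is a logical vector, so wrapping it in as.numeric() gives back 1/NA. Here is a small sketch with the two case_when() results typed in by hand (hypothetical names statin_flag and lipitor_flag):

```r
statin_flag  <- c(1, NA, 1, NA)   # result of the "statin" case_when()
lipitor_flag <- c(NA, NA, NA, 1)  # result of the "Lipitor" case_when()

combined <- lipitor_flag | statin_flag
print(combined)              # TRUE NA TRUE TRUE (a logical vector)
print(as.numeric(combined))  # 1 NA 1 1
```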