I have clinical data with the medications participants are using, and I want to create new binary variables with medication categories (e.g., statin use). To do this I want to search for a set of strings (medication names) in multiple columns (medication1, medication2, etc.) to define the new variables.
Given the following code:
library(tidyverse)
ID <- sprintf("User % d", 1:4)
med1 <- c("rosuvastatin", "ezetimibe", "insulin", "Lipitor")
med2 <- c("niacin", "insulin", "simvastatin", NA)
df <- data.frame(ID, med1, med2)
df <- df%>%
mutate(use_statin = case_when(if_any(starts_with("med"), ~ str_detect(., pattern = "statin")) ~ 1))%>%
mutate(use_statin = case_when(if_any(starts_with("med"), ~ str_detect(., pattern = "Lipitor")) ~ 1))
df$use_statin
I was hoping the use_statin column would display "1 NA 1 1", but instead it displays "NA NA NA 1". It appears that the second mutate call overwrites the result of the first.
CodePudding user response:
We could use a single if_any with a pattern that matches either string, joined with | (OR), so that the second match does not overwrite the first:
library(dplyr)
library(stringr)
df %>%
mutate(use_statin = case_when(if_any(starts_with("med"),
~ str_detect(.x, pattern = "statin|Lipitor"))~ 1))
-output
      ID         med1        med2 use_statin
1 User 1 rosuvastatin      niacin          1
2 User 2    ezetimibe     insulin         NA
3 User 3      insulin simvastatin          1
4 User 4      Lipitor        <NA>          1
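If the list of medication names grows, the pattern can be built from a character vector rather than typed by hand. The following is a minimal sketch of that idea, not part of the original answer: the vector statin_terms and the object statin_pattern are hypothetical names, and the case-insensitive matching via regex(ignore_case = TRUE) is an assumption about what would be convenient for free-text medication entries.

```r
library(dplyr)
library(stringr)

# Same example data as in the question (IDs built with paste() for simplicity)
df <- data.frame(
  ID   = paste("User", 1:4),
  med1 = c("rosuvastatin", "ezetimibe", "insulin", "Lipitor"),
  med2 = c("niacin", "insulin", "simvastatin", NA)
)

# Hypothetical list of search terms for the statin category; extend as needed
statin_terms <- c("statin", "Lipitor")

# Collapse the terms into one alternation pattern: "statin|Lipitor"
# ignore_case = TRUE is an assumption, so "lipitor" would also match
statin_pattern <- regex(paste(statin_terms, collapse = "|"), ignore_case = TRUE)

result <- df %>%
  mutate(use_statin = case_when(
    # TRUE if any med* column matches any of the terms
    if_any(starts_with("med"), ~ str_detect(.x, statin_pattern)) ~ 1
  ))

result$use_statin
```

Rows with no match in any med column stay NA, as in the output above, because case_when has no fallback branch.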
In the OP's code, the use_statin column was first created from the statin match and then overwritten by the result of the Lipitor match. Instead, we may need to combine the new match with the existing column using |:
df %>%
  mutate(use_statin = case_when(if_any(starts_with("med"),
    ~ str_detect(., pattern = "statin")) ~ 1)) %>%
  # `|` returns a logical, so the unary + converts TRUE back to 1 (NA stays NA)
  mutate(use_statin = +(case_when(if_any(starts_with("med"),
    ~ str_detect(., pattern = "Lipitor")) ~ 1) | use_statin))
-output
      ID         med1        med2 use_statin
1 User 1 rosuvastatin      niacin          1
2 User 2    ezetimibe     insulin         NA
3 User 3      insulin simvastatin          1
4 User 4      Lipitor        <NA>          1
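To see why combining with | keeps the earlier matches, it may help to look at the two intermediate results in isolation. This is a small base R sketch; the vectors statin_step and lipitor_step are hypothetical stand-ins for what each case_when call returns on the example data. Note that | yields a logical vector, so a unary + (or as.integer()) is needed to get back to 1/NA.

```r
# What each case_when() call returns on its own for the four example rows
statin_step  <- c(1, NA, 1, NA)   # rows 1 and 3 contain "statin"
lipitor_step <- c(NA, NA, NA, 1)  # row 4 contains "Lipitor"

# NA | TRUE is TRUE, while NA | NA stays NA,
# so a match from either step survives
combined <- lipitor_step | statin_step
combined

# Unary + converts the logical result to integer: TRUE -> 1, NA -> NA
combined_int <- +combined
combined_int
```

Assigning lipitor_step directly to the column, as the second mutate in the question does, discards statin_step entirely, which is why only row 4 ended up as 1.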