I am working with an administrative data set and am trying to filter for observations that include at least one of several "diagnosis codes" of interest. The diagnosis codes range from 1 to 1000; as an example, I want to filter for observations with diagnosis codes 100, 101, or 105.
The diagnosis codes are spread across 5 columns/variables whose names include the pattern "ICD9". As long as at least one of those columns has 100, 101, or 105, the observation satisfies the condition.
I have been unsuccessful: both of the attempts below return 0 observations.
new_data <- df %>%
  filter(if_any(contains("ICD9"), ~str_detect(., pattern = "100 | 101 | 105")))
new_data <- df %>%
  filter(if_any(contains("ICD9"), any_vars(. == "100" | . == "101" | . == "105")))
Any help is appreciated.
Thanks
CodePudding user response:
You can use the %in% operator in conjunction with filter(if_any(contains(...))), like so:
library(tidyverse)
# Some data
df <- data.frame(ICD9_1 = c("100", "101", "102", "103", "104", "105"),
                 ICD9_2 = c("105", "104", "103", "102", "101", "100"))
new_data <- df %>%
  filter(if_any(contains("ICD9"), ~ . %in% c("100", "101", "105")))
new_data
  ICD9_1 ICD9_2
1    100    105
2    101    104
3    104    101
4    105    100
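If the ICD9 columns are stored as numbers rather than strings, the same approach should still apply, because %in% coerces both sides to a common type before matching. A minimal sketch, assuming hypothetical sample data (df_num, with an id column added only to make the result easy to read) and a hypothetical codes_of_interest vector:

```r
library(dplyr)

# Hypothetical sample data: diagnosis codes stored as numeric values
df_num <- data.frame(id = 1:4,
                     ICD9_1 = c(100, 102, 103, 1000),
                     ICD9_2 = c(200, 101, 104, 105))

# Hypothetical vector holding the codes to keep, so the list is easy to change
codes_of_interest <- c(100, 101, 105)

# Keep rows where at least one ICD9 column holds one of the codes
kept <- df_num %>%
  filter(if_any(contains("ICD9"), ~ .x %in% codes_of_interest))

kept$id
```

Here rows 1, 2, and 4 are kept; row 4 qualifies through 105 in ICD9_2, not through 1000, since %in% compares whole values.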
CodePudding user response:
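The spaces in the regex are part of the pattern, so "100 | 101 | 105" only matches "100" followed by a space, "101" surrounded by spaces, or "105" preceded by a space. Fix the regex by removing the whitespace around the numbers: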
df %>%
  filter(if_any(contains("ICD9"), ~str_detect(., pattern = "100|101|105")))
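One caveat worth checking: an unanchored pattern matches substrings, and since the codes in the question run up to 1000, "100" would also match "1000". A minimal sketch of the difference, assuming hypothetical sample data (df_chr, with an id column added for readability) and anchoring the pattern with ^ and $:

```r
library(dplyr)
library(stringr)

# Hypothetical sample data: codes stored as strings
df_chr <- data.frame(id = 1:3,
                     ICD9_1 = c("100", "1000", "210"),
                     ICD9_2 = c("300", "400", "1050"),
                     stringsAsFactors = FALSE)

# Unanchored: also matches "1000" (contains "100") and "1050" (contains "105")
loose <- df_chr %>%
  filter(if_any(contains("ICD9"), ~ str_detect(.x, "100|101|105")))

# Anchored: the whole value must be exactly 100, 101, or 105
exact <- df_chr %>%
  filter(if_any(contains("ICD9"), ~ str_detect(.x, "^(100|101|105)$")))

loose$id
exact$id
```

With this data the unanchored pattern keeps all three rows, while the anchored one keeps only row 1.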