filtering on multiple columns with multiple conditions

Time:05-05

I am working with an administrative data set and am trying to filter for observations that include at least one of several "diagnosis codes" of interest. The diagnosis codes range from 1 to 1000; as an example, I want to keep observations with diagnosis codes 100, 101, or 105.

The diagnosis codes are spread across 5 columns/variables whose names contain the pattern "ICD9". As long as at least one of those columns has 100, 101, or 105, the observation satisfies the condition.

I have been unsuccessful so far: both of the attempts below return 0 observations.

 new_data<- df%>%
  filter(if_any(contains("ICD9"), ~str_detect(., pattern = "100 | 101 | 105")))


new_data<- df%>% 
  filter(if_any(contains("ICD9"), any_vars(. == "100" | . == "101" | . == "105")))

Any help is appreciated.

Thanks

CodePudding user response:

You can use the %in% operator in conjunction with filter(if_any(contains())), like so:

library(tidyverse)

# Some data
df <- data.frame(ICD9_1 = c("100", "101", "102", "103", "104", "105"),
             ICD9_2 = c("105", "104", "103", "102", "101", "100")) 


new_data<- df %>% 
  filter(if_any(contains("ICD9"), ~ . %in% c("100", "101", "105")))

new_data

  ICD9_1 ICD9_2
1    100    105
2    101    104
3    104    101
4    105    100
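
If the real columns are stored as numbers or contain missing values, the same call should still work, since %in% returns FALSE rather than NA for a missing value. A minimal sketch on made-up data (the `codes` vector and the values below are illustrative assumptions, not taken from the question):

```r
library(dplyr)

# Codes of interest kept in one place (hypothetical name)
codes <- c(100, 101, 105)

# Made-up data: numeric columns with some missing values
df <- data.frame(ICD9_1 = c(100, 102, NA, 103),
                 ICD9_2 = c(NA, 104, 105, NA))

# A row is kept if any ICD9 column holds one of the codes;
# NA %in% codes is FALSE, so missing values never match
new_data <- df %>%
  filter(if_any(contains("ICD9"), ~ .x %in% codes))

new_data
```

Keeping the codes in a vector also means the list can grow without touching the filter itself.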

CodePudding user response:

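The spaces inside the regex are matched literally, so "100 | 101 | 105" only matches "100 " (with a trailing space), " 101 " or " 105". Remove the whitespace around the numbers: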

df%>%
  filter(if_any(contains("ICD9"), ~str_detect(., pattern = "100|101|105")))
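
One caveat: because the codes range from 1 to 1000, an unanchored pattern also matches longer codes that merely contain these digits, such as "1000". Anchoring the pattern with ^ and $ should restrict it to exact matches. A hedged sketch on made-up data (the values and the names `loose` and `strict` are illustrative only):

```r
library(dplyr)
library(stringr)

# Made-up data: row 3 holds codes that only contain "100" or "105"
df <- data.frame(ICD9_1 = c("100", "102", "1000", "105"),
                 ICD9_2 = c("200", "103", "1050", "300"))

# Unanchored: also keeps row 3, because "1000" contains "100"
loose <- df %>%
  filter(if_any(contains("ICD9"), ~ str_detect(.x, "100|101|105")))

# Anchored: the whole value must be exactly one of the codes
strict <- df %>%
  filter(if_any(contains("ICD9"), ~ str_detect(.x, "^(100|101|105)$")))

nrow(loose)   # rows 1, 3 and 4 are kept
nrow(strict)  # only rows 1 and 4 are kept
```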