How can I get rows with specific conditions in r-CodePudding

Say that I have a df.

I want to get the id with two conditions at the same time:

the id's code should contaions a capital I, regardless of the number that follows it. For example I11, I31...
the id's code should contaions specific code: E12.

On the example below, the filtered id should be id = 1 and id = 2. Because they all contain I and E12.

Same id in the example means in the same group.

structure(list(id = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 
3, 3, 3, 4, 4, 4, 4, 4, 4), diag = c("main", "other", "main", 
"other", "main", "other", "main", "other", "main", "other", "main", 
"other", "main", "other", "main", "other", "main", "other", "main", 
"other", "main", "other"), code = c("I11", "E12", "I11", "Q34", 
"I31", "C33", "E12", "I34", "E12", "I45", "E12", "Z11", "E13", 
"Z12", "E14", "Z13", "I25", "E1", "I25", "E2", "I25", "E3")), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -22L), groups = structure(list(
    id = c(1, 2, 3, 4), .rows = structure(list(1:6, 7:10, 11:16, 
        17:22), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -4L), .drop = TRUE))

> df
# A tibble: 22 × 3
# Groups:   id [4]
      id diag  code 
   <dbl> <chr> <chr>
 1     1 main  I11  
 2     1 other E12  
 3     1 main  I11  
 4     1 other Q34  
 5     1 main  I31  
 6     1 other C33  
 7     2 main  E12  
 8     2 other I34  
 9     2 main  E12  
10     2 other I45  
# … with 12 more rows

CodePudding user response：

You could do:

df |>
  group_by(id) |>
  filter(TRUE %in% str_detect(code, "I") & TRUE %in% (code == "E12")) |>
  ungroup()

Output:

# A tibble: 10 × 3
      id diag  code 
   <dbl> <chr> <chr>
 1     1 main  I11  
 2     1 other E12  
 3     1 main  I11  
 4     1 other Q34  
 5     1 main  I31  
 6     1 other C33  
 7     2 main  E12  
 8     2 other I34  
 9     2 main  E12  
10     2 other I45

Or if you just want the groups add distinct(id) after the filter(...):

# A tibble: 2 × 1
     id
  <dbl>
1     1
2     2

CodePudding user response：

To clarify, you want all records that are Ixx OR "E12"? Your 'at the same time' threw me off a little. If this is what you mean this should get your results. First, put your data in a dataframe df, then using library(tidyverse):

df %>% filter(grepl("^I",code) | code == "E12")

meaning filter records where column code contains I OR records where code equals E12.

CodePudding user response：

Your question suggests^[1] that you're mostly interested in the id values that fulfil both of your conditions; in which case, you don't really need to work on a data frame, but only on vectors:

intersect(df$id[grepl("^I", df$code)], df$id[df$code=="E12"])

You create two string vectors of df$ids, one for each condition, then extract their shared terms (this also removes duplicates, so the output is 1 2).

This requires nothing more than base R and I suspect is much more efficient than any table-based approach (especially if grouping or pivots are involved).

[1] and if you're not, you can still use the line above to index a table row-wise, e.g.

df[df$id %in% intersect(df$id[grepl("^I", df$code)], df$id[df$code=="E12"]), ]

This will return all the rows of df with an id of 1 or 2 (including, however, those rows that have a matching id but a different code, e.g. "Q34").