Say that I have a df
.
I want to get the id
with two conditions at the same time:
the
id
'scode
should contaions a capitalI
, regardless of the number that follows it. For exampleI11
,I31
...the
id
'scode
should contaions specificcode
:E12
.
On the example below, the filtered id
should be id = 1
and id = 2
. Because they all contain I
and E12
.
Same id
in the example means in the same group.
structure(list(id = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3,
3, 3, 3, 4, 4, 4, 4, 4, 4), diag = c("main", "other", "main",
"other", "main", "other", "main", "other", "main", "other", "main",
"other", "main", "other", "main", "other", "main", "other", "main",
"other", "main", "other"), code = c("I11", "E12", "I11", "Q34",
"I31", "C33", "E12", "I34", "E12", "I45", "E12", "Z11", "E13",
"Z12", "E14", "Z13", "I25", "E1", "I25", "E2", "I25", "E3")), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -22L), groups = structure(list(
id = c(1, 2, 3, 4), .rows = structure(list(1:6, 7:10, 11:16,
17:22), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -4L), .drop = TRUE))
> df
# A tibble: 22 × 3
# Groups: id [4]
id diag code
<dbl> <chr> <chr>
1 1 main I11
2 1 other E12
3 1 main I11
4 1 other Q34
5 1 main I31
6 1 other C33
7 2 main E12
8 2 other I34
9 2 main E12
10 2 other I45
# … with 12 more rows
CodePudding user response:
You could do:
df |>
group_by(id) |>
filter(TRUE %in% str_detect(code, "I") & TRUE %in% (code == "E12")) |>
ungroup()
Output:
# A tibble: 10 × 3
id diag code
<dbl> <chr> <chr>
1 1 main I11
2 1 other E12
3 1 main I11
4 1 other Q34
5 1 main I31
6 1 other C33
7 2 main E12
8 2 other I34
9 2 main E12
10 2 other I45
Or if you just want the groups add distinct(id)
after the filter(...)
:
# A tibble: 2 × 1
id
<dbl>
1 1
2 2
CodePudding user response:
To clarify, you want all records that are Ixx OR "E12"? Your 'at the same time' threw me off a little. If this is what you mean this should get your results. First, put your data in a dataframe df, then using library(tidyverse):
df %>% filter(grepl("^I",code) | code == "E12")
meaning filter records where column code contains I OR records where code equals E12.
CodePudding user response:
Your question suggests[1] that you're mostly interested in the id
values that fulfil both of your conditions; in which case, you don't really need to work on a data frame, but only on vectors:
intersect(df$id[grepl("^I", df$code)], df$id[df$code=="E12"])
You create two string vectors of df$id
s, one for each condition, then extract their shared terms (this also removes duplicates, so the output is 1 2
).
This requires nothing more than base R and I suspect is much more efficient than any table-based approach (especially if grouping or pivots are involved).
[1] and if you're not, you can still use the line above to index a table row-wise, e.g.
df[df$id %in% intersect(df$id[grepl("^I", df$code)], df$id[df$code=="E12"]), ]
This will return all the rows of df
with an id
of 1 or 2 (including, however, those rows that have a matching id
but a different code
, e.g. "Q34").