For example, I have the following dataframe:
ID | variable | order |
---|---|---|
1 | a | 1 |
1 | b | 2 |
2 | b | 1 |
2 | a | 2 |
2 | b | 3 |
3 | b | 1 |
3 | a | 2 |
I would like to keep only the ID groups where "a" appears before "b" (i.e., the "order" of a is smaller than b). So the result would look something like this:
ID | variable | order |
---|---|---|
1 | a | 1 |
1 | b | 2 |
2 | b | 1 |
2 | a | 2 |
2 | b | 3 |
Where only ID 1 and 2 remain (with all of its original rows), and all rows in ID 3 are removed because the "order" of b is smaller than a. Would anyone have guidance on how this could be done in R?
CodePudding user response:
your_data %>%
group_by(ID) %>%
filter(any(variable == "a" & lead(variable, default = "not b") == "b"))
This ignores the order
column and is based on the order of rows within each ID group. It check for the presence of an "a"
on one row AND a "b"
on the immediate next row.
In your comment you say '"a" right before "b"' - I went with that "right before" clarification. If a group has values "a", "c", "b"
it would not be kept in my answer as the "a" is not "right before" the "b".