How to select group based on the order of rows within group in R

Time:03-15

For example, I have the following dataframe:

ID variable order
1 a 1
1 b 2
2 b 1
2 a 2
2 b 3
3 b 1
3 a 2

I would like to keep only the ID groups where "a" appears before "b" (i.e., the "order" value of an "a" row is smaller than that of a "b" row). So the result would look something like this:

ID variable order
1 a 1
1 b 2
2 b 1
2 a 2
2 b 3

Only IDs 1 and 2 remain (with all of their original rows), and all rows in ID 3 are removed because the "order" of "b" is smaller than that of "a". Would anyone have guidance on how this could be done in R?

CodePudding user response:

library(dplyr)

your_data %>%
  group_by(ID) %>%
  # keep a group if any "a" row is immediately followed by a "b" row
  filter(any(variable == "a" & lead(variable, default = "not b") == "b"))

This ignores the order column and is based on the order of rows within each ID group, so the rows need to be sorted the way you want them compared. It checks for the presence of an "a" on one row AND a "b" on the immediate next row.
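To see the row-by-row check on the question's data, here is a minimal reproducible sketch. The data frame is typed in by hand from the table in the question, and the names your_data and kept are placeholders rather than anything from the original post.

```r
library(dplyr)

# Sample data copied from the table in the question
your_data <- data.frame(
  ID       = c(1, 1, 2, 2, 2, 3, 3),
  variable = c("a", "b", "b", "a", "b", "b", "a"),
  order    = c(1, 2, 1, 2, 3, 1, 2)
)

kept <- your_data %>%
  group_by(ID) %>%
  # lead() shifts variable up by one row within each group, so every row is
  # compared with the row directly below it; the last row of a group gets
  # the default "not b" and therefore never matches
  filter(any(variable == "a" & lead(variable, default = "not b") == "b")) %>%
  ungroup()

unique(kept$ID)  # IDs 1 and 2 remain; ID 3 is dropped
```

In ID 2 the match comes from rows 2 and 3 ("a" then "b"), which is why the leading "b" on row 1 does not disqualify the group.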

In your comment you say '"a" right before "b"' - I went with that "right before" clarification. If a group has values "a", "c", "b" it would not be kept in my answer as the "a" is not "right before" the "b".
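If "a" only needs to come somewhere before a "b" rather than right before it, one possible variant compares the order values instead of adjacent rows. This is a sketch under an assumed rule, not part of the answer above: a group is kept when at least one "a" row has a smaller order than at least one "b" row. The data and the name kept_loose are hypothetical.

```r
library(dplyr)

# Hypothetical data: ID 4 has "a", "c", "b", so "a" precedes "b"
# without being directly next to it
your_data <- data.frame(
  ID       = c(3, 3, 4, 4, 4),
  variable = c("b", "a", "a", "c", "b"),
  order    = c(1, 2, 1, 2, 3)
)

kept_loose <- your_data %>%
  group_by(ID) %>%
  # outer() compares every "a" position with every "b" position;
  # any() is FALSE when a group has no "a" or no "b"
  filter(any(outer(order[variable == "a"], order[variable == "b"], "<"))) %>%
  ungroup()

unique(kept_loose$ID)  # only ID 4 remains
```

Unlike the lead() version, this one relies on the order column, so it does not depend on how the rows happen to be sorted.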
