Using R, I would like to select the last row within each ID in longitudinal data. However, when the last rows of an ID share the same value in the time column (e.g., time 5 for ID 1 and time 4 for ID 3), I would like to keep all of those rows (2 rows for ID 1 and 3 rows for ID 3). If the last time value within an ID is unique, I want to keep only the last row (e.g., time 7 for ID 2).
My dataframe is as follows:
id  time  dx         code
1   1     primary    A1
1   5     primary    D2
1   5     secondary  B3
2   1     primary    A2
2   7     primary    C4
3   4     primary    A1
3   4     secondary  B3
3   4     tertiary   D2
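To make the example reproducible, the data frame above can be rebuilt as follows. The object name d matches the question's own code; storing the columns as integers and character strings is an assumption, chosen to mirror the column types shown in the answer's printed tibble.

```r
# Rebuild the example data from the question; values are copied from the table
d <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L),
  time = c(1L, 5L, 5L, 1L, 7L, 4L, 4L, 4L),
  dx   = c("primary", "primary", "secondary", "primary", "primary",
           "primary", "secondary", "tertiary"),
  code = c("A1", "D2", "B3", "A2", "C4", "A1", "B3", "D2"),
  stringsAsFactors = FALSE
)
```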
I want the following results:
id  time  dx         code
1   5     primary    D2
1   5     secondary  B3
2   7     primary    C4
3   4     primary    A1
3   4     secondary  B3
3   4     tertiary   D2
When I used the following R code, d %>% group_by(id) %>% filter(row_number() == n()), it kept only the last row within each ID. Any help would be appreciated!
Answer:
You can add dx to group_by() as well and then use slice_tail():
library(dplyr)

# One group per id/dx combination; keep the last row of each group
dat %>%
  group_by(id, dx) %>%
  slice_tail(n = 1)
# A tibble: 6 x 4
# Groups: id, dx [6]
id time dx code
<int> <int> <chr> <chr>
1 1 5 primary D2
2 1 5 secondary B3
3 2 7 primary C4
4 3 4 primary A1
5 3 4 secondary B3
6 3 4 tertiary D2
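Grouping by dx reproduces the desired output for this data, but only because no dx value repeats among the tied last rows of an ID and every dx value within an ID also appears at its last time point. For a hypothetical ID with a "secondary" row at time 1 and a "primary" row at time 3, grouping by id and dx would keep both rows. A sketch that keys directly on the time column instead keeps every row whose time equals the ID's latest time. It assumes dplyr is installed and that "last" means the largest time value; if "last" should mean the time on the final row by position, filter(time == last(time)) would be the analogue.

```r
library(dplyr)

# Example data from the question
d <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L),
  time = c(1L, 5L, 5L, 1L, 7L, 4L, 4L, 4L),
  dx   = c("primary", "primary", "secondary", "primary", "primary",
           "primary", "secondary", "tertiary"),
  code = c("A1", "D2", "B3", "A2", "C4", "A1", "B3", "D2"),
  stringsAsFactors = FALSE
)

# Within each id, keep every row whose time equals that id's latest time,
# so all rows tied at the last time point are retained
result <- d %>%
  group_by(id) %>%
  filter(time == max(time)) %>%
  ungroup()

print(result)
```

Unlike row_number() == n(), which is TRUE for exactly one row per group, the comparison time == max(time) is TRUE for every tied row, which is what produces two rows for ID 1 and three for ID 3.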