I have a data frame df that looks like this:
id | status |
---|---|
601 | 2 |
601 | 2 |
601 | 2 |
601 | 4 |
601 | 2 |
601 | 2 |
601 | 4 |
601 | 2 |
990 | 2 |
990 | 4 |
For the first output, I want each id to stop at the first occurrence of status 4 in that id, with the 4 itself kept (I was thinking of a loop over the ids).
The result should look like this:
id | status |
---|---|
601 | 2 |
601 | 2 |
601 | 2 |
601 | 4 |
990 | 2 |
990 | 4 |
For the second output, each id should stop at its last 4, no matter how often 4 occurs in the original data set. Nothing should come after the last 4:
id | status |
---|---|
601 | 2 |
601 | 2 |
601 | 2 |
601 | 4 |
601 | 2 |
601 | 2 |
601 | 4 |
990 | 2 |
990 | 4 |
I don't know how to do this. Maybe there is also a way with filtering? I would really appreciate your help.
Answer:
If I understand your question correctly, you can use {dplyr} to get the first 4 rows of each id:
library(dplyr)
df %>% group_by(id) %>% slice_head(n = 4)
How are your two questions different? If the above is insufficient, try adding some data that we can run to illustrate the difference.
Answer:
To get the rows up to and including the first 4 in each id, you can do:
library(dplyr)
df %>%
  group_by(id) %>%
  # drop every row that comes after the first 4 within its id
  filter(!lag(cumany(status == 4), default = FALSE))
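Since the question also asks about a loop or plain filtering, here is a base R sketch of the same first-4 filter without dplyr. It rebuilds the data frame from the table in the question, and n_before is an illustrative name, not something from the answer above. ave() applies a function within each id and returns the results in the original row order, so the result can be used directly as a row filter:

```r
# Rebuild the example data from the question
df <- data.frame(
  id     = c(rep(601, 8), rep(990, 2)),
  status = c(2, 2, 2, 4, 2, 2, 4, 2, 2, 4)
)

# n_before (illustrative name): number of 4s strictly before each row within its id
n_before <- ave(as.integer(df$status == 4), df$id,
                FUN = function(x) c(0L, head(cumsum(x), -1L)))

# Keep rows with no earlier 4, i.e. everything up to and including the first 4
first_out <- df[n_before == 0, ]
first_out
```

This is the same idea as lag(cumany(status == 4)): a row is dropped as soon as a 4 has already appeared earlier in its id.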
#      id status
#   <int>  <int>
# 1   601      2
# 2   601      2
# 3   601      2
# 4   601      4
# 5   990      2
# 6   990      4
And to get everything up to and including the last 4 in each id, you can do:
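df %>%
  group_by(id) %>%
  # tmp = number of 4s strictly before the current row within its id
  mutate(tmp = lag(cumsum(status == 4), default = 0L)) %>%
  # keep rows up to and including the last 4 (tmp == 0 covers ids where max(tmp) is 0)
  filter(tmp < max(tmp) | tmp == 0) %>%
  select(-tmp)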
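A base R sketch of the "up to the last 4" filter, again rebuilding the data from the question; keep is an illustrative name. Instead of counting 4s before each row, it checks whether a 4 occurs at or after each row by taking a cumulative sum over the reversed vector. Like the dplyr version above, an id that contains no 4 at all is kept whole (this is an assumption about the desired behavior, since the question's data has no such id):

```r
# Rebuild the example data from the question
df <- data.frame(
  id     = c(rep(601, 8), rep(990, 2)),
  status = c(2, 2, 2, 4, 2, 2, 4, 2, 2, 4)
)

# keep (illustrative name): 1 if a 4 occurs at or after this row within its id,
# or if the id has no 4 at all; 0 otherwise
keep <- ave(as.integer(df$status == 4), df$id,
            FUN = function(x) as.integer(rev(cumsum(rev(x))) > 0 | sum(x) == 0))

# Everything up to and including the last 4 of each id
second_out <- df[keep == 1, ]
second_out
```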
# id status
# 1 601 2
# 2 601 2
# 3 601 2
# 4 601 4
# 5 601 2
# 6 601 2
# 7 601 4
# 8 990 2
# 9 990 4