R: loop or filter over the dataframe and stop when a specific condition is met

I have a dataframe df, which looks like this:

id	status
601	2
601	2
601	2
601	4
601	2
601	2
601	4
601	2
990	2
990	4

The first output I want: loop (or filter) over the rows of each id and stop at the first occurrence of status 4 for that id.

So the result should look like this:

id	status
601	2
601	2
601	2
601	4
990	2
990	4

The second output I want: for each id, keep everything up to the last 4, no matter how often 4 occurs in the original dataset. Nothing should come after that last 4.

id	status
601	2
601	2
601	2
601	4
601	2
601	2
601	4
990	2
990	4

I do not know how to do this. Maybe there is also a way to do it with filtering? I would really appreciate your help.

CodePudding user response:

If I understand your question correctly, you can use {dplyr} to get the first 4 rows of each id:

library(dplyr)
df %>% group_by(id) %>% slice_head(n = 4)  # first 4 rows per id, regardless of status

How do your two desired outputs differ? If the above is insufficient, try adding some runnable data that illustrates the problem.
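For anyone who wants to run the snippets in this thread, the table from the question can be rebuilt roughly like this. It is only a sketch: the column names come from the question, but storing both columns as integers is an assumption.

```r
# Example data transcribed from the question.
# Assumption: both columns are integers (the question does not state the types).
df <- data.frame(
  id     = rep(c(601L, 990L), times = c(8L, 2L)),
  status = c(2L, 2L, 2L, 4L, 2L, 2L, 4L, 2L,   # rows for id 601
             2L, 4L)                           # rows for id 990
)

str(df)
```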

CodePudding user response:

To get the rows up to and including the first 4 of each id, you can do:

library(dplyr)
df %>% 
  group_by(id) %>% 
  filter(!lag(cumany(status == 4), default = FALSE))

#     id status
#  <int>  <int>
#1   601      2
#2   601      2
#3   601      2
#4   601      4
#5   990      2
#6   990      4
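
If dplyr is not available, the same "stop at the first 4" idea can be sketched in base R with ave(). This is an illustration rather than a drop-in replacement: it assumes the question's data with integer columns id and status, and the names is_four, fours_before and first_out are made up here.

```r
# Example data from the question (integer columns are an assumption).
df <- data.frame(
  id     = rep(c(601L, 990L), times = c(8L, 2L)),
  status = c(2L, 2L, 2L, 4L, 2L, 2L, 4L, 2L, 2L, 4L)
)

# 1 where status is 4, 0 otherwise.
is_four <- as.integer(df$status == 4L)

# Per id: how many 4s occurred strictly before the current row.
# cumsum(x) counts 4s up to and including the row, so subtract the row itself.
fours_before <- ave(is_four, df$id, FUN = function(x) cumsum(x) - x)

# Keep rows that have no 4 before them, i.e. everything up to the first 4.
first_out <- df[fours_before == 0L, ]
first_out
```

The vector fours_before plays the same role as the lagged cumulative flag in the dplyr version: it is 0 up to and including the first 4 of an id and positive afterwards.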

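And to get everything up to and including the last 4 of each id, you can do:

df %>% 
  group_by(id) %>% 
  # tmp = number of 4s seen before the current row within the id
  mutate(tmp = lag(cumsum(status == 4), default = 0L)) %>% 
  # keep rows that still have a 4 at or after them; ids without any 4 are kept whole
  filter(tmp < sum(status == 4) | tmp == 0) %>% 
  select(-tmp)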

#      id status
# 1   601      2
# 2   601      2
# 3   601      2
# 4   601      4
# 5   601      2
# 6   601      2
# 7   601      4
# 8   990      2
# 9   990      4
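
Since the question asked about a loop, here is a sketch that goes through the ids one at a time in base R and covers both outputs. It assumes the question's data with integer columns; keep_until and its choose_pos argument are hypothetical names invented for this example. Ids that contain no 4 at all are returned unchanged, which is a design choice and not something the question specifies.

```r
# Example data from the question (integer columns are an assumption).
df <- data.frame(
  id     = rep(c(601L, 990L), times = c(8L, 2L)),
  status = c(2L, 2L, 2L, 4L, 2L, 2L, 4L, 2L, 2L, 4L)
)

# Hypothetical helper: keep the rows of one id up to a chosen 4.
# choose_pos = min cuts at the first 4, choose_pos = max at the last 4.
keep_until <- function(d, choose_pos = min) {
  pos <- which(d$status == 4L)
  if (length(pos) == 0L) return(d)  # no 4 in this id: keep all rows
  d[seq_len(choose_pos(pos)), , drop = FALSE]
}

# Process each id separately, then stack the pieces back together.
pieces    <- split(df, df$id)
first_out <- do.call(rbind, lapply(pieces, keep_until, choose_pos = min))
last_out  <- do.call(rbind, lapply(pieces, keep_until, choose_pos = max))

first_out
last_out
```

Note that split() orders the pieces by id, so the result comes back sorted by id instead of in the original row order when the ids are not already sorted.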