R: loop over the dataframe and stop when specific condition is met

Time:01-25

I have a dataframe df, which looks like this:

id status
601 2
601 2
601 2
601 4
601 2
601 2
601 4
601 2
990 2
990 4

The first output I want: loop over the rows of each id and stop the first time the status 4 occurs for that id.

So at the end it should look like this:

id status
601 2
601 2
601 2
601 4
990 2
990 4

The second output I want: for each id, keep everything up to the last 4, no matter how often 4 occurs in the original dataset. Nothing should come after that last 4.

id status
601 2
601 2
601 2
601 4
601 2
601 2
601 4
990 2
990 4

I do not know how to do this. Maybe there is also a way with filtering? I would really appreciate your help.

CodePudding user response:

If I understand your question correctly, you can use {dplyr} to get the first 4 rows of each id:

library(dplyr)
df %>% group_by(id) %>% slice_head(n = 4)

How are your two questions different? Try adding some data that we can run and illustrate if the above is insufficient.

CodePudding user response:

To get the rows until the first 4, you can do:

library(dplyr)
df %>% 
  group_by(id) %>% 
  filter(!lag(cumany(status == 4), default = FALSE))

#     id status
#  <int>  <int>
#1   601      2
#2   601      2
#3   601      2
#4   601      4
#5   990      2
#6   990      4
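
Since the question asks for a loop, here is a minimal base R sketch of the same "stop at the first 4" logic without dplyr. The helper name keep_until_first_4 is made up for illustration, the data frame is rebuilt from the sample in the question, and an id that never reaches 4 is assumed to be kept whole.

```r
# Sample data copied from the question.
df <- data.frame(
  id     = c(601, 601, 601, 601, 601, 601, 601, 601, 990, 990),
  status = c(2, 2, 2, 4, 2, 2, 4, 2, 2, 4)
)

# keep_until_first_4() is a hypothetical helper, not part of any package.
keep_until_first_4 <- function(df) {
  keep <- logical(nrow(df))             # every row starts as "drop"
  for (current_id in unique(df$id)) {
    rows <- which(df$id == current_id)  # row positions of this id, in order
    for (i in rows) {
      keep[i] <- TRUE                   # keep the row, including the 4 itself
      if (df$status[i] == 4) break      # stop at the first 4 for this id
    }
  }
  df[keep, , drop = FALSE]
}

first_out <- keep_until_first_4(df)
print(first_out)
```

On the sample data this keeps the first four rows of id 601 and both rows of id 990. The explicit loop is slower than the grouped filter on large data, but it mirrors the "loop and stop" wording of the question directly.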

And to get everything until the last 4, you can do:

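df %>% 
  group_by(id) %>% 
  # n4 counts the 4s seen so far; tmp counts the 4s seen before the current row
  mutate(n4 = cumsum(status == 4), tmp = lag(n4, default = 0L)) %>% 
  # keep rows up to and including the last 4 (ids without any 4 are kept whole)
  filter(tmp < max(n4) | max(n4) == 0) %>% 
  select(-n4, -tmp)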

#      id status
# 1   601      2
# 2   601      2
# 3   601      2
# 4   601      4
# 5   601      2
# 6   601      2
# 7   601      4
# 8   990      2
# 9   990      4
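
For comparison, a base R sketch of the "until the last 4" filter, using ave() to work per id. The helper name up_to_last_4 is hypothetical, and keeping ids without any 4 in full is an assumption, since the question does not cover that case.

```r
# Sample data copied from the question.
df <- data.frame(
  id     = c(601, 601, 601, 601, 601, 601, 601, 601, 990, 990),
  status = c(2, 2, 2, 4, 2, 2, 4, 2, 2, 4)
)

# up_to_last_4() is a hypothetical helper: for one id's status vector it
# returns TRUE for every position up to and including the last 4.
up_to_last_4 <- function(s) {
  is4 <- s == 4
  if (!any(is4)) return(rep(TRUE, length(s)))  # assumption: no 4 means keep all
  seq_along(s) <= max(which(is4))
}

# ave() applies the helper within each id and returns the result in the
# original row order (as 0/1, hence the as.logical()).
keep <- as.logical(ave(df$status, df$id, FUN = up_to_last_4))
second_out <- df[keep, ]
print(second_out)
```

On the sample data only the final row of id 601 is dropped. Because the cut-off is taken from the position of the last 4, an id whose final row is itself a 4 is kept in full.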