Filter data based on subgroups in R

Time:10-17

In reality it's much more complex, but let's say my data looks like this:

df <- data.frame(
      id = c(1,1,1,2,2,2,2,3,3,3),
      event = c(0,0,0,1,1,1,1,0,0,0),
      day = c(1,3,3,1,6,6,7,1,4,6),
      time = c("2016-10-25 14:00:00", "2016-10-27 12:00:15", "2016-10-27 15:30:00",
                "2016-10-23 11:00:00", "2016-10-28 08:00:15", "2016-10-28 23:00:00", "2016-10-29 12:00:00",
                "2016-10-24 15:00:00", "2016-10-27 15:00:15", "2016-10-29 16:00:00"))
df$time <- as.POSIXct(df$time)

Output:
   id event day                time
1   1     0   1 2016-10-25 14:00:00
2   1     0   3 2016-10-27 12:00:15
3   1     0   3 2016-10-27 15:30:00
4   2     1   1 2016-10-23 11:00:00
5   2     1   6 2016-10-28 08:00:15
6   2     1   6 2016-10-28 23:00:00
7   2     1   7 2016-10-29 12:00:00
8   3     0   1 2016-10-24 15:00:00
9   3     0   4 2016-10-27 15:00:15
10  3     0   6 2016-10-29 16:00:00

What I need to do:

If event is 0, I want to keep only the last 24 hours per id. If event is 1, I want to keep only the rows from day 6.

I know how to keep the last 24 hours in general:

library(dplyr)
library(lubridate)

last_twentyfour_hours <- df %>%
  group_by(id) %>%
  # keep rows within 24 hours of the last timestamp of each id
  filter(time > last(time) - hours(24))
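`last(time)` returns the final value in the current row order, so the filter above assumes the rows are already sorted by `time` within each `id`. A minimal sketch that does not depend on row order swaps in `max(time)`. It rebuilds the example data with an explicit `tz = "UTC"` (an assumption, added so the 24-hour arithmetic cannot be affected by a daylight-saving shift) and shuffles the rows to show the result is unchanged.

```r
library(dplyr)
library(lubridate)

# Example data from the question; tz = "UTC" is an assumption added here
df <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3),
  event = c(0, 0, 0, 1, 1, 1, 1, 0, 0, 0),
  day   = c(1, 3, 3, 1, 6, 6, 7, 1, 4, 6),
  time  = as.POSIXct(c("2016-10-25 14:00:00", "2016-10-27 12:00:15", "2016-10-27 15:30:00",
                       "2016-10-23 11:00:00", "2016-10-28 08:00:15", "2016-10-28 23:00:00",
                       "2016-10-29 12:00:00", "2016-10-24 15:00:00", "2016-10-27 15:00:15",
                       "2016-10-29 16:00:00"), tz = "UTC")
)

# Shuffle the rows so the original ordering cannot be relied on
set.seed(42)
shuffled <- df[sample(nrow(df)), ]

last_24h <- shuffled %>%
  group_by(id) %>%
  # max(time) is each id's latest timestamp, whatever the row order
  filter(time > max(time) - hours(24)) %>%
  ungroup() %>%
  arrange(id, time)

print(last_24h)
```

Applied to every id, this rule keeps day 7 and one of the day 6 rows for id 2, which is why `event == 1` needs its own condition.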

But how do I filter differently for each group?

Thank you very much in advance!

Answer:

Group by 'id' and 'event', then filter with if/else: if 0 is in 'event', apply the OP's 24-hour condition; otherwise return the rows where 'day' is 6.

library(dplyr)
library(lubridate)
df %>% 
   group_by(id, event) %>% 
   filter(if(0 %in% event) time > last(time) - hours(24) else 
        day == 6) %>% 
   ungroup

-output

# A tibble: 5 × 4
     id event   day time               
  <dbl> <dbl> <dbl> <dttm>             
1     1     0     3 2016-10-27 12:00:15
2     1     0     3 2016-10-27 15:30:00
3     2     1     6 2016-10-28 08:00:15
4     2     1     6 2016-10-28 23:00:00
5     3     0     6 2016-10-29 16:00:00
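`if` needs a single `TRUE`/`FALSE`, which is why this answer groups by both `id` and `event`: within each group `event` is constant, so `0 %in% event` is one value. A minimal sketch of an alternative, assuming `event` never changes within an `id` (true for the sample data), groups by `id` alone and reads the rule from `first(event)`:

```r
library(dplyr)
library(lubridate)

# Example data from the question (tz = "UTC" is an added assumption)
df <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3),
  event = c(0, 0, 0, 1, 1, 1, 1, 0, 0, 0),
  day   = c(1, 3, 3, 1, 6, 6, 7, 1, 4, 6),
  time  = as.POSIXct(c("2016-10-25 14:00:00", "2016-10-27 12:00:15", "2016-10-27 15:30:00",
                       "2016-10-23 11:00:00", "2016-10-28 08:00:15", "2016-10-28 23:00:00",
                       "2016-10-29 12:00:00", "2016-10-24 15:00:00", "2016-10-27 15:00:15",
                       "2016-10-29 16:00:00"), tz = "UTC")
)

result <- df %>%
  group_by(id) %>%
  # One condition per group: assumes every row of an id has the same event value
  filter(if (first(event) == 0) time > last(time) - hours(24) else day == 6) %>%
  ungroup()

print(result)
```

If an id could mix 0 and 1 events, grouping by both `id` and `event`, as in the answer, is the safer choice.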

Answer:

We could use the & and | operators:

df %>%
  group_by(id) %>%
  filter(event == 0 & time > last(time) - hours(24) |
           event == 1 & day == 6) %>%
  ungroup()

-output

# A tibble: 5 × 4
     id event   day time               
  <dbl> <dbl> <dbl> <dttm>             
1     1     0     3 2016-10-27 12:00:15
2     1     0     3 2016-10-27 15:30:00
3     2     1     6 2016-10-28 08:00:15
4     2     1     6 2016-10-28 23:00:00
5     3     0     6 2016-10-29 16:00:00
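The filter works without parentheses because `&` has higher precedence than `|` in R, so `a & b | c & d` is read as `(a & b) | (c & d)`. A minimal sketch confirming this on the sample data, and showing with plain logicals how moving the parentheses changes the result:

```r
library(dplyr)
library(lubridate)

# Example data from the question (tz = "UTC" is an added assumption)
df <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3),
  event = c(0, 0, 0, 1, 1, 1, 1, 0, 0, 0),
  day   = c(1, 3, 3, 1, 6, 6, 7, 1, 4, 6),
  time  = as.POSIXct(c("2016-10-25 14:00:00", "2016-10-27 12:00:15", "2016-10-27 15:30:00",
                       "2016-10-23 11:00:00", "2016-10-28 08:00:15", "2016-10-28 23:00:00",
                       "2016-10-29 12:00:00", "2016-10-24 15:00:00", "2016-10-27 15:00:15",
                       "2016-10-29 16:00:00"), tz = "UTC")
)

# Condition as written in the answer, relying on operator precedence
implicit <- df %>%
  group_by(id) %>%
  filter(event == 0 & time > last(time) - hours(24) | event == 1 & day == 6) %>%
  ungroup()

# Same condition with the grouping of & and | spelled out
explicit <- df %>%
  group_by(id) %>%
  filter((event == 0 & time > last(time) - hours(24)) | (event == 1 & day == 6)) %>%
  ungroup()

print(identical(implicit, explicit))

# & binds before |: FALSE & TRUE | TRUE is (FALSE & TRUE) | TRUE
print(FALSE & TRUE | TRUE)    # [1] TRUE
print(FALSE & (TRUE | TRUE))  # [1] FALSE
```

Writing the parentheses explicitly costs nothing and makes the intended pairing of conditions obvious to the next reader.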