Home > Enterprise >  R get date of first and last element in group
R get date of first and last element in group

Time:10-13

I have a dataframe with a time series of events:

               Date    Element     Event
2020-01-01 00:00:00          A        OK
2020-01-01 00:05:00          A        OK
2020-01-01 00:10:00          A        OK
2020-01-01 00:15:00          A     Alarm
2020-01-01 00:20:00          A     Alarm
2020-01-01 00:25:00          A     Alarm
2020-01-01 00:30:00          A        OK
2020-01-01 00:00:00          B        OK
2020-01-01 00:05:00          B        OK
2020-01-01 00:10:00          B     Alarm
2020-01-01 00:15:00          B     Alarm
2020-01-01 00:20:00          B     Alarm
2020-01-01 00:25:00          B        OK
2020-01-01 00:30:00          B        OK

I'd like to group it by day, element and event, extracting the first and last date of each group, getting this structure:

       Day    Element     Event                  Begin                    End
2020-01-01          A        OK    2020-01-01 00:00:00    2020-01-01 00:10:00
2020-01-01          A     Alarm    2020-01-01 00:15:00    2020-01-01 00:25:00
2020-01-01          A        OK    2020-01-01 00:30:00    2020-01-01 00:30:00
2020-01-01          B        OK    2020-01-01 00:00:00    2020-01-01 00:05:00
2020-01-01          B     Alarm    2020-01-01 00:10:00    2020-01-01 00:20:00
2020-01-01          B        OK    2020-01-01 00:25:00    2020-01-01 00:30:00

Test data:

structure(list(Date = structure(c(1577836800, 1577837100,1577837400, 1577837700, 1577838000, 1577838300, 1577838600, 1577836800, 1577837100, 
1577837400, 1577837700, 1577838000, 1577838300, 1577838600), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), Element = c("A", "A", "A", "A", "A", 
"A", "A", "B", "B", "B", "B", "B", "B", "B"), Event = c("OK", 
"OK", "OK", "Alarm", "Alarm", "Alarm", "OK", "OK", "OK", "Alarm", 
"Alarm", "Alarm", "OK", "OK")), row.names = c(NA, -14L), class = c("tbl_df", 
"tbl", "data.frame"))

CodePudding user response:

using dplyr and data.table:

df %>%
  mutate(Day = as.Date(Date)) %>%
  group_by(grp = rleid(Day, Element, Event), Day, Element, Event) %>%
  summarize(Begin = min(Date),
            End = max(Date)) %>%
  ungroup() %>%
  select(-grp)

Which results in:

# A tibble: 6 x 5
  Day        Element Event Begin               End                
  <date>     <chr>   <chr> <chr>               <chr>              
1 2020-01-01 A       OK    2020-01-01 00:00:00 2020-01-01 00:10:00
2 2020-01-01 A       Alarm 2020-01-01 00:15:00 2020-01-01 00:25:00
3 2020-01-01 A       OK    2020-01-01 00:30:00 2020-01-01 00:30:00
4 2020-01-01 B       OK    2020-01-01 00:00:00 2020-01-01 00:05:00
5 2020-01-01 B       Alarm 2020-01-01 00:10:00 2020-01-01 00:20:00
6 2020-01-01 B       OK    2020-01-01 00:25:00 2020-01-01 00:30:00
  • Related