Home > OS >  Group by Consecutive Dates R
Group by Consecutive Dates R

Time:06-16

I want to group animals based on consecutive months they were found within the same burrow, but also split up those groups if the months were not consecutive.

#Input Data
burrow.data <- read.csv
   Animal Burrow       Date
1     027  B0961 2022-03-01
2     027  B0961 2022-04-26
3     033  1920  2021-11-02
4     033  1955  2022-03-29
5     033  1955  2022-04-26
6     063  B0540 2021-04-21
7     063  B0540 2022-01-04
8     063  B0540 2022-03-01
9     101  B0021 2020-11-23
10    101  B0021 2020-12-23
11    101  B0021 2021-11-04
12    101  B0021 2022-01-06
13    101  B0021 2022-02-04
14    101  B0021 2022-03-03

#Expected Output
 Animal Burrow grp Date.Start   Date.End
1    033  1920   1 2021-11-02 2021-11-02
2    033  1955   1 2022-03-29 2022-04-26
3    101  B0021  1 2020-11-23 2020-12-23
4    101  B0021  2 2022-01-06 2020-03-03
5    063  B0540  1 2021-04-21 2022-03-01
6    027  B0961  1 2022-03-01 2022-04-26

I used code from another post: Group consecutive dates in R

And wrote:
burrow.input <- burrow.data[order(burrow.data$Date),]
burrow.input$grp <- ave(as.integer(burrow.input$Date), burrow.input[-4], FUN = function(z) cumsum(c(TRUE, diff(z)>1)))
burrow.input

out <- aggregate(Date ~ Animal   Burrow   grp, data = burrow.input, FUN = function(z) setNames(range(z), c("Start", "End")))
out <- do.call(data.frame,out)

out[,4:5] <- lapply(out[,4:5], as.Date, origin = "1970-01-01")
out

The code keeps grouping 101 into a single group instead of two groups broken up by a date gap (See below). How can I fix this?

  Animal Burrow grp Date.Start   Date.End
1    033  1920   1 2021-11-02 2021-11-02
2    033  1955   1 2022-03-29 2022-04-26
3    101  B0021  1 2020-11-23 2022-03-03
4    063  B0540  1 2021-04-21 2022-03-01
5    027  B0961  1 2022-03-01 2022-04-26

CodePudding user response:

Group the data by Animal, Burrow and a grouping variable that changes each time the date jumps by more than 1 month. Here as.yearmon converts the date to a yearmon object which internally is a year plus 0 for Jan, 1/12 for Feb, ..., 11/12 for Dec so multiply that by 12 and check whether the difference between it and the prior value is greater than 1. Take the cumulative sum of that to generate a grouping variable. Finally summarize that, sort and remove the grouping variable that was added.

library(dplyr)
library(zoo)

burrow.data %>%
  group_by(Animal, Burrow, 
    diff = cumsum( c(1, diff(12 * as.yearmon(Date)) > 1) ) ) %>%
  summarize(Date.start = first(Date), Date.end = last(Date), .groups = "drop") %>%
  arrange(Burrow) %>%
  select(-diff)

giving:

# A tibble: 7 × 4
  Animal Burrow Date.start Date.end  
   <int> <chr>  <chr>      <chr>     
1     33 1920   2021-11-02 2021-11-02
2     33 1955   2022-03-29 2022-04-26
3    101 B0021  2020-11-23 2021-11-04
4    101 B0021  2022-01-06 2022-03-03
5     63 B0540  2021-04-21 2022-01-04
6     63 B0540  2022-03-01 2022-03-01
7     27 B0961  2022-03-01 2022-04-26

Note

The input data in reproducible form is:

burrow.data <-
structure(list(Animal = c(27L, 27L, 33L, 33L, 33L, 63L, 63L, 
63L, 101L, 101L, 101L, 101L, 101L, 101L), Burrow = c("B0961", 
"B0961", "1920", "1955", "1955", "B0540", "B0540", "B0540", "B0021", 
"B0021", "B0021", "B0021", "B0021", "B0021"), Date = c("2022-03-01", 
"2022-04-26", "2021-11-02", "2022-03-29", "2022-04-26", "2021-04-21", 
"2022-01-04", "2022-03-01", "2020-11-23", "2020-12-23", "2021-11-04", 
"2022-01-06", "2022-02-04", "2022-03-03")), class = "data.frame", 
row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", 
"11", "12", "13", "14"))
  • Related