Home > Blockchain >  Subset data from the latest date prior to a specified date per group
Subset data from the latest date prior to a specified date per group

Time:06-24

My data are as follows:

df <- structure(list(site = c("A", "A", "A", "A", "B", "B", "B", "B", 
"A", "A", "A", "A", "B", "B", "B", "B"), year = c(2019L, 2019L, 
2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2020L, 2020L, 2020L, 
2020L, 2020L, 2020L, 2020L, 2020L), date = c("2019-08-12", "2019-08-14", 
"2019-10-16", "2019-10-27", "2019-08-10", "2019-08-13", "2019-10-01", 
"2019-10-16", "2020-08-08", "2020-08-11", "2020-10-14", "2020-10-16", 
"2020-08-07", "2020-08-19", "2020-10-15", "2020-10-20")), 
class = "data.frame", row.names = c(NA, 
-16L))

For each site and each year, I would like to subset a date range with the following conditions:

The range will begin from the latest date prior to August 15th, and will end at the earliest date after October 15th.

Thank you in advance!

CodePudding user response:

Grouped by 'site', year', filter the 'date' that are between the date value that is less than August 15, and the 'date' value just greater than the October 15th. date[date < as.Date(ISOdate(year, 8, 15))], subsets the dates which are less than the August 15, wrapping with max returns the latest subset date. Similarly for the earliest date is calculated from October 15th.

library(dplyr)
df %>% 
  mutate(date = as.Date(date)) %>%
  group_by(site, year) %>% 
  filter(between(date, max(date[date < as.Date(ISOdate(year, 8, 15))]), 
     min(date[date > as.Date(ISOdate(year, 10, 15))]))) %>%
  ungroup
  •  Tags:  
  • r
  • Related