Home > Blockchain >  merge dataframes based on date range in R
merge dataframes based on date range in R

Time:06-10

I have two dataframes that I need to merge based on an ID and date ranges. df1 has columns SiteID, and date and df2 has columns SiteID, Start Date, and End Date. I'm looking to end up with a df that includes SiteID, Date, Start Date and End Date.

example df1:

            DateTime SiteID
1 2010-07-25 01:06:55    B04
2 2011-05-10 23:52:14    B04
3 2011-09-17 01:14:30    B04
4 2012-04-04 02:55:29    B05
5 2013-01-05 23:03:06    B05
6 2011-03-09 20:39:46    B06
7 2012-07-25 23:17:19    B07
8 2011-03-03 00:46:45    B08

example df2:

  Site.ID Start.date   End.date
1      B04 2010-07-18 2010-08-24
2      B04 2011-02-22 2011-07-23
3      B04 2011-08-30 2012-10-03
4      B04 2012-10-20 2013-04-08
5      B05 2011-08-30 2012-08-21
6      B05 2012-12-08 2013-01-21
7      B05 2013-02-08 2013-04-08
8      B06 2010-07-20 2010-09-03
9      B06 2011-02-12 2011-04-18
10     B06 2011-05-13 2011-05-16
11     B07 2011-10-24 2011-11-29
12     B07 2011-12-29 2012-12-02
13     B08 2011-02-12 2011-04-01
14     B08 2011-10-24 2011-12-24

what I'm looking to create:

        DateTime SiteID Start.date   End.date
1 2010-07-25 01:06:55    B04 2010-07-18 2010-08-24
2 2011-05-10 23:52:14    B04 2011-02-22 2011-07-23
3 2011-09-17 01:14:30    B04 2011-08-30 2012-10-03
4 2012-04-04 02:55:29    B05 2011-08-30 2012-08-21
5 2013-01-05 23:03:06    B05 2012-12-08 2013-01-21
6 2011-03-09 20:39:46    B06 2011-02-12 2011-04-18
7 2012-07-25 23:17:19    B07 2011-12-29 2012-12-02
8 2011-03-03 00:46:45    B08 2011-02-12 2011-04-01

CodePudding user response:

Assume you are using lubridate for the DateTime, you can do

df1 %>% left_join(df2, by = c("SiteID" = "Site.ID")) %>% 
  filter(DateTime %within% interval(Start.date, End.date))
#> # A tibble: 8 × 4
#>   DateTime            SiteID Start.date End.date  
#>   <dttm>              <chr>  <date>     <date>    
#> 1 2010-07-25 01:06:55 B04    2010-07-18 2010-08-24
#> 2 2011-05-10 23:52:14 B04    2011-02-22 2011-07-23
#> 3 2011-09-17 01:14:30 B04    2011-08-30 2012-10-03
#> 4 2012-04-04 02:55:29 B05    2011-08-30 2012-08-21
#> 5 2013-01-05 23:03:06 B05    2012-12-08 2013-01-21
#> 6 2011-03-09 20:39:46 B06    2011-02-12 2011-04-18
#> 7 2012-07-25 23:17:19 B07    2011-12-29 2012-12-02
#> 8 2011-03-03 00:46:45 B08    2011-02-12 2011-04-01

Full Example:

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union


df1 <- tribble(~DateTime, ~SiteID,
"2010-07-25 01:06:55", "B04",
"2011-05-10 23:52:14", "B04",
"2011-09-17 01:14:30", "B04",
"2012-04-04 02:55:29", "B05",
"2013-01-05 23:03:06", "B05",
"2011-03-09 20:39:46", "B06",
"2012-07-25 23:17:19", "B07",
"2011-03-03 00:46:45", "B08") %>% 
  mutate(DateTime = as_datetime(DateTime))


df2 <- tribble(~Site.ID, ~Start.date, ~End.date,
"B04", "2010-07-18", "2010-08-24",
"B04", "2011-02-22", "2011-07-23",
"B04", "2011-08-30", "2012-10-03",
"B04", "2012-10-20", "2013-04-08",
"B05", "2011-08-30", "2012-08-21",
"B05", "2012-12-08", "2013-01-21",
"B05", "2013-02-08", "2013-04-08",
"B06", "2010-07-20", "2010-09-03",
"B06", "2011-02-12", "2011-04-18",
"B06", "2011-05-13", "2011-05-16",
"B07", "2011-10-24", "2011-11-29",
"B07", "2011-12-29", "2012-12-02",
"B08", "2011-02-12", "2011-04-01",
"B08", "2011-10-24", "2011-12-24") %>% 
  mutate(Start.date = as_date(Start.date), End.date = as_date(End.date))


df1 %>% left_join(df2, by = c("SiteID" = "Site.ID")) %>% 
  filter(DateTime %within% interval(Start.date, End.date))
#> # A tibble: 8 × 4
#>   DateTime            SiteID Start.date End.date  
#>   <dttm>              <chr>  <date>     <date>    
#> 1 2010-07-25 01:06:55 B04    2010-07-18 2010-08-24
#> 2 2011-05-10 23:52:14 B04    2011-02-22 2011-07-23
#> 3 2011-09-17 01:14:30 B04    2011-08-30 2012-10-03
#> 4 2012-04-04 02:55:29 B05    2011-08-30 2012-08-21
#> 5 2013-01-05 23:03:06 B05    2012-12-08 2013-01-21
#> 6 2011-03-09 20:39:46 B06    2011-02-12 2011-04-18
#> 7 2012-07-25 23:17:19 B07    2011-12-29 2012-12-02
#> 8 2011-03-03 00:46:45 B08    2011-02-12 2011-04-01

Created on 2022-06-09 by the reprex package (v2.0.1)

  • Related