I have two dataframes that I need to merge based on an ID and date ranges. df1 has columns SiteID, and date and df2 has columns SiteID, Start Date, and End Date. I'm looking to end up with a df that includes SiteID, Date, Start Date and End Date.
example df1:
DateTime SiteID
1 2010-07-25 01:06:55 B04
2 2011-05-10 23:52:14 B04
3 2011-09-17 01:14:30 B04
4 2012-04-04 02:55:29 B05
5 2013-01-05 23:03:06 B05
6 2011-03-09 20:39:46 B06
7 2012-07-25 23:17:19 B07
8 2011-03-03 00:46:45 B08
example df2:
Site.ID Start.date End.date
1 B04 2010-07-18 2010-08-24
2 B04 2011-02-22 2011-07-23
3 B04 2011-08-30 2012-10-03
4 B04 2012-10-20 2013-04-08
5 B05 2011-08-30 2012-08-21
6 B05 2012-12-08 2013-01-21
7 B05 2013-02-08 2013-04-08
8 B06 2010-07-20 2010-09-03
9 B06 2011-02-12 2011-04-18
10 B06 2011-05-13 2011-05-16
11 B07 2011-10-24 2011-11-29
12 B07 2011-12-29 2012-12-02
13 B08 2011-02-12 2011-04-01
14 B08 2011-10-24 2011-12-24
what I'm looking to create:
DateTime SiteID Start.date End.date
1 2010-07-25 01:06:55 B04 2010-07-18 2010-08-24
2 2011-05-10 23:52:14 B04 2011-02-22 2011-07-23
3 2011-09-17 01:14:30 B04 2011-08-30 2012-10-03
4 2012-04-04 02:55:29 B05 2011-08-30 2012-08-21
5 2013-01-05 23:03:06 B05 2012-12-08 2013-01-21
6 2011-03-09 20:39:46 B06 2011-02-12 2011-04-18
7 2012-07-25 23:17:19 B07 2011-12-29 2012-12-02
8 2011-03-03 00:46:45 B08 2011-02-12 2011-04-01
CodePudding user response:
Assume you are using lubridate
for the DateTime
, you can do
df1 %>% left_join(df2, by = c("SiteID" = "Site.ID")) %>%
filter(DateTime %within% interval(Start.date, End.date))
#> # A tibble: 8 × 4
#> DateTime SiteID Start.date End.date
#> <dttm> <chr> <date> <date>
#> 1 2010-07-25 01:06:55 B04 2010-07-18 2010-08-24
#> 2 2011-05-10 23:52:14 B04 2011-02-22 2011-07-23
#> 3 2011-09-17 01:14:30 B04 2011-08-30 2012-10-03
#> 4 2012-04-04 02:55:29 B05 2011-08-30 2012-08-21
#> 5 2013-01-05 23:03:06 B05 2012-12-08 2013-01-21
#> 6 2011-03-09 20:39:46 B06 2011-02-12 2011-04-18
#> 7 2012-07-25 23:17:19 B07 2011-12-29 2012-12-02
#> 8 2011-03-03 00:46:45 B08 2011-02-12 2011-04-01
Full Example:
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
df1 <- tribble(~DateTime, ~SiteID,
"2010-07-25 01:06:55", "B04",
"2011-05-10 23:52:14", "B04",
"2011-09-17 01:14:30", "B04",
"2012-04-04 02:55:29", "B05",
"2013-01-05 23:03:06", "B05",
"2011-03-09 20:39:46", "B06",
"2012-07-25 23:17:19", "B07",
"2011-03-03 00:46:45", "B08") %>%
mutate(DateTime = as_datetime(DateTime))
df2 <- tribble(~Site.ID, ~Start.date, ~End.date,
"B04", "2010-07-18", "2010-08-24",
"B04", "2011-02-22", "2011-07-23",
"B04", "2011-08-30", "2012-10-03",
"B04", "2012-10-20", "2013-04-08",
"B05", "2011-08-30", "2012-08-21",
"B05", "2012-12-08", "2013-01-21",
"B05", "2013-02-08", "2013-04-08",
"B06", "2010-07-20", "2010-09-03",
"B06", "2011-02-12", "2011-04-18",
"B06", "2011-05-13", "2011-05-16",
"B07", "2011-10-24", "2011-11-29",
"B07", "2011-12-29", "2012-12-02",
"B08", "2011-02-12", "2011-04-01",
"B08", "2011-10-24", "2011-12-24") %>%
mutate(Start.date = as_date(Start.date), End.date = as_date(End.date))
df1 %>% left_join(df2, by = c("SiteID" = "Site.ID")) %>%
filter(DateTime %within% interval(Start.date, End.date))
#> # A tibble: 8 × 4
#> DateTime SiteID Start.date End.date
#> <dttm> <chr> <date> <date>
#> 1 2010-07-25 01:06:55 B04 2010-07-18 2010-08-24
#> 2 2011-05-10 23:52:14 B04 2011-02-22 2011-07-23
#> 3 2011-09-17 01:14:30 B04 2011-08-30 2012-10-03
#> 4 2012-04-04 02:55:29 B05 2011-08-30 2012-08-21
#> 5 2013-01-05 23:03:06 B05 2012-12-08 2013-01-21
#> 6 2011-03-09 20:39:46 B06 2011-02-12 2011-04-18
#> 7 2012-07-25 23:17:19 B07 2011-12-29 2012-12-02
#> 8 2011-03-03 00:46:45 B08 2011-02-12 2011-04-01
Created on 2022-06-09 by the reprex package (v2.0.1)