Home > Net >  Merging dataframes by ranges of datetimes in R
Merging dataframes by ranges of datetimes in R

Time:07-18

I have two dataframes. The first dataframe Observations shows the date/times that a surveyor recorded events of interest, and a unique ID number for each type of event:

library(lubridate)
Observations <- data.frame(Time = dmy_hms(paste(c("13-7-2022 10:01:01","13-7-2022 14:02:01","15-7-2022 10:01:01", "15-7-2022 16:01:01"))), ID = c(1,3,1))

The second dataframe Sites shows what date/time the surveyor started and stopped looking for events (i.e., this describes the possible times that events could have been observed). TimeStart is the time the surveyor began looking for an event, TimeEnd is when they stopped. Sites also contains the latitude and longitude where the surveyor was looking for events between TimeStart and TimeEnd.

Sites <- data.frame(TimeStart = dmy_hms(paste(c("13-7-2022 10:00:00","13-7-2022 14:00:00","15-7-2022 10:00:00", "15-7-2022 16:00:00"))),
                    TimeEnd = dmy_hms(paste(c("13-7-2022 10:05:00","13-7-2022 14:05:00","15-7-2022 10:05:00", "15-7-2022 16:05:00"))),
                    Latitude = c("11.1111", "11.2222", "11.1234", "11.1487"),
                    Longitude = c("99.1257", "99.3478", "99.6241", "99.6214"))

So the Time that events are recorded by the surveyor (i.e., recorded in Observations) falls within one of the time ranges shown in Sites$TimeStart and Sites$TimeEnd.

I would like to merge these two dataframes so that rows for each event (ID) recorded in Observations contains the Latitude and Longitude where the surveyor was searching during the corresponding Time, as well as when they started TimeStart and stopped TimeEnd searching for each period.

In the end, Observations would look like this:

Time                ID  Latitude Longitude  TimeStart              TimeEnd    
2022-07-13 10:01:01 1   11.1111   99.1257   2022-07-13 10:00:00   2022-07-13 10:05:00
2022-07-13 14:02:01 3   11.2222   99.3478   2022-07-13 14:00:00   2022-07-13 14:05:00
2022-07-15 10:01:01 2   11.1234   99.6241   2022-07-15 10:00:00   2022-07-15 10:05:00
2022-07-15 16:01:01 1   11.1487   99.6214   2022-07-15 16:00:00   2022-07-15 16:05:00

How can we merge this data by times when Observations$Time falls within a "range of times" shown in Sites$TimeStart and Sites$TimeEnd?

CodePudding user response:

We can do this by retrieving the index (row number) in Sites that fulfills the time condition:

Observations$siteindex <- sapply(Observations$Time, function(x) which(x<=Sites$TimeEnd&x>=Sites$TimeStart)[1]) # first matching row into Sites
Sites$siteindex <- 1:nrow(Sites)
result <- merge(Observations, Sites, by="siteindex")
  siteindex                Time ID           TimeStart             TimeEnd Latitude Longitude
1         1 2022-07-13 10:01:01  1 2022-07-13 10:00:00 2022-07-13 10:05:00  11.1111   99.1257
2         2 2022-07-13 14:02:01  3 2022-07-13 14:00:00 2022-07-13 14:05:00  11.2222   99.3478
3         3 2022-07-15 10:01:01  1 2022-07-15 10:00:00 2022-07-15 10:05:00  11.1234   99.6241
4         4 2022-07-15 16:01:01  2 2022-07-15 16:00:00 2022-07-15 16:05:00  11.1487   99.6214

The data might be a bit different because your data.frame is incorrect: arguments imply differing number of rows: 4, 3

  • Related