I have two dataframes. The first dataframe Observations
shows the date/times that a surveyor recorded events of interest, and a unique ID
number for each type of event:
library(lubridate)
Observations <- data.frame(Time = dmy_hms(paste(c("13-7-2022 10:01:01","13-7-2022 14:02:01","15-7-2022 10:01:01", "15-7-2022 16:01:01"))), ID = c(1,3,1))
The second dataframe Sites
shows what date/time the surveyor started and stopped looking for events (i.e., this describes the possible times that events could have been observed). TimeStart
is the time the surveyor began looking for an event, TimeEnd
is when they stopped. Sites
also contains the latitude and longitude where the surveyor was looking for events between TimeStart
and TimeEnd
.
Sites <- data.frame(TimeStart = dmy_hms(paste(c("13-7-2022 10:00:00","13-7-2022 14:00:00","15-7-2022 10:00:00", "15-7-2022 16:00:00"))),
TimeEnd = dmy_hms(paste(c("13-7-2022 10:05:00","13-7-2022 14:05:00","15-7-2022 10:05:00", "15-7-2022 16:05:00"))),
Latitude = c("11.1111", "11.2222", "11.1234", "11.1487"),
Longitude = c("99.1257", "99.3478", "99.6241", "99.6214"))
So the Time
that events are recorded by the surveyor (i.e., recorded in Observations
) falls within one of the time ranges shown in Sites$TimeStart
and Sites$TimeEnd
.
I would like to merge these two dataframes so that rows for each event (ID
) recorded in Observations
contains the Latitude
and Longitude
where the surveyor was searching during the corresponding Time
, as well as when they started TimeStart
and stopped TimeEnd
searching for each period.
In the end, Observations
would look like this:
Time ID Latitude Longitude TimeStart TimeEnd
2022-07-13 10:01:01 1 11.1111 99.1257 2022-07-13 10:00:00 2022-07-13 10:05:00
2022-07-13 14:02:01 3 11.2222 99.3478 2022-07-13 14:00:00 2022-07-13 14:05:00
2022-07-15 10:01:01 2 11.1234 99.6241 2022-07-15 10:00:00 2022-07-15 10:05:00
2022-07-15 16:01:01 1 11.1487 99.6214 2022-07-15 16:00:00 2022-07-15 16:05:00
How can we merge this data by times when Observations$Time
falls within a "range of times" shown in Sites$TimeStart
and Sites$TimeEnd
?
CodePudding user response:
We can do this by retrieving the index (row number) in Sites
that fulfills the time condition:
Observations$siteindex <- sapply(Observations$Time, function(x) which(x<=Sites$TimeEnd&x>=Sites$TimeStart)[1]) # first matching row into Sites
Sites$siteindex <- 1:nrow(Sites)
result <- merge(Observations, Sites, by="siteindex")
siteindex Time ID TimeStart TimeEnd Latitude Longitude
1 1 2022-07-13 10:01:01 1 2022-07-13 10:00:00 2022-07-13 10:05:00 11.1111 99.1257
2 2 2022-07-13 14:02:01 3 2022-07-13 14:00:00 2022-07-13 14:05:00 11.2222 99.3478
3 3 2022-07-15 10:01:01 1 2022-07-15 10:00:00 2022-07-15 10:05:00 11.1234 99.6241
4 4 2022-07-15 16:01:01 2 2022-07-15 16:00:00 2022-07-15 16:05:00 11.1487 99.6214
The data might be a bit different because your data.frame
is incorrect: arguments imply differing number of rows: 4, 3