This may be a long shot, but I am trying to add values from df1 into df2 under certain conditions. Let me give examples of my data frames:
df1
Date Time TimeSlot Behavior BehaviorTime
10.30 1030 Morning Visitors Startle 142
10.30 1030 Morning Visitors Retreat 155
10.30 1030 Morning Visitors Chase 187
10.31 830 Keeper Feeding Startle 133
10.31 830 Keeper Feeding Chase 139
df2
SessionStart ScanTime Val1 Val2 Temp Weather
10/30/21 10:33:42 10:34:42 A 60-70 68 Partly Cloudy
10/30/21 10:33:42 10:35:42 B 70-80 68 Partly Cloudy
10/30/21 10:33:42 10:36:42 A 70-80 68 Partly Cloudy
10/30/21 10:33:42 10:37:42 B 70-80 68 Partly Cloudy
10/30/21 10:33:42 10:38:42 C 70-80 68 Partly Cloudy
10/31/21 08:35:23 08:36:23 A 40-50 77 Sunny
10/31/21 08:35:23 08:37:23 C 90-100 77 Sunny
10/31/21 08:35:23 08:38:23 C 90-100 77 Sunny
10/31/21 08:35:23 08:39:23 C 90-100 77 Sunny
10/31/21 08:35:23 08:40:23 C 90-100 77 Sunny
To explain a bit further, df1 is a data frame that is recording behaviors of an animal. The session started at approximately 10:30 AM (1030) on 10/30/21 (10.30). Each time a behavior occurred, the observer would record it. The "BehaviorTime" column is when the observer recorded the behavior (in seconds). So, the first observation of df1 was recorded 142 seconds into the observation session.
The second data frame, df2, is recording the environment of the observation session. Each session was about 30 minutes, and certain values were recorded at each minute (30 observations per session). Some values change, some values stay the same.
I would like to find a way to integrate the "Behavior" column from df1 into df2 during the time that it occurs. The time in df1 is approximate, while the time in df2 is accurate. Like I said, the "BehaviorTime" column represents the number of seconds into the session. So, 142 seconds (first observation in df1) after 10:33:42 (observation start time from df2) would be 10:36:24, which would mean the “Startle” value would be included in the df2 row with 10:36:42 because that time is closest to the calculated time. Essentially, it would look like this:
df3
SessionStart ScanTime Val1 Val2 Temp Weather Behavior
10/30/21 10:33:42 10:34:42 A 60-70 68 Partly Cloudy
10/30/21 10:33:42 10:35:42 B 70-80 68 Partly Cloudy
10/30/21 10:33:42 10:36:42 A 70-80 68 Partly Cloudy Startle
10/30/21 10:33:42 10:36:42 A 70-80 68 Partly Cloudy Retreat
10/30/21 10:33:42 10:37:42 B 70-80 68 Partly Cloudy Chase
10/30/21 10:33:42 10:38:42 C 70-80 68 Partly Cloudy
10/31/21 08:35:23 08:36:23 A 40-50 77 Sunny
10/31/21 08:35:23 08:37:23 C 90-100 77 Sunny Startle
10/31/21 08:35:23 08:37:23 C 90-100 77 Sunny Chase
10/31/21 08:35:23 08:38:23 C 90-100 77 Sunny
10/31/21 08:35:23 08:39:23 C 90-100 77 Sunny
10/31/21 08:35:23 08:40:23 C 90-100 77 Sunny
I really hope this makes sense. I have tried different functions in dplyr, but I honestly don’t even know where to start. Any help is appreciated!
CodePudding user response:
Your desired output seems off. 10:30:00 142 seconds == 10:32:12, not 10:33:42.
here is a data.table
approach using a rolling join
Sample data
library(data.table)
df1 <- fread('Date Time TimeSlot Behavior BehaviorTime
10.30 1030 "Morning Visitors" Startle 142
10.30 1030 "Morning Visitors" Retreat 155
10.30 1030 "Morning Visitors" Chase 187
10.31 830 "Keeper Feeding" Startle 133
10.31 830 "Keeper Feeding" Chase 139',
colClasses = "character")
df2 <- fread('SessionStart ScanTime Val1 Val2 Temp Weather
"10/30/21 10:33:42" 10:34:42 A 60-70 68 "Partly Cloudy"
"10/30/21 10:33:42" 10:35:42 B 70-80 68 "Partly Cloudy"
"10/30/21 10:33:42" 10:36:42 A 70-80 68 "Partly Cloudy"
"10/30/21 10:33:42" 10:37:42 B 70-80 68 "Partly Cloudy"
"10/30/21 10:33:42" 10:38:42 C 70-80 68 "Partly Cloudy"
"10/31/21 08:35:23" 08:36:23 A 40-50 77 Sunny
"10/31/21 08:35:23" 08:37:23 C 90-100 77 Sunny
"10/31/21 08:35:23" 08:38:23 C 90-100 77 Sunny
"10/31/21 08:35:23" 08:39:23 C 90-100 77 Sunny
"10/31/21 08:35:23" 08:40:23 C 90-100 77 Sunny',
colClasses = "character")
code
# Create some actual timestamps to calculate with
df1[, timestamp :=
as.POSIXct(
paste0(Date, ".2021 ", sprintf("d", as.numeric(Time))),
format = "%m.%d.%Y %H%M") as.numeric(BehaviorTime)]
df2[, timestamp :=
as.POSIXct(
paste0(gsub("(.*) .*", "\\1", SessionStart), " ", ScanTime),
format = "%m/%d/%y %H:%M:%S")][]
# Create rowid's for joining later in the process
df2[, rowid := .I][]
# Perform rolling joi to nearest timestamp, then left join
df2[df1, .(timestamp = x.timestamp, Behavior), on = .(timestamp), roll = -Inf][df2, on = .(timestamp)]
# timestamp Behavior SessionStart ScanTime Val1 Val2 Temp Weather
# 1: 2021-10-30 10:34:42 Startle 10/30/21 10:33:42 10:34:42 A 60-70 68 Partly Cloudy
# 2: 2021-10-30 10:34:42 Retreat 10/30/21 10:33:42 10:34:42 A 60-70 68 Partly Cloudy
# 3: 2021-10-30 10:34:42 Chase 10/30/21 10:33:42 10:34:42 A 60-70 68 Partly Cloudy
# 4: 2021-10-30 10:35:42 <NA> 10/30/21 10:33:42 10:35:42 B 70-80 68 Partly Cloudy
# 5: 2021-10-30 10:36:42 <NA> 10/30/21 10:33:42 10:36:42 A 70-80 68 Partly Cloudy
# 6: 2021-10-30 10:37:42 <NA> 10/30/21 10:33:42 10:37:42 B 70-80 68 Partly Cloudy
# 7: 2021-10-30 10:38:42 <NA> 10/30/21 10:33:42 10:38:42 C 70-80 68 Partly Cloudy
# 8: 2021-10-31 08:36:23 Startle 10/31/21 08:35:23 08:36:23 A 40-50 77 Sunny
# 9: 2021-10-31 08:36:23 Chase 10/31/21 08:35:23 08:36:23 A 40-50 77 Sunny
#10: 2021-10-31 08:37:23 <NA> 10/31/21 08:35:23 08:37:23 C 90-100 77 Sunny
#11: 2021-10-31 08:38:23 <NA> 10/31/21 08:35:23 08:38:23 C 90-100 77 Sunny
#12: 2021-10-31 08:39:23 <NA> 10/31/21 08:35:23 08:39:23 C 90-100 77 Sunny
#13: 2021-10-31 08:40:23 <NA> 10/31/21 08:35:23 08:40:23 C 90-100 77 Sunny