Add values into a data frame under conditions of other data frame-CodePudding

This may be a long shot, but I am trying to add values from df1 into df2 under certain conditions. Let me give examples of my data frames:

df1
Date   Time  TimeSlot          Behavior  BehaviorTime
10.30  1030  Morning Visitors  Startle   142
10.30  1030  Morning Visitors  Retreat   155
10.30  1030  Morning Visitors  Chase     187
10.31  830   Keeper Feeding    Startle   133
10.31  830   Keeper Feeding    Chase     139

df2
SessionStart       ScanTime  Val1  Val2    Temp  Weather
10/30/21 10:33:42  10:34:42  A     60-70   68    Partly Cloudy
10/30/21 10:33:42  10:35:42  B     70-80   68    Partly Cloudy
10/30/21 10:33:42  10:36:42  A     70-80   68    Partly Cloudy
10/30/21 10:33:42  10:37:42  B     70-80   68    Partly Cloudy
10/30/21 10:33:42  10:38:42  C     70-80   68    Partly Cloudy
10/31/21 08:35:23  08:36:23  A     40-50   77    Sunny
10/31/21 08:35:23  08:37:23  C     90-100  77    Sunny
10/31/21 08:35:23  08:38:23  C     90-100  77    Sunny
10/31/21 08:35:23  08:39:23  C     90-100  77    Sunny
10/31/21 08:35:23  08:40:23  C     90-100  77    Sunny

To explain a bit further, df1 is a data frame that is recording behaviors of an animal. The session started at approximately 10:30 AM (1030) on 10/30/21 (10.30). Each time a behavior occurred, the observer would record it. The "BehaviorTime" column is when the observer recorded the behavior (in seconds). So, the first observation of df1 was recorded 142 seconds into the observation session.

The second data frame, df2, is recording the environment of the observation session. Each session was about 30 minutes, and certain values were recorded at each minute (30 observations per session). Some values change, some values stay the same.

I would like to find a way to integrate the "Behavior" column from df1 into df2 during the time that it occurs. The time in df1 is approximate, while the time in df2 is accurate. Like I said, the "BehaviorTime" column represents the number of seconds into the session. So, 142 seconds (first observation in df1) after 10:33:42 (observation start time from df2) would be 10:36:24, which would mean the “Startle” value would be included in the df2 row with 10:36:42 because that time is closest to the calculated time. Essentially, it would look like this:

df3
SessionStart       ScanTime  Val1  Val2    Temp  Weather        Behavior
10/30/21 10:33:42  10:34:42  A     60-70   68    Partly Cloudy
10/30/21 10:33:42  10:35:42  B     70-80   68    Partly Cloudy
10/30/21 10:33:42  10:36:42  A     70-80   68    Partly Cloudy  Startle
10/30/21 10:33:42  10:36:42  A     70-80   68    Partly Cloudy  Retreat
10/30/21 10:33:42  10:37:42  B     70-80   68    Partly Cloudy  Chase
10/30/21 10:33:42  10:38:42  C     70-80   68    Partly Cloudy
10/31/21 08:35:23  08:36:23  A     40-50   77    Sunny
10/31/21 08:35:23  08:37:23  C     90-100  77    Sunny          Startle
10/31/21 08:35:23  08:37:23  C     90-100  77    Sunny          Chase
10/31/21 08:35:23  08:38:23  C     90-100  77    Sunny
10/31/21 08:35:23  08:39:23  C     90-100  77    Sunny
10/31/21 08:35:23  08:40:23  C     90-100  77    Sunny

I really hope this makes sense. I have tried different functions in dplyr, but I honestly don’t even know where to start. Any help is appreciated!

CodePudding user response：

Your desired output seems off. 10:30:00 142 seconds == 10:32:12, not 10:33:42.

here is a data.table approach using a rolling join

Sample data

library(data.table)
df1 <- fread('Date   Time  TimeSlot          Behavior  BehaviorTime
             10.30  1030  "Morning Visitors"  Startle   142
             10.30  1030  "Morning Visitors"  Retreat   155
             10.30  1030  "Morning Visitors"  Chase     187
             10.31  830   "Keeper Feeding"    Startle   133
             10.31  830   "Keeper Feeding"    Chase     139', 
             colClasses = "character")
             
df2 <- fread('SessionStart       ScanTime  Val1  Val2    Temp  Weather
             "10/30/21 10:33:42"  10:34:42  A     60-70   68    "Partly Cloudy"
             "10/30/21 10:33:42"  10:35:42  B     70-80   68    "Partly Cloudy"
             "10/30/21 10:33:42"  10:36:42  A     70-80   68    "Partly Cloudy"
             "10/30/21 10:33:42"  10:37:42  B     70-80   68    "Partly Cloudy"
             "10/30/21 10:33:42"  10:38:42  C     70-80   68    "Partly Cloudy"
             "10/31/21 08:35:23"  08:36:23  A     40-50   77    Sunny
             "10/31/21 08:35:23"  08:37:23  C     90-100  77    Sunny
             "10/31/21 08:35:23"  08:38:23  C     90-100  77    Sunny
             "10/31/21 08:35:23"  08:39:23  C     90-100  77    Sunny
             "10/31/21 08:35:23"  08:40:23  C     90-100  77    Sunny',
             colClasses = "character")

code

# Create some actual timestamps to calculate with
df1[, timestamp := 
      as.POSIXct(
        paste0(Date, ".2021 ", sprintf("d", as.numeric(Time))), 
        format = "%m.%d.%Y %H%M")   as.numeric(BehaviorTime)]
df2[, timestamp := 
      as.POSIXct(
        paste0(gsub("(.*) .*", "\\1", SessionStart), " ", ScanTime),
        format = "%m/%d/%y %H:%M:%S")][]
# Create rowid's for joining later in the process
df2[, rowid := .I][]
# Perform rolling joi to nearest timestamp, then left join
df2[df1, .(timestamp = x.timestamp, Behavior), on = .(timestamp), roll = -Inf][df2, on = .(timestamp)]

#              timestamp Behavior      SessionStart ScanTime Val1   Val2 Temp       Weather
# 1: 2021-10-30 10:34:42  Startle 10/30/21 10:33:42 10:34:42    A  60-70   68 Partly Cloudy
# 2: 2021-10-30 10:34:42  Retreat 10/30/21 10:33:42 10:34:42    A  60-70   68 Partly Cloudy
# 3: 2021-10-30 10:34:42    Chase 10/30/21 10:33:42 10:34:42    A  60-70   68 Partly Cloudy
# 4: 2021-10-30 10:35:42     <NA> 10/30/21 10:33:42 10:35:42    B  70-80   68 Partly Cloudy
# 5: 2021-10-30 10:36:42     <NA> 10/30/21 10:33:42 10:36:42    A  70-80   68 Partly Cloudy
# 6: 2021-10-30 10:37:42     <NA> 10/30/21 10:33:42 10:37:42    B  70-80   68 Partly Cloudy
# 7: 2021-10-30 10:38:42     <NA> 10/30/21 10:33:42 10:38:42    C  70-80   68 Partly Cloudy
# 8: 2021-10-31 08:36:23  Startle 10/31/21 08:35:23 08:36:23    A  40-50   77         Sunny
# 9: 2021-10-31 08:36:23    Chase 10/31/21 08:35:23 08:36:23    A  40-50   77         Sunny
#10: 2021-10-31 08:37:23     <NA> 10/31/21 08:35:23 08:37:23    C 90-100   77         Sunny
#11: 2021-10-31 08:38:23     <NA> 10/31/21 08:35:23 08:38:23    C 90-100   77         Sunny
#12: 2021-10-31 08:39:23     <NA> 10/31/21 08:35:23 08:39:23    C 90-100   77         Sunny
#13: 2021-10-31 08:40:23     <NA> 10/31/21 08:35:23 08:40:23    C 90-100   77         Sunny