Calculate average of values 30 seconds prior to a specified condition in R

I have a series of dive data for a seal. I'd like to calculate the seal's heart rate during the 30 seconds before each dive starts. However, I'm not sure how to tell R that, when it comes across a dive start, it should go back 30 seconds from that start time and calculate the average heart rate. I'm assuming it will need to be a loop, where the condition is: if the row contains a dive start, go back and average the last 30 seconds before that start.

I'm really hoping this is possible; I'm having trouble envisioning any other way to calculate this information with my data.

My data looks like this: the dive information is NA before a dive begins, then there is a dive start time where the dive data begins. The dataframe has multiple dives in it. Between dives there can be minutes where the seal is hanging at the surface and the dive data is NA. But I am only interested in calculating the heart rate in the 30 seconds prior to the dive starting, not over the full time between dives. Note: I tend to use epoch time when calculating things with time because it's just easier to communicate with R that way.

          HR      epoch            datetime diveNum dive_start
1  103.44828 1523026041 2018-04-06 14:47:21      NA         NA
2   82.19178 1523026041 2018-04-06 14:47:21      NA         NA
3   88.23529 1523026042 2018-04-06 14:47:22      NA         NA
4   95.23810 1523026043 2018-04-06 14:47:23      NA         NA
5   90.90909 1523026043 2018-04-06 14:47:23      NA         NA
6   88.23529 1523026044 2018-04-06 14:47:24      NA         NA
7   84.50704 1523026045 2018-04-06 14:47:25      NA         NA
8   84.50704 1523026045 2018-04-06 14:47:25      NA         NA
9   82.19178 1523026046 2018-04-06 14:47:26      NA         NA
10  80.00000 1523026047 2018-04-06 14:47:27      NA         NA
11  80.00000 1523026047 2018-04-06 14:47:27      NA         NA
12  81.08108 1523026048 2018-04-06 14:47:28      NA         NA
13  80.00000 1523026049 2018-04-06 14:47:29      NA         NA
14  78.94737 1523026050 2018-04-06 14:47:30      NA         NA
15  68.18182 1523026050 2018-04-06 14:47:30      NA         NA
16  60.00000 1523026051 2018-04-06 14:47:31      NA         NA
17  40.26846 1523026052 2018-04-06 14:47:32      NA         NA
18  49.18033 1523026054 2018-04-06 14:47:34      NA         NA
19  48.00000 1523026055 2018-04-06 14:47:35      NA         NA
20  48.38710 1523026056 2018-04-06 14:47:36      NA         NA
21  48.00000 1523026058 2018-04-06 14:47:38      NA         NA
22  49.18033 1523026059 2018-04-06 14:47:39      NA         NA
23  50.84746 1523026060 2018-04-06 14:47:40      NA         NA
24  52.17391 1523026061 2018-04-06 14:47:41      NA         NA
25  44.44444 1523026062 2018-04-06 14:47:42      NA         NA
26  47.61905 1523026064 2018-04-06 14:47:44      NA         NA
27  44.77612 1523026065 2018-04-06 14:47:45      NA         NA
28  43.79562 1523026066 2018-04-06 14:47:46      NA         NA
29  34.88372 1523026068 2018-04-06 14:47:48      NA         NA
30  36.58537 1523026069 2018-04-06 14:47:49      NA         NA
31  39.73510 1523026071 2018-04-06 14:47:51       1 1523026071

So I'd like R to average these prior 30 seconds of heart rate data (HR) once it comes across a dive start time. Ideally I would just make a new column in the row of the dive start time that is the predive HR.

I've tried to run a loop as seen below, but I just get NaN values. I got this loop from another post, but it doesn't seem to work in my case. I try to tell R that when the epoch time is greater than or equal to the time 30 seconds prior to dive start, and less than or equal to the dive start time, it should average the HR over that time. But my code does not get across to R that it needs to look back in time to calculate that average.

#setting up the column
df$prediveHR<-NA

# for each row
for (r in 1:nrow(df)) {
  # find all the rows within the acceptable timespan
  # note: figure out if you want < vs <=
  thisSubset = df[
    df$epoch[r] >= df$predivetime[r] &  
    df$epoch[r] <= df$dive_start[r]
  ,]
  # get the mean of the subset
  df$prediveHR[r] = mean(thisSubset$HR)
}

CodePudding user response：

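I assume that each row is roughly one second. It seems so from the data you provided, although some epoch values repeat and some seconds are skipped, so a window of 30 rows only approximates a window of 30 seconds.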

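# one result slot per row; stays NA unless the row is a dive start
results <- rep(NA_real_, nrow(df))
for (i in seq_len(nrow(df))) {
  # only act on rows that mark a dive start
  if (!is.na(df$dive_start[i])) {
    # average the current row and the 29 rows before it
    # (max() keeps the index from going below 1 near the top of the table)
    results[i] <- mean(df$HR[max(1, i - 29):i])
  }
}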

For each row, I check whether the column dive_start is not NA. If it is not NA, the mean of df$HR is written into the results vector.

I take advantage of the fact that in each iteration i is equal to the number of the current row. Since we know which row we are on, we can simply subtract 29 rows to get the starting row (in your example table it would be 2) and then use i as the end row for the mean() function.

You can easily adjust the length of period this way.
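
If the rows are not exactly one second apart, the window can be defined by time instead of by row count. The following is only a minimal sketch, assuming the columns are named HR, epoch and dive_start as in the question. The toy data frame and the `window` variable are made up for illustration, and whether the dive-start row itself should be included is a choice (it is excluded here).

```r
# Toy data (hypothetical): irregular sampling, one dive starting at epoch 140
df <- data.frame(
  HR         = c(100, 90, 80, 70, 60, 50),
  epoch      = c(100, 105, 115, 125, 139, 140),
  dive_start = c(NA, NA, NA, NA, NA, 140)
)

window <- 30               # seconds before the dive start to average over
df$prediveHR <- NA_real_   # stays NA on rows that are not a dive start

# loop only over the rows that mark a dive start
for (i in which(!is.na(df$dive_start))) {
  start <- df$dive_start[i]
  # rows whose timestamp falls in [start - window, start)
  in_window <- df$epoch >= start - window & df$epoch < start
  if (any(in_window)) {
    df$prediveHR[i] <- mean(df$HR[in_window])
  }
}

df$prediveHR[6]   # 70, the mean of the HR values at epochs 115, 125 and 139
```

Changing `window` adjusts the length of the period in seconds, regardless of how many rows fall inside it.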

CodePudding user response：

Here is a {tidyverse}-based solution. In a nutshell:

1. Use tidyr::fill() to propagate dive number and start time back into the pre-dive periods, and use this to compute seconds to dive start.
2. Filter to pre-dive rows within 30 seconds of dive start.
3. Compute the average pre-dive HR within each diveNum.
4. Merge the computed pre-dive HRs back into the main dataframe.
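
To see what the fill step does in isolation, here is a small sketch on made-up data. The `toy` data frame and the `secs_to_dive` column are hypothetical names used only for illustration; the column names diveNum, dive_start and epoch follow the question.

```r
library(tidyr)

# Toy data (hypothetical): dive 1 starts at epoch 103
toy <- data.frame(
  epoch      = c(100, 101, 102, 103, 104),
  diveNum    = c(NA, NA, NA, 1, 1),
  dive_start = c(NA, NA, NA, 103, 103)
)

# .direction = "up" copies each non-missing value into the NA rows above it
filled <- fill(toy, diveNum, dive_start, .direction = "up")

# seconds remaining until the dive start, now defined for the pre-dive rows too
filled$secs_to_dive <- filled$dive_start - filled$epoch
filled$secs_to_dive   # 3 2 1 0 -1
```

After the fill, every pre-dive row knows which dive it precedes and how far away that dive is, which is what makes the 30-second filter possible.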

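library(tidyverse)

pre_dive_data <- dive_data %>%
  # flag rows recorded before a dive (no dive number yet)
  mutate(preDive = is.na(diveNum)) %>%
  # copy each dive's number and start time up into the preceding NA rows
  fill(diveNum, dive_start, .direction = "up") %>%
  # keep pre-dive rows at most 30 seconds before the dive start
  filter(preDive, dive_start - epoch <= 30) %>%
  group_by(diveNum) %>%
  summarize(preDiveHR = mean(HR))

# attach the per-dive averages to the main dataframe
dive_data <- dive_data %>%
  left_join(pre_dive_data, by = "diveNum")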

tail(dive_data)

Output:

# A tibble: 6 x 6
     HR      epoch datetime            diveNum dive_start preDiveHR
  <dbl>      <dbl> <dttm>                <dbl>      <dbl>     <dbl>
1  47.6 1523026064 2018-04-06 14:47:44      NA         NA      NA  
2  44.8 1523026065 2018-04-06 14:47:45      NA         NA      NA  
3  43.8 1523026066 2018-04-06 14:47:46      NA         NA      NA  
4  34.9 1523026068 2018-04-06 14:47:48      NA         NA      NA  
5  36.6 1523026069 2018-04-06 14:47:49      NA         NA      NA  
6  39.7 1523026071 2018-04-06 14:47:51       1 1523026071      65.5
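
The join is by diveNum, so if the full data set carries diveNum on every row of a dive (an assumption; the sample shows only the first dive row), preDiveHR will appear on all of those rows. If the value is wanted only on the dive-start row, as the question asks, one possible follow-up step is sketched below on made-up data. The `joined` and `start_only` names are hypothetical.

```r
library(dplyr)

# Toy data (hypothetical): join result with preDiveHR on every row of dive 1
joined <- data.frame(
  epoch      = c(102, 103, 104, 105),
  diveNum    = c(NA, 1, 1, 1),
  dive_start = c(NA, 103, 103, 103),
  preDiveHR  = c(NA, 65.5, 65.5, 65.5)
)

# keep the value only where the row's timestamp equals the dive start
start_only <- joined %>%
  mutate(preDiveHR = if_else(!is.na(dive_start) & epoch == dive_start,
                             preDiveHR, NA_real_))

start_only$preDiveHR   # NA 65.5 NA NA
```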