Extract data values at a higher frequency than time stamps-CodePudding

I have continuous behavior data with the timestamp when the subject changed behaviors and what each behavior was, and I need to extract the instantaneous behavior at each minute, starting at the second that the first behavior began: if the behavior started at 17:34:06, I'd define the next minute as 17:35:06. I also have the durations of each behavior calculated. This is what my data looks like:

df <- data.frame(Behavior = c("GRAZ", "MLTC", "GRAZ", "MLTC", "VIGL"),
                 Behavior_Start = c("2022-05-10 17:34:06","2022-05-10 17:38:04","2022-05-10 17:38:26","2022-05-10 17:41:49","2022-05-10 17:42:27"),
                 Behavior_Duration_Minutes = c(0.000000,3.961683,4.325933,7.722067,8.350017))

print(df)

I've used cut() to bin each row into the minute it falls into, but I can't figure out how to get the behavior values for the minutes in which a new behavior doesn't occur (i.e. minutes 2:4 here), and this bases it off the minute but doesn't account for the second that the first behavior began.

time <- data.frame(as.POSIXct(df$Behavior_Start, tz = "America/Denver"))
colnames(time) <- "time"
df <- cbind(df,time)
df.cut <- data.frame(df, cuts = cut(df$time, breaks= "1 min", labels = FALSE))

print(df.cut)

So the dataframe I'd like to end up with would look like this:

new.df <- data.frame(Minute = c(1:10),
                     Timestamp = c("2022-05-10 17:34:06","2022-05-10 17:35:06","2022-05-10 17:36:06","2022-05-10 17:37:06","2022-05-10 17:38:06","2022-05-10 17:39:06","2022-05-10 17:40:06","2022-05-10 17:41:06","2022-05-10 17:42:06","2022-05-10 17:43:06"),   
                     Behavior = c("GRAZ","GRAZ","GRAZ","MLTC","GRAZ","GRAZ","GRAZ","MLTC","VIGL","VIGL"))

print(new.df)

CodePudding user response：

Your data:

library(dplyr)
library(tidyr)
library(purrr)

your_df <- data.frame(
  Behavior = c("Grazing","Vigilant","Grazing","Other","Grazing"),
  Behavior_Start = c("2022-05-10 17:34:06","2022-05-10 17:38:04","2022-05-10 17:38:26","2022-05-10 17:41:49","2022-05-10 17:42:27"),
  Behavior_Duration_Minutes = c(0.000000,3.961683,4.325933,7.722067,8.350017)
)

Using lead() on the duration column gives you the start and end of each "period" of the activity, and then you need to fill in with a minute for each of that duration.

# Make a list column that generates a sequence of minutes "included" in
#   the `Behavior_Duration_Minutes` column. You'll need to play with this
#   logic in terms of whether or not you want `floor()` or `round()` etc.
#   Also update the endpoint, here hardcoded at 10 minutes.
high_res_df <- 
  your_df %>% 
  mutate(
    minutes_covered = purrr::map2(
      ceiling(Behavior_Duration_Minutes), 
      lead(Behavior_Duration_Minutes, default = 10),
      ~seq(.x, .y)
      )
    )
high_res_df
#>   Behavior      Behavior_Start Behavior_Duration_Minutes minutes_covered
#> 1  Grazing 2022-05-10 17:34:06                  0.000000      0, 1, 2, 3
#> 2 Vigilant 2022-05-10 17:38:04                  3.961683               4
#> 3  Grazing 2022-05-10 17:38:26                  4.325933         5, 6, 7
#> 4    Other 2022-05-10 17:41:49                  7.722067               8
#> 5  Grazing 2022-05-10 17:42:27                  8.350017           9, 10

Now that you've generated the list of minutes included, you can use unnest() to get closer to your desired output.

# And here expand out that list-column into a regular sequence
high_res_long <- 
  tidyr::unnest(
    high_res_df,
    "minutes_covered"
  )
high_res_long
#> # A tibble: 11 × 4
#>    Behavior Behavior_Start      Behavior_Duration_Minutes minutes_covered
#>    <chr>    <chr>                                   <dbl>           <int>
#>  1 Grazing  2022-05-10 17:34:06                      0                  0
#>  2 Grazing  2022-05-10 17:34:06                      0                  1
#>  3 Grazing  2022-05-10 17:34:06                      0                  2
#>  4 Grazing  2022-05-10 17:34:06                      0                  3
#>  5 Vigilant 2022-05-10 17:38:04                      3.96               4
#>  6 Grazing  2022-05-10 17:38:26                      4.33               5
#>  7 Grazing  2022-05-10 17:38:26                      4.33               6
#>  8 Grazing  2022-05-10 17:38:26                      4.33               7
#>  9 Other    2022-05-10 17:41:49                      7.72               8
#> 10 Grazing  2022-05-10 17:42:27                      8.35               9
#> 11 Grazing  2022-05-10 17:42:27                      8.35              10

^{Created on 2023-01-13 with reprex v2.0.2}

You'll need to play around with this a bit to match exactly what you want.