I have continuous behavior data with the timestamp when the subject changed behaviors and what each behavior was, and I need to extract the instantaneous behavior at each minute, starting at the second that the first behavior began: if the behavior started at 17:34:06, I'd define the next minute as 17:35:06. I also have the durations of each behavior calculated. This is what my data looks like:
df <- data.frame(Behavior = c("GRAZ", "MLTC", "GRAZ", "MLTC", "VIGL"),
Behavior_Start = c("2022-05-10 17:34:06","2022-05-10 17:38:04","2022-05-10 17:38:26","2022-05-10 17:41:49","2022-05-10 17:42:27"),
Behavior_Duration_Minutes = c(0.000000,3.961683,4.325933,7.722067,8.350017))
print(df)
I've used cut() to bin each row into the minute it falls into, but I can't figure out how to get the behavior values for the minutes in which a new behavior doesn't occur (i.e. minutes 2:4 here), and this bases it off the minute but doesn't account for the second that the first behavior began.
time <- data.frame(as.POSIXct(df$Behavior_Start, tz = "America/Denver"))
colnames(time) <- "time"
df <- cbind(df,time)
df.cut <- data.frame(df, cuts = cut(df$time, breaks= "1 min", labels = FALSE))
print(df.cut)
So the dataframe I'd like to end up with would look like this:
new.df <- data.frame(Minute = c(1:10),
Timestamp = c("2022-05-10 17:34:06","2022-05-10 17:35:06","2022-05-10 17:36:06","2022-05-10 17:37:06","2022-05-10 17:38:06","2022-05-10 17:39:06","2022-05-10 17:40:06","2022-05-10 17:41:06","2022-05-10 17:42:06","2022-05-10 17:43:06"),
Behavior = c("GRAZ","GRAZ","GRAZ","MLTC","GRAZ","GRAZ","GRAZ","MLTC","VIGL","VIGL"))
print(new.df)
CodePudding user response:
Your data:
library(dplyr)
library(tidyr)
library(purrr)
your_df <- data.frame(
Behavior = c("Grazing","Vigilant","Grazing","Other","Grazing"),
Behavior_Start = c("2022-05-10 17:34:06","2022-05-10 17:38:04","2022-05-10 17:38:26","2022-05-10 17:41:49","2022-05-10 17:42:27"),
Behavior_Duration_Minutes = c(0.000000,3.961683,4.325933,7.722067,8.350017)
)
Using lead()
on the duration column gives you the start and end of each
"period" of the activity, and then you need to fill in with a minute for each of that duration.
# Make a list column that generates a sequence of minutes "included" in
# the `Behavior_Duration_Minutes` column. You'll need to play with this
# logic in terms of whether or not you want `floor()` or `round()` etc.
# Also update the endpoint, here hardcoded at 10 minutes.
high_res_df <-
your_df %>%
mutate(
minutes_covered = purrr::map2(
ceiling(Behavior_Duration_Minutes),
lead(Behavior_Duration_Minutes, default = 10),
~seq(.x, .y)
)
)
high_res_df
#> Behavior Behavior_Start Behavior_Duration_Minutes minutes_covered
#> 1 Grazing 2022-05-10 17:34:06 0.000000 0, 1, 2, 3
#> 2 Vigilant 2022-05-10 17:38:04 3.961683 4
#> 3 Grazing 2022-05-10 17:38:26 4.325933 5, 6, 7
#> 4 Other 2022-05-10 17:41:49 7.722067 8
#> 5 Grazing 2022-05-10 17:42:27 8.350017 9, 10
Now that you've generated the list of minutes included, you can use unnest()
to get closer to your desired output.
# And here expand out that list-column into a regular sequence
high_res_long <-
tidyr::unnest(
high_res_df,
"minutes_covered"
)
high_res_long
#> # A tibble: 11 × 4
#> Behavior Behavior_Start Behavior_Duration_Minutes minutes_covered
#> <chr> <chr> <dbl> <int>
#> 1 Grazing 2022-05-10 17:34:06 0 0
#> 2 Grazing 2022-05-10 17:34:06 0 1
#> 3 Grazing 2022-05-10 17:34:06 0 2
#> 4 Grazing 2022-05-10 17:34:06 0 3
#> 5 Vigilant 2022-05-10 17:38:04 3.96 4
#> 6 Grazing 2022-05-10 17:38:26 4.33 5
#> 7 Grazing 2022-05-10 17:38:26 4.33 6
#> 8 Grazing 2022-05-10 17:38:26 4.33 7
#> 9 Other 2022-05-10 17:41:49 7.72 8
#> 10 Grazing 2022-05-10 17:42:27 8.35 9
#> 11 Grazing 2022-05-10 17:42:27 8.35 10
Created on 2023-01-13 with reprex v2.0.2
You'll need to play around with this a bit to match exactly what you want.