Home > Enterprise >  How to calculate values per time interval in r
How to calculate values per time interval in r

Time:09-13

I have a data set with measurements every 5 minutes over 48 hours from different groups, each with several samples (about 32 samples in total). I would like to calculate the mean value and the difference of each sample per hour for the measured values. I have the times as datetime in POSIXct format "YYYY-MM-DD hh:mm:ss" and as hms (running_time) in format "hh:mm:ss". The datetime always has different start times for each sample, the running_time always starts at 00:00:00 and was calculated from the datetime as follows:

group_by(sample)%>%mutate(running_time = `units<-`(date_time - date_time[1], "hours"))

So for each sample I have to calculate the mean value from the start time 00:00:00 to 01:00:00 or the difference between the start time and 01:00:00, then from 01:00:00 to 02:00:00 and so on for the entire 48 hours.

The original data is quite large: ~33000 observations of 46 variables. At the end I like to plot the mean/total change of measurements per hour.

I'm fairly new to r and would appreciate any help!

Little insight into the table (abbreviated):

# A tibble: 33,291 × 5
# Groups:   sample [63]
   date_time           running_time sample    ee  grams_feed
   <dttm>              <time>        <dbl> <dbl> <dbl>
 1 2022-07-02 05:59:00 00'00"         6906 0.447  10.9
 2 2022-07-02 06:04:00 05'00"         6906 0.509  10.9
 3 2022-07-02 06:09:00 10'00"         6906 0.484  11.0
 4 2022-07-02 06:14:00 15'00"         6906 0.425  11.0
 5 2022-07-02 06:19:00 20'00"         6906 0.366  11.0
 6 2022-07-02 06:24:00 25'00"         6906 0.318  11.0
 7 2022-07-02 06:29:00 30'00"         6906 0.26   11.0
 8 2022-07-02 06:34:00 35'00"         6906 0.283  11.0
 9 2022-07-02 06:39:00 40'00"         6906 0.283  11.0
10 2022-07-02 06:44:00 45'00"         6906 0.259  11.0
# … with 33,281 more rows

> dput(head(df))
structure(list(date_time = structure(c(1656741540, 1656741840, 
1656742140, 1656742440, 1656742740, 1656743040), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), running_time = structure(c(0, 300, 600, 900, 1200, 
1500), class = c("hms", "difftime"), units = "secs"), sample = c(6906, 
6906, 6906, 6906, 6906, 6906), ee = c(0.447, 0.509, 0.484, 0.425, 
0.366, 0.318), grams_feed = c(10.91, 10.92, 10.98, 10.98, 10.99, 
10.99)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -6L), groups = structure(list(sample = 6906, 
    .rows = structure(list(1:6), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -1L), .drop = TRUE))

CodePudding user response:

Something like:

df %>%
  mutate(time= lubridate::floor_date(date_time, unit = "hour")) %>%
  group_by(sample, time) %>%
  summarise(
      across(a:j, list(mean = mean, sum = sum, min = min, max = max))
  ) # this will summarize column a to j by mean, sum, min and max

Should do the work ?

  • Related