I have a data set with measurements every 5 minutes over 48 hours from different groups, each with several samples (about 32 samples in total). I would like to calculate the mean value and the difference of each sample per hour for the measured values. I have the times as datetime in POSIXct format "YYYY-MM-DD hh:mm:ss" and as hms (running_time) in format "hh:mm:ss". The datetime always has different start times for each sample, the running_time always starts at 00:00:00 and was calculated from the datetime as follows:
group_by(sample)%>%mutate(running_time = `units<-`(date_time - date_time[1], "hours"))
So for each sample I have to calculate the mean value from the start time 00:00:00 to 01:00:00 or the difference between the start time and 01:00:00, then from 01:00:00 to 02:00:00 and so on for the entire 48 hours.
The original data is quite large: ~33000 observations of 46 variables. At the end I like to plot the mean/total change of measurements per hour.
I'm fairly new to r and would appreciate any help!
Little insight into the table (abbreviated):
# A tibble: 33,291 × 5
# Groups: sample [63]
date_time running_time sample ee grams_feed
<dttm> <time> <dbl> <dbl> <dbl>
1 2022-07-02 05:59:00 00'00" 6906 0.447 10.9
2 2022-07-02 06:04:00 05'00" 6906 0.509 10.9
3 2022-07-02 06:09:00 10'00" 6906 0.484 11.0
4 2022-07-02 06:14:00 15'00" 6906 0.425 11.0
5 2022-07-02 06:19:00 20'00" 6906 0.366 11.0
6 2022-07-02 06:24:00 25'00" 6906 0.318 11.0
7 2022-07-02 06:29:00 30'00" 6906 0.26 11.0
8 2022-07-02 06:34:00 35'00" 6906 0.283 11.0
9 2022-07-02 06:39:00 40'00" 6906 0.283 11.0
10 2022-07-02 06:44:00 45'00" 6906 0.259 11.0
# … with 33,281 more rows
> dput(head(df))
structure(list(date_time = structure(c(1656741540, 1656741840,
1656742140, 1656742440, 1656742740, 1656743040), tzone = "UTC", class = c("POSIXct",
"POSIXt")), running_time = structure(c(0, 300, 600, 900, 1200,
1500), class = c("hms", "difftime"), units = "secs"), sample = c(6906,
6906, 6906, 6906, 6906, 6906), ee = c(0.447, 0.509, 0.484, 0.425,
0.366, 0.318), grams_feed = c(10.91, 10.92, 10.98, 10.98, 10.99,
10.99)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -6L), groups = structure(list(sample = 6906,
.rows = structure(list(1:6), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -1L), .drop = TRUE))
CodePudding user response:
Something like:
df %>%
mutate(time= lubridate::floor_date(date_time, unit = "hour")) %>%
group_by(sample, time) %>%
summarise(
across(a:j, list(mean = mean, sum = sum, min = min, max = max))
) # this will summarize column a to j by mean, sum, min and max
Should do the work ?