I'm struggling to aggregate hourly temperatures into 3-hourly values while keeping the station ID. Here is the df:
ID | Date | temp |
---|---|---|
1155 | 2012-01-01 00:00:00 | -0.8 |
1155 | 2012-01-01 01:00:00 | 0.1 |
1155 | 2012-01-01 02:00:00 | 0.5 |
and I'm trying to get something like:
ID | Date | temp |
---|---|---|
1155 | 2012-01-01 | -0.2 |
I've written this code:
library(dplyr)
Temp_3h<- df %>%
group_by(ID)%>%
aggregate(.,by=list(Date=cut(as.POSIXct(df$Date), "3 hour")),mean)
but besides the "temp" variable it also tends to aggregate the IDs (categorical), so they become NAs, and I don't know how to integrate ID into the "by=" argument. Any help would be appreciated.
CodePudding user response:
You may use floor_date()/ceiling_date() to combine the timestamps in each 3-hour block into one and take the average of the temp values for each ID.
library(dplyr)
library(lubridate)
Temp_3h <- df %>%
group_by(ID, Date = floor_date(ymd_hms(Date), '3 hours')) %>%
summarise(temp = mean(temp, na.rm = TRUE), .groups = 'drop')
Temp_3h
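For readers without lubridate, the flooring step can be approximated in base R by dividing the epoch seconds by 10800 (3 hours) and rounding down. This is only a sketch under stated assumptions: the second station (ID 2001) and its readings are made up for illustration, and all timestamps are assumed to be UTC.

```r
# Sample rows from the question, plus a second, hypothetical station (2001)
df <- data.frame(
  ID   = c(1155L, 1155L, 1155L, 1155L, 2001L, 2001L),
  Date = c("2012-01-01 00:00:00", "2012-01-01 01:00:00",
           "2012-01-01 02:00:00", "2012-01-01 03:00:00",
           "2012-01-01 00:00:00", "2012-01-01 01:00:00"),
  temp = c(-0.8, 0.1, 0.5, 0.6, 2.0, 4.0)
)

# Parse as UTC (an assumption) so the blocks align with 00:00, 03:00, ...
t <- as.POSIXct(df$Date, tz = "UTC")

# Floor each timestamp to the start of its 3-hour block (10800 seconds)
block_secs <- 3 * 60 * 60
df$Date <- as.POSIXct(floor(as.numeric(t) / block_secs) * block_secs,
                      origin = "1970-01-01", tz = "UTC")

# One row per ID and block; ID is a grouping column, so it is never averaged
Temp_3h <- aggregate(temp ~ ID + Date, data = df, FUN = mean)
Temp_3h
```

Because ID sits on the grouping side of the formula, it is carried through unchanged, which is the same role it plays inside group_by() above.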
CodePudding user response:
I actually like the cut approach.
d |>
transform(date_s=cut(as.POSIXct(d$Date), breaks="3 hours")) |>
with(aggregate(list(mn_temp=temp), list(date=date_s, ID=ID), FUN=mean))
# date ID mn_temp
# 1 2012-01-01 00:00:00 1155 -0.06666667
# 2 2012-01-01 03:00:00 1155 0.56666667
# 3 2012-01-01 06:00:00 1155 0.93333333
# 4 2012-01-01 09:00:00 1155 3.70000000
If, instead of the start time, we would rather display the end of each time interval, we could do:
d |>
transform(date_s=cut(
as.POSIXct(d$Date), breaks="3 hours",
labels=(as.POSIXct(Date) + 10800)[(seq(Date) - 1) %% 3 == 0])) |>
with(aggregate(list(mn_temp_lst3=temp), list(date=date_s, ID=ID), FUN=mean))
# date ID mn_temp_lst3
# 1 2012-01-01 03:00:00 1155 -0.06666667
# 2 2012-01-01 06:00:00 1155 0.56666667
# 3 2012-01-01 09:00:00 1155 0.93333333
# 4 2012-01-01 12:00:00 1155 3.70000000
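One point worth checking before relying on cut() here: with breaks="3 hours" it appears to start the first block at the hour of the earliest timestamp in the data, not at a fixed clock boundary such as 00:00, 03:00, 06:00. The sketch below uses made-up timestamps that start at 01:00 (an assumption, the sample data starts at 00:00) and compares the block numbers from cut() with clock-aligned blocks computed from epoch seconds.

```r
# Hypothetical hourly timestamps starting at 01:00 instead of 00:00 (UTC assumed)
t <- as.POSIXct(sprintf("2012-01-01 %02d:00:00", 1:6), tz = "UTC")

# Block number assigned by cut(): blocks are anchored at the first observation
cut_block <- as.integer(cut(t, breaks = "3 hours"))

# Clock-aligned block number: blocks are anchored at 00:00, 03:00, 06:00, ...
clock_block <- floor(as.numeric(t) / 10800)
clock_block <- clock_block - min(clock_block) + 1

cut_block    # 1 1 1 2 2 2
clock_block  # 1 1 2 2 2 3
```

With the sample data the two agree because the series starts at midnight. The index-based labels in the end-of-interval variant also seem to assume complete hourly rows for a single station, so they would need adjusting for several IDs.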
Data
d <- structure(list(ID = c(1155L, 1155L, 1155L, 1155L, 1155L, 1155L,
1155L, 1155L, 1155L, 1155L), Date = c("2012-01-01 00:00:00",
"2012-01-01 01:00:00", "2012-01-01 02:00:00", "2012-01-01 03:00:00",
"2012-01-01 04:00:00", "2012-01-01 05:00:00", "2012-01-01 06:00:00",
"2012-01-01 07:00:00", "2012-01-01 08:00:00", "2012-01-01 09:00:00"
), temp = c(-0.8, 0.1, 0.5, 0.6, 0.6, 0.5, 0.7, 0.9, 1.2, 3.7
)), row.names = c(NA, -10L), class = "data.frame")
CodePudding user response:
You could floor the dates and use the group_by and summarize functions:
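library(lubridate)
library(dplyr)
# Caution: attaching plyr after dplyr masks dplyr::summarise(), so the call below
# runs plyr::summarise(), which ignores group_by() and returns one row overall.
# Omit library(plyr) (or call dplyr::summarise()) for one row per ID and 3-hour block.
library(plyr)
summarise(group_by(df, ID, Date = floor_date(ymd_hms(Date), '3 hours')), first(Date), first(ID), sum(temp))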
Output:
first(Date) first(ID) sum(temp)
1 2012-01-01 1155 -0.2
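Note that the -0.2 here is the sum of the three sample readings, which matches the value shown in the question, whereas the other answers return the mean. A minimal base R sketch that computes both per ID and 3-hour block, assuming UTC timestamps and using only the three rows from the question (the column names temp_sum and temp_mean are made up for illustration):

```r
# The three sample rows from the question
df <- data.frame(
  ID   = rep(1155L, 3),
  Date = c("2012-01-01 00:00:00", "2012-01-01 01:00:00", "2012-01-01 02:00:00"),
  temp = c(-0.8, 0.1, 0.5)
)

# Start of each 3-hour block (UTC assumed), computed from epoch seconds
secs  <- as.numeric(as.POSIXct(df$Date, tz = "UTC"))
block <- as.POSIXct(floor(secs / 10800) * 10800, origin = "1970-01-01", tz = "UTC")

# Group by station and block, then compute the sum and the mean separately
groups   <- list(ID = df$ID, Date = block)
by_block <- aggregate(list(temp_sum = df$temp), groups, FUN = sum)
by_block$temp_mean <- aggregate(list(m = df$temp), groups, FUN = mean)$m

by_block  # temp_sum is about -0.2, temp_mean about -0.0667
```

Which of the two is wanted depends on the variable: for temperature a mean is the usual choice, while a sum suits quantities such as precipitation.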