I have a large yearly data set where each hour contains a value. I am interested in the daily values or values for several days. In my script, I specify the start and end dates of interest and aggregate the data based on the values of each hour in a separate R script ("run_seq.R").
Date_From <- '2022-04-01'
Date_To <- '2022-04-02'
DF_hour<-(DfDHW %>%filter(as.Date(Hour) >= as.Date(Date_From) & as.Date(Hour) <= as.Date(Date_To)))
dummy<-as.numeric(DF_hour$Var*0 1)
dfdate <- aggregate(.~dummy, DFhour, sum)
This works as intended. However, I want to repeat that for several days of the month in loop. What I have done is to create a sequence of "dates from" and "dates to" and a numeric sequence number:
dates_from <- seq(as.Date("2022-04-01"), as.Date("2022-04-10"), by=1)
dates_to <- seq(as.Date("2022-04-02"), as.Date("2022-04-11"), by=1)
seq_num<-as.numeric(format(dates_from, format = "%d"))
df<-cbind.data.frame(dates_from, dates_to, seq_num)
dates_from | dates_to | seq_num |
---|---|---|
2022-04-01 | 2022-04-2 | 1 |
2022-04-02 | 2022-04-3 | 2 |
However, if I run a for() loop, the code will select all start and end dates and aggregate data over all 10 days.
for (x in seq_num) {
source("run_seq.R")
}
Instead, I want to select the start and end date that corresponds to each sequence number (1,2,3...) so that in the first loop start/end of sequence nr 1 is selected, and in the second loop start/end of sequence nr. 2, etc.
Example: Loop 1 = seq num 1: date_from = 2022-04-2; date_to = 2022-04-3; Loop 2 = seq num 2: date_from = 2022-04-3; date_to = 2022-04-4;
The DfDHW data looks like this (37 Variables in total with 8760 rows):
Hour | Var1 | Var2 |
---|---|---|
2022-01-01 01:00:00 | 1.480 | 1.480 |
2022-01-01 02:00:00 | 0.957 | 0.957 |
dput(head(DfDHW) yields:
structure(list(Hour = structure(c(1640995200, 1640998800, 1641002400,
1641006000, 1641009600, 1641013200), class = c("POSIXct", "POSIXt"
), tzone = "Europe/Stockholm"), Var1 = c(1.48022736417965,
0.957129616195086, 0.67616277119973, 0.516807667014335, 0.500124643187317,
0.596748739907164), Var2 = c(1.48022736417965, 0.957129616195086,
0.67616277119973, 0.516807667014335, 0.500124643187317, 0.596748739907164
), Var3 = c(1.48022736417965, 0.957129616195086, ...
How do I achieve this? Or is there a more elegant way altogether?
CodePudding user response:
An approach to filter the data
date_range <- seq(as.Date("2022-01-01"), as.Date("2022-01-04"), "day")
date_range <- data.frame(start = date_range[1:(length(date_range) - 1)],
end = date_range[2:length(date_range)])
date_range
start end
1 2022-01-01 2022-01-02
2 2022-01-02 2022-01-03
3 2022-01-03 2022-01-04
Getting sum
s for days going from "2022-01-01" to "2022-01-02", "2022-01-02" to "2022-01-03" and "2022-01-03" to "2022-01-04" separately.
setNames(
data.frame(apply(date_range, 1, function(x)
colSums(DfDHW[DfDHW$Hour >= x["start"] &
DfDHW$Hour < x["end"], c("Var1", "Var2")]))),
apply(date_range, 1, paste, collapse="_to_"))
2022-01-01_to_2022-01-02 2022-01-02_to_2022-01-03 2022-01-03_to_2022-01-04
Var1 4.727201 0 0
Var2 4.727201 0 0
Data
DfDHW <- structure(list(Hour = structure(c(1640995200, 1640998800, 1641002400,
1641006000, 1641009600, 1641013200), class = c("POSIXct", "POSIXt"
), tzone = "Europe/Stockholm"), Var1 = c(1.48022736417965, 0.957129616195086,
0.67616277119973, 0.516807667014335, 0.500124643187317, 0.596748739907164
), Var2 = c(1.48022736417965, 0.957129616195086, 0.67616277119973,
0.516807667014335, 0.500124643187317, 0.596748739907164)), class = "data.frame", row.names = c(NA,
-6L))
CodePudding user response:
I followed the advice of Andre Wildberg to aggregate before the looping. The suggested filtering did not work for me due to date formats. Instead:
library(dplyr)
Df <- Df2%>%
group_by(date=as.Date(Hour)) %>%
summarise(across(6:221, sum))
This summarizes all hourly values to daily values (sum). Then I set the dates:
Date_From <- '2022-04-01'
Date_To <- '2022-04-04'
And run the background codes in a while loop, and in addition save a plot for each loop.
start <- Date_From
end <- Date_To
Datum <- start
while (Datum <= end)
{
Date_Input=Datum
source("run_seq.R")
print(g)
ggplot2::ggsave(filename = paste0("plot_",Date_Input,".png"),g, path = "Plots", width = 2560, height = 1440, units = "px")
Datum <- as.Date(Datum) 1
}
This seems to work. Maybe a for() loop would be faster, but this way I can choose any dates.