Home > Software design >  Running loop with aggregated data based on dates
Running loop with aggregated data based on dates

Time:12-30

I have a large yearly data set where each hour contains a value. I am interested in the daily values or values for several days. In my script, I specify the start and end dates of interest and aggregate the data based on the values of each hour in a separate R script ("run_seq.R").

Date_From <- '2022-04-01'
Date_To <- '2022-04-02'
DF_hour<-(DfDHW %>%filter(as.Date(Hour) >= as.Date(Date_From) & as.Date(Hour) <= as.Date(Date_To)))
dummy<-as.numeric(DF_hour$Var*0 1)
dfdate <- aggregate(.~dummy, DFhour, sum)

This works as intended. However, I want to repeat that for several days of the month in loop. What I have done is to create a sequence of "dates from" and "dates to" and a numeric sequence number:

dates_from <- seq(as.Date("2022-04-01"), as.Date("2022-04-10"), by=1)
dates_to <- seq(as.Date("2022-04-02"), as.Date("2022-04-11"), by=1)
seq_num<-as.numeric(format(dates_from, format = "%d"))
df<-cbind.data.frame(dates_from, dates_to, seq_num)
dates_from dates_to seq_num
2022-04-01 2022-04-2 1
2022-04-02 2022-04-3 2

However, if I run a for() loop, the code will select all start and end dates and aggregate data over all 10 days.

for (x in seq_num) {
  source("run_seq.R") 
}

Instead, I want to select the start and end date that corresponds to each sequence number (1,2,3...) so that in the first loop start/end of sequence nr 1 is selected, and in the second loop start/end of sequence nr. 2, etc.

Example: Loop 1 = seq num 1: date_from = 2022-04-2; date_to = 2022-04-3; Loop 2 = seq num 2: date_from = 2022-04-3; date_to = 2022-04-4;

The DfDHW data looks like this (37 Variables in total with 8760 rows):

Hour Var1 Var2
2022-01-01 01:00:00 1.480 1.480
2022-01-01 02:00:00 0.957 0.957

dput(head(DfDHW) yields:

structure(list(Hour = structure(c(1640995200, 1640998800, 1641002400, 
1641006000, 1641009600, 1641013200), class = c("POSIXct", "POSIXt"
), tzone = "Europe/Stockholm"), Var1 = c(1.48022736417965, 
0.957129616195086, 0.67616277119973, 0.516807667014335, 0.500124643187317, 
0.596748739907164), Var2 = c(1.48022736417965, 0.957129616195086, 
0.67616277119973, 0.516807667014335, 0.500124643187317, 0.596748739907164
), Var3 = c(1.48022736417965, 0.957129616195086, ...

How do I achieve this? Or is there a more elegant way altogether?

CodePudding user response:

An approach to filter the data

date_range <- seq(as.Date("2022-01-01"), as.Date("2022-01-04"), "day")

date_range <- data.frame(start = date_range[1:(length(date_range) - 1)], 
  end = date_range[2:length(date_range)])

date_range
       start        end
1 2022-01-01 2022-01-02
2 2022-01-02 2022-01-03
3 2022-01-03 2022-01-04

Getting sums for days going from "2022-01-01" to "2022-01-02", "2022-01-02" to "2022-01-03" and "2022-01-03" to "2022-01-04" separately.

setNames(
  data.frame(apply(date_range, 1, function(x) 
    colSums(DfDHW[DfDHW$Hour >= x["start"] & 
      DfDHW$Hour < x["end"], c("Var1", "Var2")]))), 
  apply(date_range, 1, paste, collapse="_to_"))
     2022-01-01_to_2022-01-02 2022-01-02_to_2022-01-03 2022-01-03_to_2022-01-04
Var1                 4.727201                        0                        0
Var2                 4.727201                        0                        0

Data

DfDHW <- structure(list(Hour = structure(c(1640995200, 1640998800, 1641002400, 
1641006000, 1641009600, 1641013200), class = c("POSIXct", "POSIXt"
), tzone = "Europe/Stockholm"), Var1 = c(1.48022736417965, 0.957129616195086, 
0.67616277119973, 0.516807667014335, 0.500124643187317, 0.596748739907164
), Var2 = c(1.48022736417965, 0.957129616195086, 0.67616277119973, 
0.516807667014335, 0.500124643187317, 0.596748739907164)), class = "data.frame", row.names = c(NA, 
-6L))

CodePudding user response:

I followed the advice of Andre Wildberg to aggregate before the looping. The suggested filtering did not work for me due to date formats. Instead:

library(dplyr)
Df <- Df2%>% 
  group_by(date=as.Date(Hour)) %>%
  summarise(across(6:221, sum))

This summarizes all hourly values to daily values (sum). Then I set the dates:

Date_From <- '2022-04-01'
Date_To <- '2022-04-04' 

And run the background codes in a while loop, and in addition save a plot for each loop.

start <- Date_From
end <- Date_To
Datum <- start
while (Datum <= end)
{
  Date_Input=Datum
  source("run_seq.R") 
  print(g)
  ggplot2::ggsave(filename = paste0("plot_",Date_Input,".png"),g, path = "Plots", width = 2560, height = 1440, units = "px")
  Datum <- as.Date(Datum)   1                  
}

This seems to work. Maybe a for() loop would be faster, but this way I can choose any dates.

  • Related