How can I calculate mean values for each day of a year from a time series data set in R?

Time:10-18

I have a data set containing climatic data taken hourly from 01-01-2007 to 31-12-2021. I want to calculate the mean value for a given variable (e.g. temperature) for each day of the year (1:365).

My dataset looks something like this:

   dia        prec_h  tc_h  um_h   v_d  vm_h
   <date>      <dbl> <dbl> <dbl> <dbl> <dbl>
 1 2007-01-01    0.2  22.9    89    42   3  
 2 2007-01-01    0.4  22.8    93    47   1.9
 3 2007-01-01    0    22.7    94    37   1.3
 4 2007-01-01    0    22.6    94    38   1.6
 5 2007-01-01    0    22.7    95    46   2.3
[...]
 131496 2021-12-31 0.0 24.7   87    47   2.6

("[...]" stands for the omitted rows between 2007 and 2021.)

I first calculated daily mean temperature for each of my entry dates as follows:

md$dia <- as.Date(md$dia, format = "%d/%m/%Y")
m_tc <- aggregate(tc_h ~ dia, md, mean)

This returned a data frame with one mean temperature value per date, across all analyzed years.

Now I want to calculate the mean temperature for each day of the year from this data frame, i.e., the mean temperature for January 1st through December 31st. I need to end up with a data frame with 365 rows, but I don't know how to do this calculation. Can anyone help me out? There is also a complication: my data include 4 leap years. Any recommendations on how to deal with them? Thanks in advance.

CodePudding user response:

library(dplyr)

tcmean <- md %>% group_by(dia) %>% summarise(m_tc = mean(tc_h))
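Note that grouping by dia alone gives one mean per date, which is the same result as the aggregate() call in the question. To get one row per calendar day, the daily means could be grouped a second time by a month-day key. A minimal sketch, assuming dplyr 1.0 or later; the small data frame and the names mdia and tc_doy are made up for illustration and are not the asker's data:

```r
library(dplyr)

# Tiny made-up sample standing in for the hourly data
md <- data.frame(
  dia  = as.Date(c("2007-01-01", "2007-01-01", "2008-01-01", "2008-01-02")),
  tc_h = c(20, 22, 24, 30)
)

tc_doy <- md %>%
  group_by(dia) %>%
  summarise(m_tc = mean(tc_h), .groups = "drop") %>%  # one mean per date
  mutate(mdia = format(dia, "%m-%d")) %>%             # month-day key, e.g. "01-01"
  group_by(mdia) %>%
  summarise(m_tc = mean(m_tc), .groups = "drop")      # one mean per calendar day

print(tc_doy)
```

Averaging the daily means (rather than all hourly readings at once) gives each year equal weight even if some days have missing hours.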

CodePudding user response:

First simulate a data set with the relevant columns and number of rows, then aggregate by day giving m_tc.

As for the question, create an auxiliary variable mdia by formatting the date column as month-day only. Compute the means grouping by mdia. The result is a data.frame with 366 rows (one per calendar day, including "02-29" from the leap years) and 2 columns.

set.seed(2022)

# number of rows in the question
n <- 131496L
dia <- seq(as.Date("2007-01-01"), as.Date("2021-12-31"), by = "1 day")
md <- data.frame(
  dia = sort(sample(dia, n, TRUE)),
  tc_h = round(runif(n, 0, 40), 1)
)

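# daily means: one row per date
m_tc <- aggregate(tc_h ~ dia, md, mean)
# month-day key, e.g. "01-01", shared by the same calendar day in every year
mdia <- format(m_tc$dia, "%m-%d")
# mean of the daily means for each calendar day
final <- aggregate(tc_h ~ mdia, m_tc, mean)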

str(final)
#> 'data.frame':    366 obs. of  2 variables:
#>  $ mdia: chr  "01-01" "01-02" "01-03" "01-04" ...
#>  $ tc_h: num  20.2 20.4 20.2 19.6 20.7 ...

head(final, n = 10L)
#>     mdia     tc_h
#> 1  01-01 20.20741
#> 2  01-02 20.44143
#> 3  01-03 20.20979
#> 4  01-04 19.63611
#> 5  01-05 20.69064
#> 6  01-06 18.89658
#> 7  01-07 20.15992
#> 8  01-08 19.53639
#> 9  01-09 19.52999
#> 10 01-10 19.71914

Created on 2022-10-18 with reprex v2.0.2
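On the leap-year part of the question: if exactly 365 rows are needed, one possible approach (an assumption here, not part of the reprex) is to drop February 29 before the second aggregation. A minimal sketch in base R, using a few made-up daily means; the names no_leap and final365 are hypothetical:

```r
# Made-up daily means around the end of February, including one leap day
m_tc <- data.frame(
  dia  = as.Date(c("2007-02-28", "2008-02-28", "2008-02-29", "2008-03-01")),
  tc_h = c(20, 22, 25, 18)
)

# Month-day key shared by the same calendar day in every year
m_tc$mdia <- format(m_tc$dia, "%m-%d")

# Drop the leap day, then average per calendar day
no_leap  <- m_tc[m_tc$mdia != "02-29", ]
final365 <- aggregate(tc_h ~ mdia, no_leap, mean)

print(final365)
```

Keeping the month-day key, rather than the day-of-year number from format(dia, "%j"), avoids a misalignment: March 1st is day 060 in 2007 but day 061 in 2008, so grouping by "%j" would mix different calendar dates after February in leap years.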
