Annual moving window over a data frame

Time:10-26

I have a data frame of discharge data. Below is a reproducible example:

library(lubridate)
Date <- sample(seq(as.Date('1981/01/01'), as.Date('1982/12/31'), by="day"), 24)
Date <- sort(Date, decreasing = FALSE)
Station <- rep("A", 24)
Discharge <- rnorm(n = 24, mean = 1, sd = 1)
df <- cbind.data.frame(Station, Date, Discharge)
df$Year <- year(df$Date)
df$Month <- month(df$Date)
df$Day <- day(df$Date)

The output:

> df
   Station       Date   Discharge Year Month Day
1        A 1981-01-23  0.75514968 1981     1  23
2        A 1981-02-17 -0.08552776 1981     2  17
3        A 1981-03-20  1.47586712 1981     3  20
4        A 1981-04-26  3.64823544 1981     4  26
5        A 1981-05-22  1.21880453 1981     5  22
6        A 1981-05-23  2.19482857 1981     5  23
7        A 1981-07-02 -0.13598754 1981     7   2
8        A 1981-07-23  0.12365626 1981     7  23
9        A 1981-07-24  2.12557882 1981     7  24
10       A 1981-09-02  2.79879494 1981     9   2
11       A 1981-09-04  1.67926948 1981     9   4
12       A 1981-11-06  0.49720784 1981    11   6
13       A 1981-12-21 -0.25272271 1981    12  21
14       A 1982-04-08  1.39706157 1982     4   8
15       A 1982-04-19 -0.13965981 1982     4  19
16       A 1982-05-26  0.55238425 1982     5  26
17       A 1982-06-23  3.94639154 1982     6  23
18       A 1982-06-25 -0.03415929 1982     6  25
19       A 1982-07-15  1.00996167 1982     7  15
20       A 1982-09-11  3.18225186 1982     9  11
21       A 1982-10-17  0.30875497 1982    10  17
22       A 1982-10-30  2.26209011 1982    10  30
23       A 1982-11-06  0.34430489 1982    11   6
24       A 1982-11-19  2.28251458 1982    11  19

What I need is a moving-window function in base R. I tried the runner package, but it is not flexible enough for this. The window (say, of width 3) should take 3 rows at a time and calculate the mean discharge, advancing until the last date of 1981. A new window should then start in 1982 and do the same. How should I approach this?

CodePudding user response:

Using base R only

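w <- 3  # window width (number of rows)

df$DischargeM <- sapply(seq_len(nrow(df)), function(x) {
  tmp <- NA
  if (x >= w) {
    idx <- (x - w + 1):x  # the w rows ending at row x
    # keep the mean only when the whole window lies in a single year
    if (length(unique(df$Year[idx])) == 1) {
      tmp <- mean(df$Discharge[idx])
    }
  }
  tmp
})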

   Station       Date  Discharge Year Month Day DischargeM
1        A 1981-01-21  2.0009355 1981     1  21         NA
2        A 1981-02-11  0.5948567 1981     2  11         NA
3        A 1981-04-17  0.2637090 1981     4  17 0.95316705
4        A 1981-04-18  3.9180253 1981     4  18 1.59219699
5        A 1981-05-09 -0.2589129 1981     5   9 1.30760712
6        A 1981-07-05  1.1055913 1981     7   5 1.58823456
7        A 1981-07-11  0.7561600 1981     7  11 0.53427946
8        A 1981-07-22  0.0978999 1981     7  22 0.65321706
9        A 1981-08-04  0.5410163 1981     8   4 0.46502541
10       A 1981-08-13 -0.5044425 1981     8  13 0.04482458
11       A 1981-10-06  1.5954315 1981    10   6 0.54400178
12       A 1981-11-08 -0.5757041 1981    11   8 0.17176164
13       A 1981-12-24  1.3892440 1981    12  24 0.80299047
14       A 1982-01-07  1.9363874 1982     1   7         NA
15       A 1982-02-20  1.4340554 1982     2  20         NA
16       A 1982-05-29  0.4536461 1982     5  29 1.27469632
17       A 1982-06-10  2.9776761 1982     6  10 1.62179253
18       A 1982-06-17  1.6371733 1982     6  17 1.68949847
19       A 1982-06-28  1.7585579 1982     6  28 2.12446908
20       A 1982-08-17  0.8297518 1982     8  17 1.40849432
21       A 1982-09-21  1.6853808 1982     9  21 1.42456348
22       A 1982-11-13  0.6066167 1982    11  13 1.04058309
23       A 1982-11-16  1.4989263 1982    11  16 1.26364126
24       A 1982-11-28  0.2273658 1982    11  28 0.77763625

(Make sure df is sorted by date first.)
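
The same per-year trailing mean can also be sketched without an explicit row loop, using ave() to split by year and a cumulative sum inside each group. This is only a sketch under assumptions: roll_mean_right is a hypothetical helper name (not part of base R), the demo data frame is made up for illustration, and the cumulative-sum trick assumes Discharge has no missing values.

```r
# Trailing (right-aligned) moving mean of width w.
# Positions with fewer than w values available in the group are NA.
# roll_mean_right is a hypothetical helper name, not part of base R.
roll_mean_right <- function(x, w) {
  n <- length(x)
  if (n < w) return(rep(NA_real_, n))   # group too short for any full window
  cs <- cumsum(c(0, x))                 # cs[i + 1] is the sum of x[1:i]
  c(rep(NA_real_, w - 1),
    (cs[(w + 1):(n + 1)] - cs[1:(n - w + 1)]) / w)
}

# Small illustrative data set (values chosen only for the demonstration)
demo <- data.frame(
  Year      = c(1981, 1981, 1981, 1981, 1982, 1982, 1982),
  Discharge = c(1, 2, 3, 4, 10, 20, 30)
)

# ave() applies the helper within each Year, so no window spans two years
demo$DischargeM <- ave(demo$Discharge, demo$Year,
                       FUN = function(x) roll_mean_right(x, w = 3))
demo
```

On the real data this would be called as ave(df$Discharge, df$Year, FUN = function(x) roll_mean_right(x, w = 3)), again with df sorted by date.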

CodePudding user response:

You can do this with dplyr and the rollmean or rollmeanr function from zoo (rollmeanr is the right-aligned variant).

Group the data by year, then apply rollmeanr inside mutate.

library(dplyr)

df %>% 
  group_by(Year) %>% 
  mutate(avg = zoo::rollmeanr(Discharge, k = 3, fill = NA))

# A tibble: 24 x 7
# Groups:   Year [2]
   Station Date       Discharge  Year Month   Day    avg
   <chr>   <date>         <dbl> <dbl> <dbl> <int>  <dbl>
 1 A       1981-01-04    1.00    1981     1     4 NA    
 2 A       1981-03-26    0.0468  1981     3    26 NA    
 3 A       1981-03-28    0.431   1981     3    28  0.494
 4 A       1981-05-04    1.30    1981     5     4  0.593
 5 A       1981-08-26    2.06    1981     8    26  1.26 
 6 A       1981-10-14    1.09    1981    10    14  1.48 
 7 A       1981-12-10    1.28    1981    12    10  1.48 
 8 A       1981-12-23    0.668   1981    12    23  1.01 
 9 A       1982-01-02   -0.333   1982     1     2 NA    
10 A       1982-04-13    0.800   1982     4    13 NA    
# ... with 14 more rows
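
To see why rollmeanr rather than rollmean fits the question, here is a small sketch on a made-up vector (x is purely illustrative). It compares the default centred window with the right-aligned one, which averages each value with the two before it.

```r
library(zoo)

x <- c(1, 2, 3, 4, 5)

# rollmean() centres the window by default: NA at both ends
centred <- rollmean(x, k = 3, fill = NA)

# rollmeanr() is the right-aligned wrapper: each value averages the
# current element and the two before it, so the NAs come first
trailing <- rollmeanr(x, k = 3, fill = NA)

centred   # NA  2  3  4 NA
trailing  # NA NA  2  3  4
```

With the right-aligned version, the first two rows of each year are NA, which matches the output shown above.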

CodePudding user response:

Let me know if this is what you were looking for.

Base version (transform and ave are base R; rollapply comes from zoo):

library(zoo)

result <- transform(df,
      Discharge_mean = ave(Discharge, Year,
                           FUN = function(x) rollapply(x, width = 3, FUN = mean, align = 'right', fill = NA))
      )

dplyr version (needs library(dplyr) and library(zoo)):

result <- df %>%
  group_by(Year) %>%
  mutate(Discharge_mean = rollapply(Discharge, 3, mean, align = 'right', fill = NA))

Output:

> result
  Station       Date    Discharge Year Month Day Discharge_mean
1        A 1981-01-09  0.560448487 1981     1   9             NA
2        A 1981-01-17  0.006777809 1981     1  17             NA
3        A 1981-02-08  2.008959399 1981     2   8      0.8587286
4        A 1981-02-21  1.166452993 1981     2  21      1.0607301
5        A 1981-04-12  3.120080595 1981     4  12      2.0984977
6        A 1981-04-24  2.647325960 1981     4  24      2.3112865
7        A 1981-05-01  0.764980310 1981     5   1      2.1774623
8        A 1981-05-20  2.203700845 1981     5  20      1.8720024
9        A 1981-06-19  0.519390897 1981     6  19      1.1626907
10       A 1981-07-06  1.704146872 1981     7   6      1.4757462
# 14 more rows