Obtaining trading days from rollapply in R

I have the following simulated dataset with a column y and one row per trading day of 2018 (say 250 days).

data
# A tibble: 249 × 2
   Date                     y
   <dttm>               <dbl>
 1 2018-01-02 00:00:00  0.409
 2 2018-01-03 00:00:00 -1.90 
 3 2018-01-04 00:00:00  0.131
 4 2018-01-05 00:00:00 -0.619
 5 2018-01-08 00:00:00  0.449
 6 2018-01-09 00:00:00  0.448
 7 2018-01-10 00:00:00  0.124
 8 2018-01-11 00:00:00 -0.346
 9 2018-01-12 00:00:00  0.775
10 2018-01-15 00:00:00 -0.948
# … with 239 more rows

with tail

> tail(data,n=10)
# A tibble: 10 × 2
   Date                       y
   <dttm>                 <dbl>
 1 2018-12-13 00:00:00 -0.00736
 2 2018-12-14 00:00:00 -1.30   
 3 2018-12-17 00:00:00  0.227  
 4 2018-12-18 00:00:00 -0.671  
 5 2018-12-19 00:00:00 -0.750  
 6 2018-12-20 00:00:00 -0.906  
 7 2018-12-21 00:00:00 -1.74   
 8 2018-12-27 00:00:00  0.331  
 9 2018-12-28 00:00:00 -0.768  
10 2018-12-31 00:00:00  0.649

I want to calculate the rolling sd of column y with a window of 60 and then find the exact trading days, not ordinary calendar days. (Can it be done from the row index? I don't know.)

library(dplyr)

data2 = data %>%
  mutate(date = as.Date(Date))
data3 = data2[, -1]; head(data3)
roll_win = 60
# rolling sd, padded so the value sits on the last row of each window
data3$a = c(rep(NA_real_, roll_win - 1), zoo::rollapply(data3$y, roll_win, sd))
dat = subset(data3, !is.na(a))
dat_max = dat[dat$a == max(dat$a, na.rm = TRUE), ]
# note: this subtracts calendar days, not trading days
dat_max$date_start = dat_max$date - (roll_win - 1)
dat_max

It turns out that the period of highest volatility is:

dat_max
# A tibble: 1 × 4
      y date           a date_start
  <dbl> <date>     <dbl> <date>    
1 0.931 2018-04-24  1.18 2018-02-24

Now if I subtract the two dates I get:

> dat_max$date - dat_max$date_start
Time difference of 59 days

This is arithmetically correct, but these are calendar days, NOT TRADING DAYS.

I asked a similar question here, but it didn't solve the problem. The question then was actually how to obtain the days of high volatility.

Any help on how I can obtain these trading days? Thanks in advance.

EDIT

FOR FULL DATA

library(gsheet)
data= gsheet2tbl("https://docs.google.com/spreadsheets/d/1PdZDb3OgqSaO6znUWsAh7p_MVLHgNbQM/edit?usp=sharing&ouid=109626011108852110510&rtpof=true&sd=true")
data

Answer:

Start date for each time window

If the question is how to calculate the start date for each window, then, using the data in the Note at the end and a window of 3:

library(dplyr)

w <- 3
out <- mutate(data, 
  sd = zoo::rollapplyr(y, w, sd, fill = NA),  # right-aligned rolling sd
  start = dplyr::lag(Date, w - 1)             # Date from w - 1 rows earlier
)
out

giving:

         Date        y        sd      start
1  2018-12-13 -0.00736        NA       <NA>
2  2018-12-14 -1.30000        NA       <NA>
3  2018-12-17  0.22700 0.8223515 2018-12-13
4  2018-12-18 -0.67100 0.7674388 2018-12-14
5  2018-12-19 -0.75000 0.5427053 2018-12-17
6  2018-12-20 -0.90600 0.1195840 2018-12-18
7  2018-12-21 -1.74000 0.5322894 2018-12-19
8  2018-12-27  0.33100 1.0420146 2018-12-20
9  2018-12-28 -0.76800 1.0361488 2018-12-21
10 2018-12-31  0.64900 0.7435068 2018-12-27
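
The reason this works is that lag shifts Date by rows, and each row is one trading day, whereas subtracting a number from a Date shifts it by calendar days. A minimal base R sketch of the difference, assuming the sample data from the Note (rebuilt inline here so the snippet runs on its own; the names start_trading and start_calendar are only illustrative):

```r
# Sample data from the Note (last 10 trading days of 2018)
data <- data.frame(
  Date = as.Date(c("2018-12-13", "2018-12-14", "2018-12-17", "2018-12-18",
                   "2018-12-19", "2018-12-20", "2018-12-21", "2018-12-27",
                   "2018-12-28", "2018-12-31")),
  y = c(-0.00736, -1.30, 0.227, -0.671, -0.750, -0.906, -1.74, 0.331,
        -0.768, 0.649)
)
w <- 3

# Shift by rows: base R equivalent of dplyr::lag(Date, w - 1)
start_trading <- c(rep(as.Date(NA), w - 1), head(data$Date, -(w - 1)))

# Shift by calendar days: what Date - (w - 1) does
start_calendar <- data$Date - (w - 1)

data.frame(end = data$Date, start_trading, start_calendar)
```

For the window ending 2018-12-27 the row shift gives 2018-12-20, while the calendar shift gives 2018-12-25, which is not a trading day in the data at all.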

Largest sd's with their start and end dates

The 4 largest sd's with their start and end dates are:

head(dplyr::arrange(out, -sd), 4)

giving:

        Date      y        sd      start
8 2018-12-27  0.331 1.0420146 2018-12-20
9 2018-12-28 -0.768 1.0361488 2018-12-21
3 2018-12-17  0.227 0.8223515 2018-12-13
4 2018-12-18 -0.671 0.7674388 2018-12-14
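
If only the single most volatile window is wanted, its rows can also be pulled out by index, which makes the trading-day count explicit. This is a sketch in base R only (the rolling sd is computed with sapply rather than zoo, purely to keep the snippet dependency-free), again assuming the sample data from the Note; roll_sd, i_max and win are illustrative names:

```r
# Sample data from the Note (last 10 trading days of 2018)
data <- data.frame(
  Date = as.Date(c("2018-12-13", "2018-12-14", "2018-12-17", "2018-12-18",
                   "2018-12-19", "2018-12-20", "2018-12-21", "2018-12-27",
                   "2018-12-28", "2018-12-31")),
  y = c(-0.00736, -1.30, 0.227, -0.671, -0.750, -0.906, -1.74, 0.331,
        -0.768, 0.649)
)
w <- 3

# Right-aligned rolling sd: NA until a full window of w rows is available
roll_sd <- c(rep(NA_real_, w - 1),
             sapply(w:nrow(data), function(i) sd(data$y[(i - w + 1):i])))

i_max <- which.max(roll_sd)            # row on which the highest-sd window ends
win <- data[(i_max - w + 1):i_max, ]   # the w rows (trading days) in that window

range(win$Date)                                      # start and end dates
trading_days <- nrow(win)                            # equals w by construction
calendar_days <- as.numeric(diff(range(win$Date))) + 1
c(trading_days = trading_days, calendar_days = calendar_days)
```

With the sample data the window runs from 2018-12-20 to 2018-12-27: 3 trading days spread over 8 calendar days. With the full data and w <- 60 the same indexing should apply unchanged.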

Rows between two dates

If the question is how many rows lie between two dates that appear in data, counting both endpoints, then:

d1 <- as.Date("2018-12-14")
d2 <- as.Date("2018-12-20")
diff(match(c(d1, d2), data$Date)) + 1
## [1] 5
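
Note that match returns NA when a date is not in data, as would happen with a start date obtained by calendar subtraction that lands on a weekend or holiday. A possible alternative, sketched here with a hypothetical helper count_trading_days (not part of any package), counts the rows whose date falls in the closed interval instead:

```r
# Sample data from the Note (last 10 trading days of 2018)
data <- data.frame(
  Date = as.Date(c("2018-12-13", "2018-12-14", "2018-12-17", "2018-12-18",
                   "2018-12-19", "2018-12-20", "2018-12-21", "2018-12-27",
                   "2018-12-28", "2018-12-31")),
  y = c(-0.00736, -1.30, 0.227, -0.671, -0.750, -0.906, -1.74, 0.331,
        -0.768, 0.649)
)

# Hypothetical helper: number of trading days in [d1, d2], where the
# trading days are taken to be exactly the dates present in `dates`
count_trading_days <- function(d1, d2, dates) {
  sum(dates >= d1 & dates <= d2)
}

# Both endpoints are trading days: same answer as the match() approach
n1 <- count_trading_days(as.Date("2018-12-14"), as.Date("2018-12-20"), data$Date)

# 2018-12-22 is a Saturday and not in the data; match() would give NA here
n2 <- count_trading_days(as.Date("2018-12-22"), as.Date("2018-12-27"), data$Date)

c(n1, n2)
```

This assumes the Date column holds one row per trading day with no duplicates.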

Note

Lines <- "   Date                       y
 1 2018-12-13T00:00:00 -0.00736
 2 2018-12-14T00:00:00 -1.30   
 3 2018-12-17T00:00:00  0.227  
 4 2018-12-18T00:00:00 -0.671  
 5 2018-12-19T00:00:00 -0.750  
 6 2018-12-20T00:00:00 -0.906  
 7 2018-12-21T00:00:00 -1.74   
 8 2018-12-27T00:00:00  0.331  
 9 2018-12-28T00:00:00 -0.768  
10 2018-12-31T00:00:00  0.649"
data <- read.table(text = Lines)
data$Date <- as.Date(data$Date)