Filter in dplyr interval of dates

I have the following simulated dataset in R:

library(tidyverse)
A = seq(from = as.Date("2021/1/1"),to=as.Date("2022/1/1"), length.out = 252)
length(A)
x = rnorm(252)
d = tibble(A,x);d

which looks like:

# A tibble: 252 × 2
   A               x
   <date>      <dbl>
 1 2021-01-01  0.445
 2 2021-01-02 -0.793
 3 2021-01-03 -0.367
 4 2021-01-05  1.64 
 5 2021-01-06 -1.15 
 6 2021-01-08  0.276
 7 2021-01-09  1.09 
 8 2021-01-11  0.443
 9 2021-01-12 -0.378
10 2021-01-14  0.203
# … with 242 more rows

This is one year of 252 trading days. Let's say I have a date of interest:

start = as.Date("2021-05-23");start

I want to filter the dataset so that the result is a new dataset starting from this date and containing the next 20 index dates (that is, the next 20 rows of the data, NOT 20 calendar days), and then count how many rows the new dataset contains.

For example, counting everything after the starting date, I have:


d1=d%>%
  dplyr::filter(A>start)%>%
  dplyr::summarise(n())
d1
# A tibble: 1 × 1
  `n()`
  <int>
1    98

but I want only the next 20 trading days after the starting date. How can I do that? Any help?

CodePudding user response：

Perhaps a brute-force attempt:

d %>%
  filter(between(A, start, max(head(sort(A[A > start]), 20))))
# # A tibble: 20 x 2
#    A                x
#    <date>       <dbl>
#  1 2021-05-23 -0.185 
#  2 2021-05-24  0.102 
#  3 2021-05-26  0.429 
#  4 2021-05-27 -1.21  
#  5 2021-05-29  0.260 
#  6 2021-05-30  0.479 
#  7 2021-06-01 -0.623 
#  8 2021-06-02  0.982 
#  9 2021-06-04 -0.0533
# 10 2021-06-05  1.08  
# 11 2021-06-07 -1.96  
# 12 2021-06-08 -0.613 
# 13 2021-06-09 -0.267 
# 14 2021-06-11 -0.284 
# 15 2021-06-12  0.0851
# 16 2021-06-14  0.355 
# 17 2021-06-15 -0.635 
# 18 2021-06-17 -0.606 
# 19 2021-06-18 -0.485 
# 20 2021-06-20  0.255

If you have duplicate dates, you may prefer to use head(sort(unique(A[A > start])),20), depending on what "20 index dates" means.

And to find the number of indices, you can summarise or count as needed.
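
As a minimal sketch of that last step, assuming the tibble d and the date start from the question: the set.seed() call and the names end and next20 are illustrative additions, not part of the original post. The filtered window can be counted with nrow() or with summarise(n()):

```r
library(dplyr)

# Rebuild the question's data; set.seed() is added only for reproducibility
set.seed(1)
A <- seq(from = as.Date("2021/1/1"), to = as.Date("2022/1/1"), length.out = 252)
d <- tibble(A, x = rnorm(252))
start <- as.Date("2021-05-23")

# Upper bound of the window: the 20th distinct date strictly after `start`
end <- max(head(sort(unique(A[A > start])), 20))

# `next20` is a hypothetical name for the filtered window
next20 <- d %>% filter(between(A, start, end))

# Two equivalent ways to count the rows in the window
n_rows <- nrow(next20)
next20 %>% summarise(n = n())
```

Note that between() is inclusive at both ends, so a row dated exactly start would be counted in addition to the 20 dates that follow it.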

CodePudding user response：

You could first sort by date, filter for dates after the given date, and then take the first 20 records.

d1 = d %>%
  arrange(A) %>%
  filter(A > start) %>%
  head(20)
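
The same idea can be sketched with slice_head(), for those who prefer dplyr verbs throughout. This assumes the data and start date from the question; set.seed() is an addition for reproducibility. Whether the comparison should be > or >= depends on whether the start date itself counts as one of the 20 rows, which the question leaves open.

```r
library(dplyr)

# Rebuild the question's data; set.seed() is not in the original post
set.seed(1)
A <- seq(from = as.Date("2021/1/1"), to = as.Date("2022/1/1"), length.out = 252)
d <- tibble(A, x = rnorm(252))
start <- as.Date("2021-05-23")

# First 20 rows dated strictly after `start`, in date order
d1 <- d %>%
  arrange(A) %>%
  filter(A > start) %>%
  slice_head(n = 20)

# Number of rows in the resulting window
nrow(d1)
```

If fewer than 20 rows follow the start date, slice_head() returns however many are available, so checking nrow(d1) is a cheap sanity test.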