How can I calculate the returns of a time series data frame according to a condition in R?

Time:10-24

I have a data frame that looks like this:

library(tidyverse)

LAG <- 2
date <- seq(as.Date("2022/1/1"), as.Date("2022/1/10"), by = "day")
close <- c(1.02, 1.01, 0.992, 1.01, 1.01, 0.993, 0.98, 1.02, 1.02, 0.994)
value <- c(2, 3, 2, 2, 0, 3, 4, 3, 0, 9) * 1000000
df <- tibble(date, close, value)

# y is the LAG-day percentage change in close
df %>%
  dplyr::mutate(y = as.numeric(close / dplyr::lag(close, LAG) - 1))



# A tibble: 10 × 4
   date       close   value       y
   <date>     <dbl>   <dbl>   <dbl>
 1 2022-01-01 1.02  2000000 NA     
 2 2022-01-02 1.01  3000000 NA     
 3 2022-01-03 0.992 2000000 -0.0275
 4 2022-01-04 1.01  2000000  0     
 5 2022-01-05 1.01        0  0.0181
 6 2022-01-06 0.993 3000000 -0.0168
 7 2022-01-07 0.98  4000000 -0.0297
 8 2022-01-08 1.02  3000000  0.0272
 9 2022-01-09 1.02        0  0.0408
10 2022-01-10 0.994 9000000 -0.0255

The y column contains the 2-day percentage change in the close values. Now I want to exclude the y values whose window started or ended on a date where the value was 0. For example, the close was 0.992 on 2022-01-03 and 1.01 on 2022-01-05, giving y = 0.0181. But on 2022-01-05 the value was 0, so I want this y value (0.0181) to be excluded.

Similarly, the close on 2022-01-07 was 0.98, so the 2-day percentage change from 2022-01-05 is y = -0.0297. That window starts on 2022-01-05, where the value was 0, so I want this y value to be excluded as well.

If the zero-value day falls in the middle of the window, I want to keep the return.

In general, if I change the LAG value from 2 days to 3 days or even 5 days, I want to exclude the LAG-day returns (percentage differences) that started or ended on a day where the value was 0.

I have tried calculating the returns first and then filtering out the rows where value is exactly 0, but this does not solve my problem. Neither does the opposite order: filtering out the rows where value is exactly 0 first and then calculating the returns.

Ideally I want the result to look like this:

# A tibble: 10 × 4
   date       close   value       y
   <date>     <dbl>   <dbl>   <dbl>
 1 2022-01-01 1.02  2000000 NA     
 2 2022-01-02 1.01  3000000 NA     
 3 2022-01-03 0.992 2000000 -0.0275
 4 2022-01-04 1.01  2000000  0     
 5 2022-01-05 1.01        0  NA
 6 2022-01-06 0.993 3000000 -0.0168
 7 2022-01-07 0.98  4000000  NA
 8 2022-01-08 1.02  3000000  0.0272
 9 2022-01-09 1.02        0  NA
10 2022-01-10 0.994 9000000 -0.0255

i.e. the y value -0.0297 should be NA as well, in addition to the two y values on the days where the value itself is 0.

I have tried the case_when function from the dplyr package: when the value is not 0 it gives me y, and NA otherwise.

df%>%
  dplyr::mutate(y = as.numeric(close/ (dplyr::lag(close, LAG))-1))%>%
  mutate(ynew = case_when(value!=0 ~ y ,
                          TRUE ~ NA_real_))

# A tibble: 10 × 5
   date       close   value       y    ynew
   <date>     <dbl>   <dbl>   <dbl>   <dbl>
 1 2022-01-01 1.02  2000000 NA      NA     
 2 2022-01-02 1.01  3000000 NA      NA     
 3 2022-01-03 0.992 2000000 -0.0275 -0.0275
 4 2022-01-04 1.01  2000000  0       0     
 5 2022-01-05 1.01        0  0.0181 NA     
 6 2022-01-06 0.993 3000000 -0.0168 -0.0168
 7 2022-01-07 0.98  4000000 -0.0297 -0.0297
 8 2022-01-08 1.02  3000000  0.0272  0.0272
 9 2022-01-09 1.02        0  0.0408 NA     
10 2022-01-10 0.994 9000000 -0.0255 -0.0255

But how can I also set to NA the y value that falls LAG (here 2) days after a date on which the value was 0? Any help?

CodePudding user response:

With dplyr:

library(dplyr)

df %>%
  mutate(y = as.numeric(close / lag(close, LAG) - 1)) %>%
  # NA when the window starts (lag(value, LAG) == 0) or ends (value == 0) on a zero-value day
  mutate(y = if_else(lag(value, LAG) == 0 | value == 0, NA_real_, y))

# A tibble: 10 × 4
   date       close   value       y
   <date>     <dbl>   <dbl>   <dbl>
 1 2022-01-01 1.02  2000000 NA     
 2 2022-01-02 1.01  3000000 NA     
 3 2022-01-03 0.992 2000000 -0.0275
 4 2022-01-04 1.01  2000000  0     
 5 2022-01-05 1.01        0 NA     
 6 2022-01-06 0.993 3000000 -0.0168
 7 2022-01-07 0.98  4000000 NA     
 8 2022-01-08 1.02  3000000  0.0272
 9 2022-01-09 1.02        0 NA     
10 2022-01-10 0.994 9000000 -0.0255
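
Since the question also asks about other lags (3 or 5 days), the same masking can be wrapped in a small function. The sketch below is only an illustration: the name masked_returns and its lag_days argument are hypothetical and not part of the original answer, and the data is rebuilt with data.frame() so the block runs on its own.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  date  = seq(as.Date("2022-01-01"), as.Date("2022-01-10"), by = "day"),
  close = c(1.02, 1.01, 0.992, 1.01, 1.01, 0.993, 0.98, 1.02, 1.02, 0.994),
  value = c(2, 3, 2, 2, 0, 3, 4, 3, 0, 9) * 1e6
)

# Hypothetical helper: lag_days-day returns, set to NA when the window
# starts or ends on a day where value is 0.
masked_returns <- function(data, lag_days) {
  data %>%
    mutate(
      y = close / dplyr::lag(close, lag_days) - 1,
      # The condition is NA on the first lag_days rows; if_else() then
      # returns NA, which is what y already holds there.
      y = if_else(value == 0 | dplyr::lag(value, lag_days) == 0, NA_real_, y)
    )
}

masked_returns(df, 2)  # same NA pattern as the output above
masked_returns(df, 3)  # 3-day returns, masked the same way
```

With a 3-day lag the masked rows shift accordingly: the return on 2022-01-08 starts on 2022-01-05 and is dropped, while the return on 2022-01-07 is kept because the zero-value day is no longer at either end of its window.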

With data.table:

library(data.table)

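# setDT() converts df to a data.table by reference. The first [] computes the
# LAG-day return; the second sets it to NA where the window starts or ends on value == 0.
setDT(df)[, y := close / shift(close, LAG) - 1][value == 0 | shift(value, LAG) == 0, y := NA][]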

          date close value           y
        <Date> <num> <num>       <num>
 1: 2022-01-01 1.020 2e+06          NA
 2: 2022-01-02 1.010 3e+06          NA
 3: 2022-01-03 0.992 2e+06 -0.02745098
 4: 2022-01-04 1.010 2e+06  0.00000000
 5: 2022-01-05 1.010 0e+00          NA
 6: 2022-01-06 0.993 3e+06 -0.01683168
 7: 2022-01-07 0.980 4e+06          NA
 8: 2022-01-08 1.020 3e+06  0.02719033
 9: 2022-01-09 1.020 0e+00          NA
10: 2022-01-10 0.994 9e+06 -0.02549020
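
For comparison, the same logic can be written without any packages. This is a minimal base R sketch, not part of the original answer; shift_back is a hypothetical helper standing in for dplyr::lag() / data.table::shift().

```r
# Sample data from the question
df <- data.frame(
  date  = seq(as.Date("2022-01-01"), as.Date("2022-01-10"), by = "day"),
  close = c(1.02, 1.01, 0.992, 1.01, 1.01, 0.993, 0.98, 1.02, 1.02, 0.994),
  value = c(2, 3, 2, 2, 0, 3, 4, 3, 0, 9) * 1e6
)
LAG <- 2

# Hypothetical helper: shift x back by n positions, padding the front with NA
shift_back <- function(x, n) c(rep(NA, n), x)[seq_along(x)]

# LAG-day percentage change in close
y <- df$close / shift_back(df$close, LAG) - 1

# TRUE where the window ends (value == 0) or starts (lagged value == 0) on a zero-value day
touches_zero <- df$value == 0 | shift_back(df$value, LAG) == 0

# which() skips the NA entries that the padding produces on the first LAG rows
y[which(touches_zero)] <- NA
df$y <- y
df
```

With LAG = 2 this leaves NA on rows 1, 2, 5, 7 and 9, matching the desired output in the question.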