Home > Mobile >  Calculating Buy and Hold return around event date per ID in R
Calculating Buy and Hold return around event date per ID in R

Time:09-12

I wrote a similar question yesterday, but then the problem became more complicated as I proceeded, so I would like to ask again.

I have a question about calculating returns in the following case. For each ID, if Date=EventDate, I hope to calculate "buy and hold return" from 5 days prior to the event date to 5 days after.

To be more specific with the table below, I want to calculate returns around the event date (-5 ~ 5 days) for each ID, where the returns are, for example, (9/10-1), (12/10-1), (14/10-1), ~ , (14/10-1), (17/10-1), (16/10-1) for ID = 1 (That is, the price six days prior to the event date is the denominator in the return calculation.)

 ---- ------------ ------- ------------ 
| ID |    Date    | Price | EventDate  |
 ---- ------------ ------- ------------ 
|  1 | 2011-03-06 |    10 | NA         |
|  1 | 2011-03-07 |     9 | NA         |
|  1 | 2011-03-08 |    12 | NA         |
|  1 | 2011-03-09 |    14 | NA         |
|  1 | 2011-03-10 |    15 | NA         |
|  1 | 2011-03-11 |    17 | NA         |
|  1 | 2011-03-12 |    12 | 2011-03-12 |
|  1 | 2011-03-13 |    14 | NA         |
|  1 | 2011-03-14 |    17 | NA         |
|  1 | 2011-03-15 |    14 | NA         |
|  1 | 2011-03-16 |    17 | NA         |
|  1 | 2011-03-17 |    16 | NA         |
|  1 | 2011-03-18 |    15 | NA         |
|  1 | 2011-03-19 |    16 | NA         |
|  1 | 2011-03-20 |    17 | NA         |
|  1 | 2011-03-21 |    18 | NA         |
|  1 | 2011-03-22 |    11 | NA         |
|  1 | 2011-03-23 |    15 | NA         |
|  1 | 2011-03-24 |    12 | 2011-03-24 |
|  1 | 2011-03-25 |    13 | NA         |
|  1 | 2011-03-26 |    15 | NA         |
|  2 | 2011-06-11 |    48 | NA         |
|  2 | 2011-06-12 |    49 | NA         |
|  2 | 2011-06-13 |    50 | NA         |
|  2 | 2011-06-14 |    57 | NA         |
|  2 | 2011-06-15 |    60 | NA         |
|  2 | 2011-06-16 |    49 | NA         |
|  2 | 2011-06-17 |    64 | NA         |
|  2 | 2011-06-18 |    63 | NA         |
|  2 | 2011-06-19 |    67 | 2011-06-19 |
|  2 | 2011-06-20 |    70 | NA         |
|  2 | 2011-06-21 |    58 | NA         |
|  2 | 2011-06-22 |    65 | NA         |
|  2 | 2011-06-23 |    57 | NA         |
|  2 | 2011-06-24 |    55 | NA         |
|  2 | 2011-06-25 |    57 | NA         |
|  2 | 2011-06-26 |    60 | NA         |
 ---- ------------ ------- ------------ 

Eventually, I hope to make the following table with a new column.

 ---- ------------ ------- ------------ --------------- 
| ID |    Date    | Price | EventDate  | BuyHoldReturn |
 ---- ------------ ------- ------------ --------------- 
|  1 | 2011-03-06 |    10 | NA         | NA            |
|  1 | 2011-03-07 |     9 | NA         | -0.1          |
|  1 | 2011-03-08 |    12 | NA         | 0.2           |
|  1 | 2011-03-09 |    14 | NA         | 0.4           |
|  1 | 2011-03-10 |    15 | NA         | 0.5           |
|  1 | 2011-03-11 |    17 | NA         | 0.7           |
|  1 | 2011-03-12 |    12 | 2011-03-12 | 0.2           |
|  1 | 2011-03-13 |    14 | NA         | 0.4           |
|  1 | 2011-03-14 |    17 | NA         | 0.7           |
|  1 | 2011-03-15 |    14 | NA         | 0.4           |
|  1 | 2011-03-16 |    17 | NA         | 0.7           |
|  1 | 2011-03-17 |    16 | NA         | 0.6           |
|  1 | 2011-03-18 |    15 | NA         | NA            |
|  1 | 2011-03-19 |    16 | NA         | 0.066666667   |
|  1 | 2011-03-20 |    17 | NA         | 0.133333333   |
|  1 | 2011-03-21 |    18 | NA         | 0.2           |
|  1 | 2011-03-22 |    11 | NA         | -0.266666667  |
|  1 | 2011-03-23 |    15 | NA         | 0             |
|  1 | 2011-03-24 |    12 | 2011-03-24 | -0.2          |
|  1 | 2011-03-25 |    13 | NA         | -0.133333333  |
|  1 | 2011-03-26 |    15 | NA         | 0             |
|  2 | 2011-06-11 |    48 | NA         | NA            |
|  2 | 2011-06-12 |    49 | NA         | NA            |
|  2 | 2011-06-13 |    50 | NA         | NA            |
|  2 | 2011-06-14 |    57 | NA         | 0.14          |
|  2 | 2011-06-15 |    60 | NA         | 0.2           |
|  2 | 2011-06-16 |    49 | NA         | -0.02         |
|  2 | 2011-06-17 |    64 | NA         | 0.28          |
|  2 | 2011-06-18 |    63 | NA         | 0.26          |
|  2 | 2011-06-19 |    67 | 2011-06-19 | 0.34          |
|  2 | 2011-06-20 |    70 | NA         | 0.4           |
|  2 | 2011-06-21 |    58 | NA         | 0.16          |
|  2 | 2011-06-22 |    65 | NA         | 0.3           |
|  2 | 2011-06-23 |    57 | NA         | 0.14          |
|  2 | 2011-06-24 |    55 | NA         | 0.1           |
|  2 | 2011-06-25 |    57 | NA         | NA            |
|  2 | 2011-06-26 |    60 | NA         | NA            |
 ---- ------------ ------- ------------ --------------- 

I have an idea of using the code below but couldn't figure out how to calculate the 11 buy and hold returns around the event date.

data<-data%>%
group_by(ID)%>%
mutate(BuyHoldReturn=ifelse(Date==EventDate, ....

Thanks in advance!

CodePudding user response:

You can do use dplyr as follows:

# ensure Date and EventDate are Date columns
df <- df %>% mutate(across(c(Date,EventDate), ~as.Date(.x)))


left_join(
  select(df,ID, Date), 
  inner_join(df %>% select(-EventDate),filter(df,!is.na(EventDate)) %>% distinct(ID, EventDate), by="ID") %>%
    filter(abs(EventDate-Date)<=6) %>% 
    group_by(ID, EventDate) %>% 
    mutate(BuyHoldReturn = c(NA,Price[-1]/Price[1]-1)),
  by=c("ID", "Date")
)

Output:

   ID       Date Price  EventDate BuyHoldReturn
1   1 2011-03-06    10 2011-03-12            NA
2   1 2011-03-07     9 2011-03-12   -0.10000000
3   1 2011-03-08    12 2011-03-12    0.20000000
4   1 2011-03-09    14 2011-03-12    0.40000000
5   1 2011-03-10    15 2011-03-12    0.50000000
6   1 2011-03-11    17 2011-03-12    0.70000000
7   1 2011-03-12    12 2011-03-12    0.20000000
8   1 2011-03-13    14 2011-03-12    0.40000000
9   1 2011-03-14    17 2011-03-12    0.70000000
10  1 2011-03-15    14 2011-03-12    0.40000000
11  1 2011-03-16    17 2011-03-12    0.70000000
12  1 2011-03-17    16 2011-03-12    0.60000000
13  1 2011-03-18    15 2011-03-12    0.50000000
14  1 2011-03-18    15 2011-03-24            NA
15  1 2011-03-19    16 2011-03-24    0.06666667
16  1 2011-03-20    17 2011-03-24    0.13333333
17  1 2011-03-21    18 2011-03-24    0.20000000
18  1 2011-03-22    11 2011-03-24   -0.26666667
19  1 2011-03-23    15 2011-03-24    0.00000000
20  1 2011-03-24    12 2011-03-24   -0.20000000
21  1 2011-03-25    13 2011-03-24   -0.13333333
22  1 2011-03-26    15 2011-03-24    0.00000000
23  2 2011-06-11    NA       <NA>            NA
24  2 2011-06-12    NA       <NA>            NA
25  2 2011-06-13    50 2011-06-19            NA
26  2 2011-06-14    57 2011-06-19    0.14000000
27  2 2011-06-15    60 2011-06-19    0.20000000
28  2 2011-06-16    49 2011-06-19   -0.02000000
29  2 2011-06-17    64 2011-06-19    0.28000000
30  2 2011-06-18    63 2011-06-19    0.26000000
31  2 2011-06-19    67 2011-06-19    0.34000000
32  2 2011-06-20    70 2011-06-19    0.40000000
33  2 2011-06-21    58 2011-06-19    0.16000000
34  2 2011-06-22    65 2011-06-19    0.30000000
35  2 2011-06-23    57 2011-06-19    0.14000000
36  2 2011-06-24    55 2011-06-19    0.10000000
37  2 2011-06-25    57 2011-06-19    0.14000000
38  2 2011-06-26    NA       <NA>            NA

This can be done similarly using data.table

library(data.table)
setDT(df)
df[,(c("Date", "EventDate")):=lapply(.SD, as.Date), .SDcols=c("Date", "EventDate")]
df[,!c("EventDate")][unique(df[!is.na(EventDate), .(ID, EventDate)]), on="ID", allow.cartesian=T][
  abs(EventDate-Date)<=6][,BuyHoldReturn:=c(NA,Price[-1]/Price[1]-1), .(ID,EventDate)][
    df[,.(ID,Date)], on=.(ID,Date)
]

Both of the above approaches use cartesian joins, and then filter. If you have a very large table and you are looking for some increase in speed, you can also use a data.table non-equi join to isolate the rows before and after the EventDate(s) for each ID

  1. load library and set data to data.table
library(data.table)
setDT(dt)
  1. Create an events frame, which just has the ID and Events
events = unique(df[!is.na(EventDate),.(ID,EventDate)])

      ID  EventDate
   <int>     <IDat>
1:     1 2011-03-12
2:     1 2011-03-24
3:     2 2011-06-19
  1. Add some helper columns to the frames (this facilitates retention of columns in non-equi joins
events[, eDate:=EventDate]
df[, `:=`(s=Date-6, e=Date 6)]
  1. Use a non-equi join: require ID to match and EventDate to be between the lower and upper bounds created above (i.e. s and e)
bhr = events[df, on=.(ID, EventDate>=s, EventDate<=e), nomatch=0]
  1. Generate the BuyHoldReturn column, by ID and EventDate
bhr = bhr[, .(Date, BuyHoldReturn=c(NA, Price[-1]/Price[1] -1)), by = .(ID,eDate)]
  1. Merge back on the original frame to retain the rows with NA, and select columns with desired name and arrangement.
bhr[df,on=.(ID,Date),.(ID,Date,Price,EventDate=i.EventDate,BuyHoldReturn)]

Input:

structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Date = c("2011-03-06", 
"2011-03-07", "2011-03-08", "2011-03-09", "2011-03-10", "2011-03-11", 
"2011-03-12", "2011-03-13", "2011-03-14", "2011-03-15", "2011-03-16", 
"2011-03-17", "2011-03-18", "2011-03-19", "2011-03-20", "2011-03-21", 
"2011-03-22", "2011-03-23", "2011-03-24", "2011-03-25", "2011-03-26", 
"2011-06-11", "2011-06-12", "2011-06-13", "2011-06-14", "2011-06-15", 
"2011-06-16", "2011-06-17", "2011-06-18", "2011-06-19", "2011-06-20", 
"2011-06-21", "2011-06-22", "2011-06-23", "2011-06-24", "2011-06-25", 
"2011-06-26"), Price = c(10L, 9L, 12L, 14L, 15L, 17L, 12L, 14L, 
17L, 14L, 17L, 16L, 15L, 16L, 17L, 18L, 11L, 15L, 12L, 13L, 15L, 
48L, 49L, 50L, 57L, 60L, 49L, 64L, 63L, 67L, 70L, 58L, 65L, 57L, 
55L, 57L, 60L), EventDate = c(NA, NA, NA, NA, NA, NA, "2011-03-12", 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "2011-03-24", NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, "2011-06-19", NA, NA, NA, 
NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, -37L
))
  • Related