I wrote a similar question yesterday, but then the problem became more complicated as I proceeded, so I would like to ask again.
I have a question about calculating returns in the following case. For each ID, if Date=EventDate, I hope to calculate "buy and hold return" from 5 days prior to the event date to 5 days after.
To be more specific with the table below, I want to calculate returns around the event date (-5 ~ 5 days) for each ID, where the returns are, for example, (9/10-1), (12/10-1), (14/10-1), ~ , (14/10-1), (17/10-1), (16/10-1) for ID = 1 (That is, the price six days prior to the event date is the denominator in the return calculation.)
---- ------------ ------- ------------
| ID | Date | Price | EventDate |
---- ------------ ------- ------------
| 1 | 2011-03-06 | 10 | NA |
| 1 | 2011-03-07 | 9 | NA |
| 1 | 2011-03-08 | 12 | NA |
| 1 | 2011-03-09 | 14 | NA |
| 1 | 2011-03-10 | 15 | NA |
| 1 | 2011-03-11 | 17 | NA |
| 1 | 2011-03-12 | 12 | 2011-03-12 |
| 1 | 2011-03-13 | 14 | NA |
| 1 | 2011-03-14 | 17 | NA |
| 1 | 2011-03-15 | 14 | NA |
| 1 | 2011-03-16 | 17 | NA |
| 1 | 2011-03-17 | 16 | NA |
| 1 | 2011-03-18 | 15 | NA |
| 1 | 2011-03-19 | 16 | NA |
| 1 | 2011-03-20 | 17 | NA |
| 1 | 2011-03-21 | 18 | NA |
| 1 | 2011-03-22 | 11 | NA |
| 1 | 2011-03-23 | 15 | NA |
| 1 | 2011-03-24 | 12 | 2011-03-24 |
| 1 | 2011-03-25 | 13 | NA |
| 1 | 2011-03-26 | 15 | NA |
| 2 | 2011-06-11 | 48 | NA |
| 2 | 2011-06-12 | 49 | NA |
| 2 | 2011-06-13 | 50 | NA |
| 2 | 2011-06-14 | 57 | NA |
| 2 | 2011-06-15 | 60 | NA |
| 2 | 2011-06-16 | 49 | NA |
| 2 | 2011-06-17 | 64 | NA |
| 2 | 2011-06-18 | 63 | NA |
| 2 | 2011-06-19 | 67 | 2011-06-19 |
| 2 | 2011-06-20 | 70 | NA |
| 2 | 2011-06-21 | 58 | NA |
| 2 | 2011-06-22 | 65 | NA |
| 2 | 2011-06-23 | 57 | NA |
| 2 | 2011-06-24 | 55 | NA |
| 2 | 2011-06-25 | 57 | NA |
| 2 | 2011-06-26 | 60 | NA |
---- ------------ ------- ------------
Eventually, I hope to make the following table with a new column.
---- ------------ ------- ------------ ---------------
| ID | Date | Price | EventDate | BuyHoldReturn |
---- ------------ ------- ------------ ---------------
| 1 | 2011-03-06 | 10 | NA | NA |
| 1 | 2011-03-07 | 9 | NA | -0.1 |
| 1 | 2011-03-08 | 12 | NA | 0.2 |
| 1 | 2011-03-09 | 14 | NA | 0.4 |
| 1 | 2011-03-10 | 15 | NA | 0.5 |
| 1 | 2011-03-11 | 17 | NA | 0.7 |
| 1 | 2011-03-12 | 12 | 2011-03-12 | 0.2 |
| 1 | 2011-03-13 | 14 | NA | 0.4 |
| 1 | 2011-03-14 | 17 | NA | 0.7 |
| 1 | 2011-03-15 | 14 | NA | 0.4 |
| 1 | 2011-03-16 | 17 | NA | 0.7 |
| 1 | 2011-03-17 | 16 | NA | 0.6 |
| 1 | 2011-03-18 | 15 | NA | NA |
| 1 | 2011-03-19 | 16 | NA | 0.066666667 |
| 1 | 2011-03-20 | 17 | NA | 0.133333333 |
| 1 | 2011-03-21 | 18 | NA | 0.2 |
| 1 | 2011-03-22 | 11 | NA | -0.266666667 |
| 1 | 2011-03-23 | 15 | NA | 0 |
| 1 | 2011-03-24 | 12 | 2011-03-24 | -0.2 |
| 1 | 2011-03-25 | 13 | NA | -0.133333333 |
| 1 | 2011-03-26 | 15 | NA | 0 |
| 2 | 2011-06-11 | 48 | NA | NA |
| 2 | 2011-06-12 | 49 | NA | NA |
| 2 | 2011-06-13 | 50 | NA | NA |
| 2 | 2011-06-14 | 57 | NA | 0.14 |
| 2 | 2011-06-15 | 60 | NA | 0.2 |
| 2 | 2011-06-16 | 49 | NA | -0.02 |
| 2 | 2011-06-17 | 64 | NA | 0.28 |
| 2 | 2011-06-18 | 63 | NA | 0.26 |
| 2 | 2011-06-19 | 67 | 2011-06-19 | 0.34 |
| 2 | 2011-06-20 | 70 | NA | 0.4 |
| 2 | 2011-06-21 | 58 | NA | 0.16 |
| 2 | 2011-06-22 | 65 | NA | 0.3 |
| 2 | 2011-06-23 | 57 | NA | 0.14 |
| 2 | 2011-06-24 | 55 | NA | 0.1 |
| 2 | 2011-06-25 | 57 | NA | NA |
| 2 | 2011-06-26 | 60 | NA | NA |
---- ------------ ------- ------------ ---------------
I have an idea of using the code below but couldn't figure out how to calculate the 11 buy and hold returns around the event date.
data<-data%>%
group_by(ID)%>%
mutate(BuyHoldReturn=ifelse(Date==EventDate, ....
Thanks in advance!
CodePudding user response:
You can do use dplyr
as follows:
# ensure Date and EventDate are Date columns
df <- df %>% mutate(across(c(Date,EventDate), ~as.Date(.x)))
left_join(
select(df,ID, Date),
inner_join(df %>% select(-EventDate),filter(df,!is.na(EventDate)) %>% distinct(ID, EventDate), by="ID") %>%
filter(abs(EventDate-Date)<=6) %>%
group_by(ID, EventDate) %>%
mutate(BuyHoldReturn = c(NA,Price[-1]/Price[1]-1)),
by=c("ID", "Date")
)
Output:
ID Date Price EventDate BuyHoldReturn
1 1 2011-03-06 10 2011-03-12 NA
2 1 2011-03-07 9 2011-03-12 -0.10000000
3 1 2011-03-08 12 2011-03-12 0.20000000
4 1 2011-03-09 14 2011-03-12 0.40000000
5 1 2011-03-10 15 2011-03-12 0.50000000
6 1 2011-03-11 17 2011-03-12 0.70000000
7 1 2011-03-12 12 2011-03-12 0.20000000
8 1 2011-03-13 14 2011-03-12 0.40000000
9 1 2011-03-14 17 2011-03-12 0.70000000
10 1 2011-03-15 14 2011-03-12 0.40000000
11 1 2011-03-16 17 2011-03-12 0.70000000
12 1 2011-03-17 16 2011-03-12 0.60000000
13 1 2011-03-18 15 2011-03-12 0.50000000
14 1 2011-03-18 15 2011-03-24 NA
15 1 2011-03-19 16 2011-03-24 0.06666667
16 1 2011-03-20 17 2011-03-24 0.13333333
17 1 2011-03-21 18 2011-03-24 0.20000000
18 1 2011-03-22 11 2011-03-24 -0.26666667
19 1 2011-03-23 15 2011-03-24 0.00000000
20 1 2011-03-24 12 2011-03-24 -0.20000000
21 1 2011-03-25 13 2011-03-24 -0.13333333
22 1 2011-03-26 15 2011-03-24 0.00000000
23 2 2011-06-11 NA <NA> NA
24 2 2011-06-12 NA <NA> NA
25 2 2011-06-13 50 2011-06-19 NA
26 2 2011-06-14 57 2011-06-19 0.14000000
27 2 2011-06-15 60 2011-06-19 0.20000000
28 2 2011-06-16 49 2011-06-19 -0.02000000
29 2 2011-06-17 64 2011-06-19 0.28000000
30 2 2011-06-18 63 2011-06-19 0.26000000
31 2 2011-06-19 67 2011-06-19 0.34000000
32 2 2011-06-20 70 2011-06-19 0.40000000
33 2 2011-06-21 58 2011-06-19 0.16000000
34 2 2011-06-22 65 2011-06-19 0.30000000
35 2 2011-06-23 57 2011-06-19 0.14000000
36 2 2011-06-24 55 2011-06-19 0.10000000
37 2 2011-06-25 57 2011-06-19 0.14000000
38 2 2011-06-26 NA <NA> NA
This can be done similarly using data.table
library(data.table)
setDT(df)
df[,(c("Date", "EventDate")):=lapply(.SD, as.Date), .SDcols=c("Date", "EventDate")]
df[,!c("EventDate")][unique(df[!is.na(EventDate), .(ID, EventDate)]), on="ID", allow.cartesian=T][
abs(EventDate-Date)<=6][,BuyHoldReturn:=c(NA,Price[-1]/Price[1]-1), .(ID,EventDate)][
df[,.(ID,Date)], on=.(ID,Date)
]
Both of the above approaches use cartesian joins, and then filter. If you have a very large table and you are looking for some increase in speed, you can also use a data.table
non-equi join to isolate the rows before and after the EventDate
(s) for each ID
- load library and set data to data.table
library(data.table)
setDT(dt)
- Create an events frame, which just has the ID and Events
events = unique(df[!is.na(EventDate),.(ID,EventDate)])
ID EventDate
<int> <IDat>
1: 1 2011-03-12
2: 1 2011-03-24
3: 2 2011-06-19
- Add some helper columns to the frames (this facilitates retention of columns in non-equi joins
events[, eDate:=EventDate]
df[, `:=`(s=Date-6, e=Date 6)]
- Use a non-equi join: require ID to match and EventDate to be between the lower and upper bounds created above (i.e.
s
ande
)
bhr = events[df, on=.(ID, EventDate>=s, EventDate<=e), nomatch=0]
- Generate the
BuyHoldReturn
column, byID
andEventDate
bhr = bhr[, .(Date, BuyHoldReturn=c(NA, Price[-1]/Price[1] -1)), by = .(ID,eDate)]
- Merge back on the original frame to retain the rows with
NA
, and select columns with desired name and arrangement.
bhr[df,on=.(ID,Date),.(ID,Date,Price,EventDate=i.EventDate,BuyHoldReturn)]
Input:
structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Date = c("2011-03-06",
"2011-03-07", "2011-03-08", "2011-03-09", "2011-03-10", "2011-03-11",
"2011-03-12", "2011-03-13", "2011-03-14", "2011-03-15", "2011-03-16",
"2011-03-17", "2011-03-18", "2011-03-19", "2011-03-20", "2011-03-21",
"2011-03-22", "2011-03-23", "2011-03-24", "2011-03-25", "2011-03-26",
"2011-06-11", "2011-06-12", "2011-06-13", "2011-06-14", "2011-06-15",
"2011-06-16", "2011-06-17", "2011-06-18", "2011-06-19", "2011-06-20",
"2011-06-21", "2011-06-22", "2011-06-23", "2011-06-24", "2011-06-25",
"2011-06-26"), Price = c(10L, 9L, 12L, 14L, 15L, 17L, 12L, 14L,
17L, 14L, 17L, 16L, 15L, 16L, 17L, 18L, 11L, 15L, 12L, 13L, 15L,
48L, 49L, 50L, 57L, 60L, 49L, 64L, 63L, 67L, 70L, 58L, 65L, 57L,
55L, 57L, 60L), EventDate = c(NA, NA, NA, NA, NA, NA, "2011-03-12",
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "2011-03-24", NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, "2011-06-19", NA, NA, NA,
NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, -37L
))