I have a dataset containing data for 6k asset and their marketprices.
I want to compute the daily returns, hence to apply the formula
martketprice[i] - marketprice[i-1]/marketprice[i-1]
The problem is that I have multiple observation for the same datetime, for example for asset x
, I have 3 observation for the day t
because it was traded by investor 1, 2 and 3
. And so on so forth for every asset in the dataset.
So my dataset can look like:
investor asset datetime marketprice
1 x t 10
2 x t 10
3 x t 10
My idea was to use something like
res <- res %>%
arrange(datetime) %>%
group_by(asset) %>%
mutate(ret = (marketprice - dplyr::lag(marketprice))/dplyr::lag(marketprice, default = NA)) %>%
ungroup()
but it doesn't work since, in the example above, for row 2 would mean use marketprice [i-1]
which is the same day marketprice, while I want the previous day [t-1]
to be used (not included in the example dataset)
Furthermore R should check that the [i-1]
marketprice is not belonging to a date which is more than 4 days distant, hence if the date of row i is 10th of july
, then the computation should apply only if the date [i-1]
is 6th of july
or closer.
Any idea?
CodePudding user response:
Based on the following assumptions I understand:
- When repeated day for the same asset, the marketprice is the same, not matter the investor.
- You don't mind which investor was (so we can remove rows)
- When day (t) is 5 days or ahead from previous (t-1), a NaN output is ok.
Libraries and some data example:
library(lubridate)
library(tidyverse)
# Data example
set.seed(132) # reproducibility
example = data.frame(
investor = c(rep(1,3),2,3,rep(2,2),1,
rep(2,4),rep(3,4)),
asset = c(rep('A',8),
rep('B',8)),
datetime = c(today() c(1,2,3,3,3,4,5,6),
today() c(1,seq(6,9),seq(16,18))),
marketprice = c(10,20,30,30,30,sample(c(10,20,30),11,replace = TRUE))
)
Example dataset has 2 assets. First one (A) shows how the code deals with several rows for the same day. Second (B) shows how the code deals when there is a jump in dates greater than 4 days.
> example
investor asset datetime marketprice
1 1 A 2022-05-26 10
2 1 A 2022-05-27 20
3 1 A 2022-05-28 30
4 2 A 2022-05-28 30
5 3 A 2022-05-28 30
6 2 A 2022-05-29 30
7 2 A 2022-05-30 30
8 1 A 2022-05-31 30
9 2 B 2022-05-26 20
10 2 B 2022-05-31 10
11 2 B 2022-06-01 20
12 2 B 2022-06-02 10
13 3 B 2022-06-03 10
14 3 B 2022-06-10 30
15 3 B 2022-06-11 20
16 3 B 2022-06-12 10
Dplyr code:
# The formula is [price(t)-price(t-1)]/price(t-1) -> dif(price)/lag(price)
ret = example %>%
group_by(asset,datetime) %>%
slice(1) %>% # remove repeated dates
group_by(asset) %>%
arrange(datetime) %>%
mutate(ret = ifelse(datetime-lag(datetime) > 4,
NA,
(marketprice-lag(marketprice))/lag(marketprice))
) %>% # ifelse check the differences of days
arrange(asset,datetime) # show by assets and dates
Output:
# A tibble: 14 x 5
# Groups: asset [2]
investor asset datetime marketprice ret
<dbl> <chr> <date> <dbl> <dbl>
1 1 A 2022-05-26 10 NA
2 1 A 2022-05-27 20 1
3 1 A 2022-05-28 30 0.5
4 2 A 2022-05-29 30 0
5 2 A 2022-05-30 30 0
6 1 A 2022-05-31 30 0
7 2 B 2022-05-26 20 NA
8 2 B 2022-05-31 10 NA
9 2 B 2022-06-01 20 1
10 2 B 2022-06-02 10 -0.5
11 3 B 2022-06-03 10 0
12 3 B 2022-06-10 30 NA
13 3 B 2022-06-11 20 -0.333
14 3 B 2022-06-12 10 -0.5
2 rows dropped because a day had 3 entries of data.