I am a beginner at R (less than 2 years of experience) and have been searching the forums for an answer to my question.
I have a financial stock dataset similar to the following:
sign | reference | price | total_volume | type | PM |
---|---|---|---|---|---|
1 | 80407_34260 | 16.78 | 100 | bid | 16.81 |
2 | 80407_34260 | 16.77 | 500 | bid | 16.81 |
3 | 80407_34260 | 16.75 | 1064 | bid | 16.81 |
-1 | 80407_34260 | 16.84 | 200 | ask | 16.81 |
-2 | 80407_34260 | 16.85 | 255 | ask | 16.81 |
-3 | 80407_34260 | 16.89 | 2200 | ask | 16.81 |
1 | 80407_34320 | 16.76 | 185 | bid | 16.795 |
2 | 80407_34320 | 16.75 | 1064 | bid | 16.795 |
3 | 80407_34320 | 16.74 | 100 | bid | 16.795 |
-1 | 80407_34320 | 16.83 | 100 | ask | 16.795 |
-2 | 80407_34320 | 16.84 | 1800 | ask | 16.795 |
-3 | 80407_34320 | 16.85 | 100 | ask | 16.795 |
Here PM is the midpoint between the best bid and the best ask.
I need to calculate the weighted relative spread for each reference for roundtrip volumes of 100 and 800.
The weighted spread is calculated as the difference between the weighted ask (S_A) and the weighted bid (S_B) over the roundtrip volume.
A manual example for reference 80407_34260 in the 100 roundtrip would be:
-> S_A(100) = (100*16.84)/100 = 16.84
-> S_B(100) = (100*16.78)/100 = 16.78
And for the 800 roundtrip:
-> S_A(800) = (200*16.84 + 255*16.85 + (800-255-200)*16.89)/800
-> S_B(800) = (100*16.78 + 500*16.77 + (800-500-100)*16.75)/800
So to weight, you multiply total_volume by the quote at each level, taking volume only until the roundtrip volume is covered.
The relative spread would then be (S_A(800) - S_B(800)) / PM
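As a quick sanity check of the manual 800 example, the arithmetic can be reproduced in base R. This is only a sketch for reference 80407_34260; the vector names (ask_price, ask_take and so on) are made up for illustration, and the prices and volumes are copied from the table above.

```r
# Levels for reference 80407_34260, best price first (copied from the table)
ask_price <- c(16.84, 16.85, 16.89)
bid_price <- c(16.78, 16.77, 16.75)
pm        <- 16.81
roundtrip <- 800

# Volume taken from each level: the first two levels in full,
# then only the remainder needed to reach the roundtrip volume
ask_take <- c(200, 255, roundtrip - 200 - 255)
bid_take <- c(100, 500, roundtrip - 100 - 500)

s_a <- sum(ask_take * ask_price) / roundtrip   # 16.86475
s_b <- sum(bid_take * bid_price) / roundtrip   # 16.76625

# Relative spread: difference of the two weighted prices over the midpoint
rel_spread <- (s_a - s_b) / pm                 # roughly 0.00586
```

Note that the last level contributes only the remaining 345 (ask) or 200 (bid) shares, not its full volume.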
So my trouble is that I need a function with a loop that goes through each reference and calculates both S_A and S_B for a given roundtrip. I want two columns per reference, one for S_A and one for S_B. I am not sure how to build this, especially since references have different numbers of rows.
There is also the condition that if the sum of total_volume for a reference is less than the roundtrip, the whole reference (and all of its rows) should be deleted.
Maybe I need to create a counter for this?
I had thought of something like this:
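    vol_function = function(Stock, roundtrip, start, end){
      totalcounter = 0   # volume filled so far
      price = 0          # running sum of volume * quote
      for (i in start:end){
        if(totalcounter < roundtrip){
          if(totalcounter + Stock$total_volume[i] > roundtrip){
            # only part of this row is needed to reach the roundtrip
            currentcounter = roundtrip - totalcounter
          } else {
            currentcounter = Stock$total_volume[i]
          }
          price = price + (currentcounter * Stock$quote[i])
          totalcounter = totalcounter + currentcounter
        }
      }
      # note: this is the total amount paid, not yet divided by roundtrip
      return(price)
    }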
And then:
    get_results = function(stock){
      start = 1
      end = 10
      # one column per name below (8 in total)
      data <- data.frame(matrix(ncol = 8, nrow = 0))
      names <- c("Day", "Hour", "Bid100", "Ask100", "Spread100",
                 "Bid800", "Ask800", "Spread800")
      colnames(data) <- names
      while(end < nrow(stock)){
        bid100 = vol_function(stock, 100, start, end)
        bid800 = vol_function(stock, 800, start, end)
        start = start + 10
        end = end + 10
        ask100 = vol_function(stock, 100, start, end)
        ask800 = vol_function(stock, 800, start, end)
        data = data %>% add_row(Day = stock$date[start],
                                Hour = stock$time[start],
                                Bid100 = bid100,
                                Ask100 = ask100,
                                Spread100 = ask100 - bid100,
                                Bid800 = bid800,
                                Ask800 = ask800,
                                Spread800 = ask800 - bid800)
        start = start + 10
        end = end + 10
      }
      return(data)
    }
Answer:
I think this will do what you're looking for.
The function works on the data frame (called df here) and groups it by reference and type, so the subsequent calculations happen within those groups. For each row it calculates how much volume there was in prior rows, figures out how much of the current row should be included to get up to but no higher than the roundtrip total, and then multiplies that by the current price.
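To make the "how much of the current row should be included" step concrete, here is a minimal base R sketch on a single side of the book (the ask volumes of reference 80407_34260). It uses c(0, head(x, -1)) as a stand-in for dplyr's lag(x, default = 0); the variable names mirror the column names used in the function.

```r
total_volume <- c(200, 255, 2200)   # ask side of 80407_34260, best price first
roundtrip    <- 800

# Volume already available in the rows above each row: 0, 200, 455
prior_cuml_vol <- cumsum(c(0, head(total_volume, -1)))

# Take the whole row if it fits, otherwise only what is still needed;
# pmax(0, ...) zeroes out rows reached after the roundtrip is already filled
cur_incl <- pmax(0, pmin(total_volume, roundtrip - prior_cuml_vol))
cur_incl
# [1] 200 255 345
```

The included volumes always sum to the roundtrip as long as the side has enough total volume.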
The last part is to summarize the total amount paid divided by roundtrip. The moar_digits column shows more digits, since the default printing will make all the prices in the example look like 16.8 or 16.9.
If you want, this could be further reshaped to put the ask and bid in a single row.
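    library(dplyr)

    # Uses the data frame df from the question (found in the calling environment);
    # rows must be ordered best price first within each reference and type.
    calc_roundtrip <- function(roundtrip) {
      df %>%
        group_by(reference, type) %>%
        mutate(prior_cuml_vol = cumsum(lag(total_volume, default = 0)),  # volume in earlier rows
               cur_incl = pmax(0, pmin(total_volume, roundtrip - prior_cuml_vol)),  # volume taken from this row
               paid = cur_incl * price) %>%
        summarize(total = sum(paid) / roundtrip) %>%  # weighted price per side
        mutate(moar_digits = formatC(total, digits = 6)) %>%
        # rows sort as ask, bid, so this is bid minus ask (the negative of S_A - S_B)
        mutate(spread = total - lag(total))
    }

    calc_roundtrip(800)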
    #`summarise()` has grouped output by 'reference'. You can override using the `.groups`
    #argument.
    ## A tibble: 4 × 5
    ## Groups:   reference [2]
    #  reference   type  total moar_digits  spread
    #  <chr>       <chr> <dbl> <chr>         <dbl>
    #1 80407_34260 ask    16.9 16.8648     NA
    #2 80407_34260 bid    16.8 16.7662     -0.0985
    #3 80407_34320 ask    16.8 16.8388     NA
    #4 80407_34320 bid    16.8 16.7523     -0.0864
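The question also asks for S_A and S_B as two columns per reference, for the relative spread, and for references without enough volume to be dropped. A possible sketch of that follows. The function name roundtrip_spread is made up for illustration, the data frame is rebuilt from the question's table so the example is self-contained, and I am assuming that "not enough volume" means either side (bid or ask) cannot fill the roundtrip on its own. It also assumes each reference has both a bid and an ask side, sorted best price first.

```r
library(dplyr)

# Sample data copied from the question's table
df <- data.frame(
  reference    = rep(c("80407_34260", "80407_34320"), each = 6),
  price        = c(16.78, 16.77, 16.75, 16.84, 16.85, 16.89,
                   16.76, 16.75, 16.74, 16.83, 16.84, 16.85),
  total_volume = c(100, 500, 1064, 200, 255, 2200,
                   185, 1064, 100, 100, 1800, 100),
  type         = rep(rep(c("bid", "ask"), each = 3), times = 2),
  PM           = rep(c(16.81, 16.795), each = 6)
)

# Hypothetical helper: one row per reference with S_A, S_B and the spreads
roundtrip_spread <- function(data, roundtrip) {
  data %>%
    group_by(reference, type) %>%
    mutate(prior_cuml_vol = cumsum(lag(total_volume, default = 0)),
           cur_incl = pmax(0, pmin(total_volume, roundtrip - prior_cuml_vol))) %>%
    summarize(avail  = sum(total_volume),               # volume available on this side
              wprice = sum(cur_incl * price) / roundtrip,
              PM     = first(PM),
              .groups = "drop") %>%
    group_by(reference) %>%
    # assumption: drop the whole reference if either side cannot fill the roundtrip
    filter(all(avail >= roundtrip)) %>%
    summarize(S_A = wprice[type == "ask"],
              S_B = wprice[type == "bid"],
              PM  = first(PM),
              .groups = "drop") %>%
    mutate(spread     = S_A - S_B,                      # ask minus bid, as in the question
           rel_spread = spread / PM)
}

roundtrip_spread(df, 800)
```

With the sample data both references survive a roundtrip of 800, while a roundtrip of 1500 would drop 80407_34320, whose bid side only holds 1349. If the condition is meant to apply to the combined volume of both sides, the filter step would need to compare sum(avail) instead.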