For each day, I want to see a culmination of all transactions that have happened up to that day (using As_Of_Dt as the indicator of the current day). E.g. on the first day (1/1/22), there was only one transaction so would expect to see one row for that As_Of_Dt. Second day (1/2/2022), there had been two transactions, so would expect to see three transactions for that As_Of_Dt (the two transactions from 1/2/22, plus the one transaction from 1/1/22), etc for all days going forward.
Below is an example of a dataset where I was able to do this using a for loop, however my real dataset is much larger and I am looking for a faster solution (maybe using the apply
family?)
df <- data.frame(TradeDate = c("2022-01-01","2022-01-02","2022-01-02", "2022-01-03","2022-01-04"),
Ticker = c("AAPL", "AMZN", "META", "AAPL","TSLA"),
Type = c("Buy", "Buy", "Buy", "Buy","Buy"),
Shares = c(10, 12, 5, 15, 20))
df1 <- df[0, ]
dtStart <- as_date("2022-01-01")
dtEnd <- as_date("2022-01-04")
dtRng <- interval(dtStart, dtEnd) %/% days(1) 1
for(i in 1:dtRng){
df2 <- df %>%
filter(TradeDate <= (dtStart i - 1)) %>%
mutate(As_Of_Dt = as_date(dtStart) i - 1)
df1 <- bind_rows(df1, df2)
}
TradeDate | Ticker | Type | Shares | As_Of_Dt |
---|---|---|---|---|
2022-01-01 | AAPL | Buy | 10 | 2022-01-01 |
2022-01-01 | AAPL | Buy | 10 | 2022-01-02 |
2022-01-02 | AMZN | Buy | 12 | 2022-01-02 |
2022-01-02 | META | Buy | 5 | 2022-01-02 |
2022-01-01 | AAPL | Buy | 10 | 2022-01-03 |
2022-01-02 | AMZN | Buy | 12 | 2022-01-03 |
2022-01-02 | META | Buy | 5 | 2022-01-03 |
2022-01-03 | AAPL | Buy | 15 | 2022-01-03 |
2022-01-01 | AAPL | Buy | 10 | 2022-01-04 |
2022-01-02 | AMZN | Buy | 12 | 2022-01-04 |
2022-01-02 | META | Buy | 5 | 2022-01-04 |
2022-01-03 | AAPL | Buy | 15 | 2022-01-04 |
2022-01-04 | TSLA | Buy | 20 | 2022-01-04 |
I tried doing this by replacing the for loop with a function and inputting the function into lapply
, however I am not very familiar with the apply
family and had issues iterating the filter for each day.
CodePudding user response:
I expect this is more efficient that your loop, but not as efficient as a non-equi join.
library(dplyr)
df %>%
distinct(as_of = TradeDate) %>%
left_join(df, by = character()) %>%
filter(TradeDate <= as_of)
Or using the dev version of dplyr which introduces non-equi joins
devtools::install_github("tidyverse/dplyr")
df %>%
distinct(as_of = TradeDate) %>%
left_join(df, join_by(as_of >= TradeDate))
Result
as_of TradeDate Ticker Type Shares
1 2022-01-01 2022-01-01 AAPL Buy 10
2 2022-01-02 2022-01-01 AAPL Buy 10
3 2022-01-02 2022-01-02 AMZN Buy 12
4 2022-01-02 2022-01-02 META Buy 5
5 2022-01-03 2022-01-01 AAPL Buy 10
6 2022-01-03 2022-01-02 AMZN Buy 12
7 2022-01-03 2022-01-02 META Buy 5
8 2022-01-03 2022-01-03 AAPL Buy 15
9 2022-01-04 2022-01-01 AAPL Buy 10
10 2022-01-04 2022-01-02 AMZN Buy 12
11 2022-01-04 2022-01-02 META Buy 5
12 2022-01-04 2022-01-03 AAPL Buy 15
13 2022-01-04 2022-01-04 TSLA Buy 20