I am trying to build a data frame that shows a culmination of all transactions for each day (not usi-CodePudding

For each day, I want to see a culmination of all transactions that have happened up to that day (using As_Of_Dt as the indicator of the current day). E.g. on the first day (1/1/22), there was only one transaction so would expect to see one row for that As_Of_Dt. Second day (1/2/2022), there had been two transactions, so would expect to see three transactions for that As_Of_Dt (the two transactions from 1/2/22, plus the one transaction from 1/1/22), etc for all days going forward.

Below is an example of a dataset where I was able to do this using a for loop, however my real dataset is much larger and I am looking for a faster solution (maybe using the apply family?)

df <- data.frame(TradeDate = c("2022-01-01","2022-01-02","2022-01-02", "2022-01-03","2022-01-04"),
                 Ticker = c("AAPL", "AMZN", "META", "AAPL","TSLA"),
                 Type = c("Buy", "Buy", "Buy", "Buy","Buy"),
                 Shares = c(10, 12, 5, 15, 20))

df1 <- df[0, ]
dtStart <- as_date("2022-01-01")
dtEnd <- as_date("2022-01-04")
dtRng <- interval(dtStart, dtEnd) %/% days(1)   1


for(i in 1:dtRng){
  df2 <- df %>%
    filter(TradeDate <= (dtStart   i - 1)) %>%
    mutate(As_Of_Dt = as_date(dtStart)   i - 1)
  df1 <- bind_rows(df1, df2)
}

TradeDate	Ticker	Type	Shares	As_Of_Dt
2022-01-01	AAPL	Buy	10	2022-01-01
2022-01-01	AAPL	Buy	10	2022-01-02
2022-01-02	AMZN	Buy	12	2022-01-02
2022-01-02	META	Buy	5	2022-01-02
2022-01-01	AAPL	Buy	10	2022-01-03
2022-01-02	AMZN	Buy	12	2022-01-03
2022-01-02	META	Buy	5	2022-01-03
2022-01-03	AAPL	Buy	15	2022-01-03
2022-01-01	AAPL	Buy	10	2022-01-04
2022-01-02	AMZN	Buy	12	2022-01-04
2022-01-02	META	Buy	5	2022-01-04
2022-01-03	AAPL	Buy	15	2022-01-04
2022-01-04	TSLA	Buy	20	2022-01-04

I tried doing this by replacing the for loop with a function and inputting the function into lapply, however I am not very familiar with the apply family and had issues iterating the filter for each day.

CodePudding user response：

I expect this is more efficient that your loop, but not as efficient as a non-equi join.

library(dplyr)
df %>%
  distinct(as_of = TradeDate) %>%
  left_join(df, by = character()) %>%
  filter(TradeDate <= as_of)

Or using the dev version of dplyr which introduces non-equi joins

devtools::install_github("tidyverse/dplyr")
df %>% 
  distinct(as_of = TradeDate) %>%
  left_join(df, join_by(as_of >= TradeDate))

Result

        as_of  TradeDate Ticker Type Shares
1  2022-01-01 2022-01-01   AAPL  Buy     10
2  2022-01-02 2022-01-01   AAPL  Buy     10
3  2022-01-02 2022-01-02   AMZN  Buy     12
4  2022-01-02 2022-01-02   META  Buy      5
5  2022-01-03 2022-01-01   AAPL  Buy     10
6  2022-01-03 2022-01-02   AMZN  Buy     12
7  2022-01-03 2022-01-02   META  Buy      5
8  2022-01-03 2022-01-03   AAPL  Buy     15
9  2022-01-04 2022-01-01   AAPL  Buy     10
10 2022-01-04 2022-01-02   AMZN  Buy     12
11 2022-01-04 2022-01-02   META  Buy      5
12 2022-01-04 2022-01-03   AAPL  Buy     15
13 2022-01-04 2022-01-04   TSLA  Buy     20