Home > Enterprise >  make the loop more efficient and faster in r
make the loop more efficient and faster in r

Time:07-16

I made the following code for data It works fine, but the problem is that it takes too long as my dataset is huge. Could someone make the code more efficient and faster? Thanks a lot in advance!

for (f in 1: nlevels(try$IDISIN)) {
  temp<-subset(try, IDISIN==levels(try$IDISIN)[f])
  temp<-as.data.table(temp)
  temp2<-temp %>%
    arrange(TradingDate) 
  temp2<-as.data.table(temp2)

  
  for (i in 1:nrow(temp2)) {
    temp2$CSum[i]<-ifelse(i=="1", temp2$Dailysum[1],(temp2$CSum[i-1]   temp2$Dailysum[i]))
    if (temp2$CSum[i]<0) {
      Selling<-bind_rows(Selling, temp2[i])
      temp2$CSum[i]<-0
    }
    temp2$CSum[i]<-ifelse(temp2$FinalInd[i]==1, 
                          temp2$CSum[i]/temp2$A.Factor[i], 
                          temp2$CSum[i])
    
    
  }
  Rebind<-bind_rows(Rebind, temp2) 
  rm(list = "temp", "temp2")
}

and here is simplified dataset

try<- data.frame(ISIN=c("abc", "abc", "ghi", "def", "def", "def", "ghi", "ghi", "ghi"),
                  ID =c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
                  TradingDate=c("2022-07-01", "2022-07-02", "2022-07-03", "2022-07-01", "2022-07-02", "2022-07-03","2022-07-01", "2022-07-02", "2022-07-03"),
                  Dailysum=c("-4", "8", "1", "2", "-6","9", "4", "8", "9"),
                  A.Factor=c("0", "0", "0.1", "0", "0","0", "0", "0.5", "0"),
                  Ind=c("0", "0", "1", "0", "0","0", "0", "1", "0"))
library(data.table)
try<-as.data.table(try)
try[,IDISIN:=paste(ISIN, ID,sep = "-")]
Selling<-try[is.na(ISIN)]
Rebind<-try[is.na(ISIN)]


  ISIN ID TradingDate Dailysum A.Factor Ind IDISIN
1:  abc  A  2022-07-01       -4        0   0  abc-A
2:  abc  A  2022-07-02        8        0   0  abc-A
3:  ghi  A  2022-07-03        1      0.1   1  ghi-A
4:  def  B  2022-07-01        2        0   0  def-B
5:  def  B  2022-07-02       -6        0   0  def-B
6:  def  B  2022-07-03        9        0   0  def-B
7:  ghi  C  2022-07-01        4        0   0  ghi-C
8:  ghi  C  2022-07-02        8      0.5   1  ghi-C
9:  ghi  C  2022-07-03        9        0   0  ghi-C


I need to do following things

  1. cumulative sum by ISIN and ID

  2. If cumulatvie sume is negative

    1. save it into seperate dataframe ("Selling" in the above code)

    2. set cumulative sum to 0

  3. If Ind=1, divide cumulative sum with Factor

So it should look like this

dat <- data.frame(ISIN=c("abc", "abc", "ghi", "def", "def", "def", "ghi", "ghi", "ghi"),
                  ID =c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
                  TradingDate=c("2022-07-01", "2022-07-02", "2022-07-03", "2022-07-01", "2022-07-02", "2022-07-03","2022-07-01", "2022-07-02", "2022-07-03"),
                  Dailysum=c("-4", "8", "1", "2", "-6","9", "4", "8", "9"),
                  A.Factor=c("0", "0", "0.1", "0", "0","0", "0", "0.5", "0"),
                  Ind=c("0", "0", "1", "0", "0","0", "0", "1", "0"),
                  CSum= c("0", "8", "10", "2", "0","9", "4", "24", "33"))

  ISIN ID       Date Quantity Factor Ind CumulativeSum
1  abc  A 2022-07-01       -4      0   0             0
2  abc  A 2022-07-02        8      0   0             8
3  ghi  A 2022-07-03        1    0.1   1            10
4  def  B 2022-07-01        2      0   0             2
5  def  B 2022-07-02       -6      0   0             0
6  def  B 2022-07-03        9      0   0             9
7  ghi  C 2022-07-01        4      0   0             4
8  ghi  C 2022-07-02        8    0.5   1            24
9  ghi  C 2022-07-03        9      0   0            33

CodePudding user response:

I can't run your first code block, so I don't know for certain what Selling should look like, but I think we can do it as part of the main processing and then you can filter it out later.

First, I think many of those columns should be numbers, so

try[, names(try) := lapply(.SD, type.convert, as.is = TRUE)]
str(try)
# Classes 'data.table' and 'data.frame':    9 obs. of  6 variables:
#  $ ISIN       : chr  "abc" "abc" "ghi" "def" ...
#  $ ID         : chr  "A" "A" "A" "B" ...
#  $ TradingDate: chr  "2022-07-01" "2022-07-02" "2022-07-03" "2022-07-01" ...
#  $ Dailysum   : int  -4 8 1 2 -6 9 4 8 9
#  $ A.Factor   : num  0 0 0.1 0 0 0 0 0.5 0
#  $ Ind        : int  0 0 1 0 0 0 0 1 0

Second, I don't think we need IDISIN, since I think you're using it as an easy grouping variable, in which case data.table's use of by= takes care of it for us.

Third, I'm assuming that you are controlling the order of data (TradingDate) externally, perhaps setkey(try, ISIN, ID, TradingDate) or similar. I make no checks (or promises if this is not true.) (Over to you if you want try[, TradingDate := as.Date(TradingDate)], it seems logical to do so but changes nothing here.)

From here,

fun <- function(prev, this) {
  z <- prev[1]   this[1]
  c(max(z, 0), max(-z, 0)) / (if (this[2] > 0) this[3] else 1)
}
try[, c("CumulativeSum", "Sell") :=
        transpose(Reduce(fun, transpose(list(Dailysum, Ind, A.Factor)),
                         init = c(0, 0), accumulate = TRUE)[-1]),
    by = .(ISIN, ID) ]
try
#      ISIN     ID TradingDate Dailysum A.Factor   Ind CumulativeSum  Sell
#    <char> <char>      <char>    <int>    <num> <int>         <num> <num>
# 1:    abc      A  2022-07-01       -4      0.0     0             0     4
# 2:    abc      A  2022-07-02        8      0.0     0             8     0
# 3:    ghi      A  2022-07-03        1      0.1     1            10     0
# 4:    def      B  2022-07-01        2      0.0     0             2     0
# 5:    def      B  2022-07-02       -6      0.0     0             0     4
# 6:    def      B  2022-07-03        9      0.0     0             9     0
# 7:    ghi      C  2022-07-01        4      0.0     0             4     0
# 8:    ghi      C  2022-07-02        8      0.5     1            24     0
# 9:    ghi      C  2022-07-03        9      0.0     0            33     0

No for loop required.

Admittedly, the Sell logic might need double-check to make sure that successive negatives will react as you need it. That should be handled in the function. In there, the two arguments:

  • prev are the previous row's values for CumulativeSum and Sell, as determined by the previous call to fun (within a group). The first time it is called for a group, it is preassigned the value c(0, 0) (by the init= argument).
  • this is a triplet of the current row's c(Dailysum, Ind, A.Factor) (all numbers, not named), so we index it directly by position.
  •  Tags:  
  • r
  • Related