I made the following code for data It works fine, but the problem is that it takes too long as my dataset is huge. Could someone make the code more efficient and faster? Thanks a lot in advance!
for (f in 1: nlevels(try$IDISIN)) {
temp<-subset(try, IDISIN==levels(try$IDISIN)[f])
temp<-as.data.table(temp)
temp2<-temp %>%
arrange(TradingDate)
temp2<-as.data.table(temp2)
for (i in 1:nrow(temp2)) {
temp2$CSum[i]<-ifelse(i=="1", temp2$Dailysum[1],(temp2$CSum[i-1] temp2$Dailysum[i]))
if (temp2$CSum[i]<0) {
Selling<-bind_rows(Selling, temp2[i])
temp2$CSum[i]<-0
}
temp2$CSum[i]<-ifelse(temp2$FinalInd[i]==1,
temp2$CSum[i]/temp2$A.Factor[i],
temp2$CSum[i])
}
Rebind<-bind_rows(Rebind, temp2)
rm(list = "temp", "temp2")
}
and here is simplified dataset
try<- data.frame(ISIN=c("abc", "abc", "ghi", "def", "def", "def", "ghi", "ghi", "ghi"),
ID =c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
TradingDate=c("2022-07-01", "2022-07-02", "2022-07-03", "2022-07-01", "2022-07-02", "2022-07-03","2022-07-01", "2022-07-02", "2022-07-03"),
Dailysum=c("-4", "8", "1", "2", "-6","9", "4", "8", "9"),
A.Factor=c("0", "0", "0.1", "0", "0","0", "0", "0.5", "0"),
Ind=c("0", "0", "1", "0", "0","0", "0", "1", "0"))
library(data.table)
try<-as.data.table(try)
try[,IDISIN:=paste(ISIN, ID,sep = "-")]
Selling<-try[is.na(ISIN)]
Rebind<-try[is.na(ISIN)]
ISIN ID TradingDate Dailysum A.Factor Ind IDISIN
1: abc A 2022-07-01 -4 0 0 abc-A
2: abc A 2022-07-02 8 0 0 abc-A
3: ghi A 2022-07-03 1 0.1 1 ghi-A
4: def B 2022-07-01 2 0 0 def-B
5: def B 2022-07-02 -6 0 0 def-B
6: def B 2022-07-03 9 0 0 def-B
7: ghi C 2022-07-01 4 0 0 ghi-C
8: ghi C 2022-07-02 8 0.5 1 ghi-C
9: ghi C 2022-07-03 9 0 0 ghi-C
I need to do following things
cumulative sum by ISIN and ID
If cumulatvie sume is negative
save it into seperate dataframe ("Selling" in the above code)
set cumulative sum to 0
If Ind=1, divide cumulative sum with Factor
So it should look like this
dat <- data.frame(ISIN=c("abc", "abc", "ghi", "def", "def", "def", "ghi", "ghi", "ghi"),
ID =c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
TradingDate=c("2022-07-01", "2022-07-02", "2022-07-03", "2022-07-01", "2022-07-02", "2022-07-03","2022-07-01", "2022-07-02", "2022-07-03"),
Dailysum=c("-4", "8", "1", "2", "-6","9", "4", "8", "9"),
A.Factor=c("0", "0", "0.1", "0", "0","0", "0", "0.5", "0"),
Ind=c("0", "0", "1", "0", "0","0", "0", "1", "0"),
CSum= c("0", "8", "10", "2", "0","9", "4", "24", "33"))
ISIN ID Date Quantity Factor Ind CumulativeSum
1 abc A 2022-07-01 -4 0 0 0
2 abc A 2022-07-02 8 0 0 8
3 ghi A 2022-07-03 1 0.1 1 10
4 def B 2022-07-01 2 0 0 2
5 def B 2022-07-02 -6 0 0 0
6 def B 2022-07-03 9 0 0 9
7 ghi C 2022-07-01 4 0 0 4
8 ghi C 2022-07-02 8 0.5 1 24
9 ghi C 2022-07-03 9 0 0 33
CodePudding user response:
I can't run your first code block, so I don't know for certain what Selling
should look like, but I think we can do it as part of the main processing and then you can filter it out later.
First, I think many of those columns should be numbers, so
try[, names(try) := lapply(.SD, type.convert, as.is = TRUE)]
str(try)
# Classes 'data.table' and 'data.frame': 9 obs. of 6 variables:
# $ ISIN : chr "abc" "abc" "ghi" "def" ...
# $ ID : chr "A" "A" "A" "B" ...
# $ TradingDate: chr "2022-07-01" "2022-07-02" "2022-07-03" "2022-07-01" ...
# $ Dailysum : int -4 8 1 2 -6 9 4 8 9
# $ A.Factor : num 0 0 0.1 0 0 0 0 0.5 0
# $ Ind : int 0 0 1 0 0 0 0 1 0
Second, I don't think we need IDISIN
, since I think you're using it as an easy grouping variable, in which case data.table
's use of by=
takes care of it for us.
Third, I'm assuming that you are controlling the order of data (TradingDate
) externally, perhaps setkey(try, ISIN, ID, TradingDate)
or similar. I make no checks (or promises if this is not true.) (Over to you if you want try[, TradingDate := as.Date(TradingDate)]
, it seems logical to do so but changes nothing here.)
From here,
fun <- function(prev, this) {
z <- prev[1] this[1]
c(max(z, 0), max(-z, 0)) / (if (this[2] > 0) this[3] else 1)
}
try[, c("CumulativeSum", "Sell") :=
transpose(Reduce(fun, transpose(list(Dailysum, Ind, A.Factor)),
init = c(0, 0), accumulate = TRUE)[-1]),
by = .(ISIN, ID) ]
try
# ISIN ID TradingDate Dailysum A.Factor Ind CumulativeSum Sell
# <char> <char> <char> <int> <num> <int> <num> <num>
# 1: abc A 2022-07-01 -4 0.0 0 0 4
# 2: abc A 2022-07-02 8 0.0 0 8 0
# 3: ghi A 2022-07-03 1 0.1 1 10 0
# 4: def B 2022-07-01 2 0.0 0 2 0
# 5: def B 2022-07-02 -6 0.0 0 0 4
# 6: def B 2022-07-03 9 0.0 0 9 0
# 7: ghi C 2022-07-01 4 0.0 0 4 0
# 8: ghi C 2022-07-02 8 0.5 1 24 0
# 9: ghi C 2022-07-03 9 0.0 0 33 0
No for
loop required.
Admittedly, the Sell
logic might need double-check to make sure that successive negatives will react as you need it. That should be handled in the fun
ction. In there, the two arguments:
prev
are the previous row's values forCumulativeSum
andSell
, as determined by the previous call tofun
(within a group). The first time it is called for a group, it is preassigned the valuec(0, 0)
(by theinit=
argument).this
is a triplet of the current row'sc(Dailysum, Ind, A.Factor)
(all numbers, not named), so we index it directly by position.