Home > Software engineering >  Count prior occurences of value one greater than current value
Count prior occurences of value one greater than current value

Time:10-25

I am trying to create a column that contains the number of prior occurrences (sorted by date) of a value one greater than the current value. In the example provided here, I manually create the values I want in the column labeled “wanted”, which is equal to the count of the prior occurrences (sorted by "date") of RoundNo that are equal to one greater than the focal row RoundNo. And I need this to be computed separately by group for each individual InvestorID.

So the first row "wanted" value is equal to the count of prior RoundNo of Investor 1 where RoundNo == 3 (aka one larger than the first row's RoundNo of 2). So in this case that would be 0. Similarly for the second row, the "wanted" value is the count of prior RoundNo of Investor 1 where RoundNo == 2 (aka one larger than the second row's RoundNo of 1). So in this case that would be 1. Would appreciate any help. Code example is below. Thanks!

dt = as.data.table(cbind(c(rep(1,7),rep(2,7)),
                         c("2019-08-01","2019-09-01","2019-10-01","2019-11-01","2019-12-01","2021-04-01","2021-10-01",
                           "2019-01-01","2019-02-01","2019-04-01","2019-08-01","2019-09-01","2019-10-01","2019-11-01"),
                         c(2,1,2,2,1,3,2,1,2,3,2,1,3,1)))
names(dt) = c("InvestorID","date","RoundNo")
wanted = c(0,1,0,0,3,0,1,0,0,0,1,2,0,2)
dt$wanted = wanted

CodePudding user response:

1) Define a function Count which counts the number of times each element of its vector input equals one plus its last element. Then use rollapplyr to apply that to successively larger leading sequences of RoundNo.

library(zoo)

Count <- function(x) sum(x == tail(x, 1)   1)
dt[, wanted := rollapplyr(as.numeric(RoundNo), 1:.N, Count), by = InvestorID]

2) An alternate method is to use a self left join in which the first instance of dt aliased to a is left joined to the second instance of dt aliased to b associating those b rows which are from the same InvestorID and come before or at the a row. Group by a row and take the appropriate sum over the b rows.

library(sqldf)

sqldf("select a.*, sum(a.RoundNo   1 == b.RoundNo) wanted
  from dt a
  left join dt b on a.InvestorID = b.InvestorID and b.rowid <= a.rowid
  group by a.rowid")

3) This alternative only uses data.table. Count is from (1).

dt[, wanted := sapply(1:.N, function(i) Count(as.numeric(RoundNo)[1:i])), 
     by = InvestorID]

CodePudding user response:

Another data.table solution using Reduce:

dt[order(date),.(date,
                 result=lapply(Reduce('c',as.numeric(RoundNo),accumulate=T),
                                    function(x) sum(x==last(x) 1)),
                 wanted), by=InvestorID]

    InvestorID       date result wanted
 1:          2 2019-01-01      0      0
 2:          2 2019-02-01      0      0
 3:          2 2019-04-01      0      0
 4:          2 2019-08-01      1      1
 5:          2 2019-09-01      2      2
 6:          2 2019-10-01      0      0
 7:          2 2019-11-01      2      2
 8:          1 2019-08-01      0      0
 9:          1 2019-09-01      1      1
10:          1 2019-10-01      0      0
11:          1 2019-11-01      0      0
12:          1 2019-12-01      3      3
13:          1 2021-04-01      0      0
14:          1 2021-10-01      1      1
  • Related