I am trying to create a column that contains the number of prior occurrences (sorted by date) of a value one greater than the current value. In the example provided here, I manually create the values I want in the column labeled “wanted”, which is equal to the count of the prior occurrences (sorted by "date") of RoundNo that are equal to one greater than the focal row RoundNo. And I need this to be computed separately by group for each individual InvestorID.
So the first row "wanted" value is equal to the count of prior RoundNo of Investor 1 where RoundNo == 3 (aka one larger than the first row's RoundNo of 2). So in this case that would be 0. Similarly for the second row, the "wanted" value is the count of prior RoundNo of Investor 1 where RoundNo == 2 (aka one larger than the second row's RoundNo of 1). So in this case that would be 1. Would appreciate any help. Code example is below. Thanks!
dt = as.data.table(cbind(c(rep(1,7),rep(2,7)),
c("2019-08-01","2019-09-01","2019-10-01","2019-11-01","2019-12-01","2021-04-01","2021-10-01",
"2019-01-01","2019-02-01","2019-04-01","2019-08-01","2019-09-01","2019-10-01","2019-11-01"),
c(2,1,2,2,1,3,2,1,2,3,2,1,3,1)))
names(dt) = c("InvestorID","date","RoundNo")
wanted = c(0,1,0,0,3,0,1,0,0,0,1,2,0,2)
dt$wanted = wanted
CodePudding user response:
1) Define a function Count
which counts the number of times each element of its vector input equals one plus its last element. Then use rollapplyr to apply that to successively larger leading sequences of RoundNo
.
library(zoo)
Count <- function(x) sum(x == tail(x, 1) 1)
dt[, wanted := rollapplyr(as.numeric(RoundNo), 1:.N, Count), by = InvestorID]
2) An alternate method is to use a self left join in which the first instance of dt
aliased to a
is left joined to the second instance of dt
aliased to b
associating those b
rows which are from the same InvestorID
and come before or at the a
row. Group by a
row and take the appropriate sum over the b
rows.
library(sqldf)
sqldf("select a.*, sum(a.RoundNo 1 == b.RoundNo) wanted
from dt a
left join dt b on a.InvestorID = b.InvestorID and b.rowid <= a.rowid
group by a.rowid")
3) This alternative only uses data.table. Count
is from (1).
dt[, wanted := sapply(1:.N, function(i) Count(as.numeric(RoundNo)[1:i])),
by = InvestorID]
CodePudding user response:
Another data.table
solution using Reduce
:
dt[order(date),.(date,
result=lapply(Reduce('c',as.numeric(RoundNo),accumulate=T),
function(x) sum(x==last(x) 1)),
wanted), by=InvestorID]
InvestorID date result wanted
1: 2 2019-01-01 0 0
2: 2 2019-02-01 0 0
3: 2 2019-04-01 0 0
4: 2 2019-08-01 1 1
5: 2 2019-09-01 2 2
6: 2 2019-10-01 0 0
7: 2 2019-11-01 2 2
8: 1 2019-08-01 0 0
9: 1 2019-09-01 1 1
10: 1 2019-10-01 0 0
11: 1 2019-11-01 0 0
12: 1 2019-12-01 3 3
13: 1 2021-04-01 0 0
14: 1 2021-10-01 1 1