I have this data frame:
transaction ID | day number | Predicted value |
---|---|---|
12 | 1 | .001 |
12 | 2 | .002 |
12 | 1 | .001 |
12 | 2 | .002 |
13 | 1 | .001 |
13 | 2 | .002 |
13 | 3 | .002 |
13 | 4 | .003 |
I want to take the cumulative sum of each set of predicted values based on the sequential day numbers (i.e. the cumsum of the first 2 rows, the cumsum of the next 2, and the cumsum of the last 4),
so the results would be .003, .003, .008
CodePudding user response:
Using base R:
sapply(split(df$Predicted_value,cumsum(c(1,diff(df$day_number)!=1))), sum)
1 2 3
0.003 0.003 0.008
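To see why the grouping key in the one-liner above works, here is a minimal sketch that breaks it into steps. It assumes the data frame is called df with columns transaction_ID, day_number and Predicted_value (names taken from the answers' code, not confirmed by the question), and it rebuilds the sample rows from the question's table. The variables breaks, run_id, totals and running are illustrative names only.

```r
# Sample data rebuilt from the table in the question (column names are assumed)
df <- data.frame(
  transaction_ID  = c(12, 12, 12, 12, 13, 13, 13, 13),
  day_number      = c(1, 2, 1, 2, 1, 2, 3, 4),
  Predicted_value = c(0.001, 0.002, 0.001, 0.002, 0.001, 0.002, 0.002, 0.003)
)

# TRUE wherever a day number does not follow the previous one by exactly 1
breaks <- diff(df$day_number) != 1

# The first row always opens a run; a running count of breaks labels each run
run_id <- cumsum(c(1, breaks))

# One total per run of sequential days
totals <- tapply(df$Predicted_value, run_id, sum)

# If a true running total within each run is wanted instead of one sum per run
df$running <- ave(df$Predicted_value, run_id, FUN = cumsum)
```

Here run_id comes out as 1 1 2 2 3 3 3 3, so totals holds the three sums from the question, and the last value of running in each run matches the corresponding total.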
CodePudding user response:
Using the answer from this post:
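library(dplyr)

df %>%
  group_by(transaction_ID) %>%
  # start a new id whenever day_number does not follow the previous day by exactly 1
  mutate(id = cumsum(c(1, diff(day_number) != 1))) %>%
  group_by(transaction_ID, id) %>%
  # one total per (transaction_ID, id) run
  summarise(result = sum(Predicted_value)) %>%
  ungroup()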
transaction_ID id result
<int> <dbl> <dbl>
1 12 1 0.003
2 12 2 0.003
3 13 1 0.008
CodePudding user response:
Based on your desired output, it's not a cumulative sum but a sum by transaction ID and day group.
Using data.table:
library(data.table)

dat = data.table(transID = c(12,...),
                 dayNum = c(1,2,...),
                 predVal = c(0.001, 0.002, ...))
# introduce a grouping column; each group starts when day == 1
# (this assumes every run of days restarts at 1)
dat[, gr := cumsum(dayNum == 1)]
# aggregate: one sum per group
dat[, sum(predVal), by = gr]
gr V1
1: 1 0.003
2: 2 0.003
3: 3 0.008