In the MWE code at the bottom, I'm trying to generate a running balance for each unique id when running from one row to the next. For example, when running the below code the output should be:
data2 <-
id plusA plusB minusC running_balance [desired calculation for running balance]
1 3 5 10 -2 3 5 - 10 = -2
2 4 5 9 0 4 5 - 9 = 0
3 8 5 8 5 8 5 - 8 = 5
3 1 4 7 3 id doesn't change so 5 from above (1 4 - 7) = 3
3 2 5 6 4 id doesn't change so 3 from above (2 5 - 6) = 4
5 3 6 5 4 3 6 - 5 = 4
The below MWE refers to, when id is consistent from one row to the next, the prior row plusA
amount rather than the prior row running_balance
amount. I've tried changing the below to some form of lag(running_balance...)
without luck yet.
I'm trying to minimize the use of too many packages. For example I understand the purrr
package offers an accumulate()
function, but I'd rather stick to only dplyr
for now. Is there a simple way to do this, using dplyr mutate()
in my case? I also tried fiddling around with the dplyr cumsum()
function which should work here but I'm unsure of how to string several of them together.
MWE code:
data <- data.frame(id=c(1,2,3,3,3,5),
plusA=c(3,4,8,1,2,3),
plusB=c(5,5,5,4,5,6),
minusC = c(10,9,8,7,6,5))
library(dplyr)
data2<- subset(
data %>% mutate(extra=case_when(id==lag(id) ~ lag(plusA), TRUE ~ 0)) %>%
mutate(running_balance=plusA plusB-minusC extra),
select = -c(extra)
)
CodePudding user response:
Using dplyr
:
data %>%
mutate(running_balance = plusA plusB - minusC) %>%
group_by(id) %>%
mutate(running_balance = cumsum(running_balance)) %>%
ungroup()
Output:
# A tibble: 6 x 5
# Groups: id [4]
id plusA plusB minusC running_balance
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 3 5 10 -2
2 2 4 5 9 0
3 3 8 5 8 5
4 3 1 4 7 3
5 3 2 5 6 4
6 5 3 6 5 4