The below MWE code works as intended. In summary:
- the first
data1 <- ...mutate(...)
adds a new column "minusD" calculated as (i) the current row "plusB" value (ii) prior row "PlusB" value if the id is the same when moving from one row to the next row (otherwise 0), and - the second
data1 <- ...mutate(...)
adds a "running_balance" column that calculates acumsum()
for all rows sharing the same id.
However, when deploying this in the more complete code this is meant for, I get an error when running another table that draws from the equivalent of this "data1" data frame, due to running two data1 <- ...
processes. So, how do I collapse these 2 functions into one?
Output with calculations explained:
id plusA plusB minusC minusD running_balance [explain calculations ...]
1 3 5 10 5 -7 minus D = plusB, running bal = plusA plusB - minusC - minusD
2 4 5 9 5 -5 same formulas as above since id <> prior row id
3 8 5 8 5 0 same formulas as above since id <> prior row id
3 1 4 7 9 -11 since id = prior row id, minus D = plusB prior row plus B, and running bal = running bal from prior row plusA plusB - minusC - minusD
3 2 5 6 9 -19 same formulas as above since id = prior row id
5 3 6 5 6 -2 minus D = plusB, running bal = plusA plusB - minusC - minusD
MWE code:
data <- data.frame(id=c(1,2,3,3,3,5),
plusA=c(3,4,8,1,2,3),
plusB=c(5,5,5,4,5,6),
minusC = c(10,9,8,7,6,5))
library(dplyr)
data1<- subset(
data %>% mutate(extra=case_when(id==lag(id)~lag(plusB),TRUE ~ 0)) %>%
mutate(minusD=plusB extra),
select = -c(extra) # remove temporary calculation column
)
data1 <- data1 %>% group_by(id) %>% mutate(running_balance = cumsum(plusA plusB - minusC - minusD))
CodePudding user response:
You may continue the chain with %>%
instead of creating a temporary object.
library(dplyr)
data %>%
mutate(extra=case_when(id==lag(id)~lag(plusB),TRUE ~ 0),
minusD=plusB extra) %>%
group_by(id) %>%
mutate(running_balance = cumsum(plusA plusB - minusC - minusD)) %>%
ungroup %>%
select(-extra)
# id plusA plusB minusC minusD running_balance
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 1 3 5 10 5 -7
#2 2 4 5 9 5 -5
#3 3 8 5 8 5 0
#4 3 1 4 7 9 -11
#5 3 2 5 6 9 -19
#6 5 3 6 5 6 -2