Home > Mobile >  In using dplyr in R, how to merge 2 separate mutate statements operating on the same object?
In using dplyr in R, how to merge 2 separate mutate statements operating on the same object?

Time:12-22

The below MWE code works as intended. In summary:

  • the first data1 <- ...mutate(...) adds a new column "minusD" calculated as (i) the current row "plusB" value (ii) prior row "PlusB" value if the id is the same when moving from one row to the next row (otherwise 0), and
  • the second data1 <- ...mutate(...) adds a "running_balance" column that calculates a cumsum() for all rows sharing the same id.

However, when deploying this in the more complete code this is meant for, I get an error when running another table that draws from the equivalent of this "data1" data frame, due to running two data1 <- ... processes. So, how do I collapse these 2 functions into one?

Output with calculations explained:

     id plusA plusB minusC minusD running_balance [explain calculations ...]
     1     3     5     10      5              -7   minus D = plusB, running bal = plusA   plusB - minusC - minusD
     2     4     5      9      5              -5   same formulas as above since id <> prior row id
     3     8     5      8      5               0   same formulas as above since id <> prior row id
     3     1     4      7      9             -11   since id = prior row id, minus D = plusB   prior row plus B, and running bal = running bal from prior row   plusA   plusB - minusC - minusD  
     3     2     5      6      9             -19   same formulas as above since id = prior row id
     5     3     6      5      6              -2   minus D = plusB, running bal = plusA   plusB - minusC - minusD

MWE code:

data <- data.frame(id=c(1,2,3,3,3,5), 
                   plusA=c(3,4,8,1,2,3), 
                   plusB=c(5,5,5,4,5,6),
                   minusC = c(10,9,8,7,6,5))

library(dplyr)

data1<- subset(
  data %>% mutate(extra=case_when(id==lag(id)~lag(plusB),TRUE ~ 0)) %>%
    mutate(minusD=plusB extra),
  select = -c(extra) # remove temporary calculation column 
) 

data1 <- data1 %>% group_by(id) %>% mutate(running_balance = cumsum(plusA   plusB - minusC - minusD))

CodePudding user response:

You may continue the chain with %>% instead of creating a temporary object.

library(dplyr)

data %>% 
  mutate(extra=case_when(id==lag(id)~lag(plusB),TRUE ~ 0),
         minusD=plusB extra) %>%
  group_by(id) %>%
  mutate(running_balance = cumsum(plusA   plusB - minusC - minusD)) %>%
  ungroup %>%
  select(-extra)

#     id plusA plusB minusC minusD running_balance
#  <dbl> <dbl> <dbl>  <dbl>  <dbl>           <dbl>
#1     1     3     5     10      5              -7
#2     2     4     5      9      5              -5
#3     3     8     5      8      5               0
#4     3     1     4      7      9             -11
#5     3     2     5      6      9             -19
#6     5     3     6      5      6              -2
  • Related