Home > Software engineering >  Error applying function in grouped tibble
Error applying function in grouped tibble

Time:12-17

I am trying to create a function that takes a current balance and then works backwards monthly to calculate prior month end balances from each month's additions and losses. The balances need to be calculated within combinations of variables. The output I want looks like this - month_end_balance is what I want the function to output. Everything else I have.

type service month_starting add loss current_balance month_end_balance
A Luxury 12/1/21 2 1 20 20
A Luxury 11/1/21 4 7 NA 19
A Luxury 10/1/21 0 0 NA 22
B Economy 12/1/21 2 8 50 50
B Economy 11/1/21 4 2 NA 56
B Economy 10/1/21 0 0 NA 54

I created the following function, which works on ungrouped data.

running_balance_4 <- function(current_balance, add, loss) {
  out <- rep(NA, length(current_balance))
  out[[1]] <- current_balance[[1]]
  for (i in 2:(length(current_balance))) {
    out[[i]] <-  out[[(i-1)]] - add[[(i-1)]]   loss[[(i-1)]]
  }
  out
}

But I can't get it to apply within each group. It might just be a syntax issue.

df %>%
  group_by(type, service) %>%
  arrange(type, service, desc(month_starting)) %>%
  group_modify(running_balance_4(current_balance, add, loss))

Appreciate any help with the syntax and/or the function itself.

Update: when I attempt to run it I get the following error message: object 'current_balance' not found. So I think there may be a syntax error in addition to any issues with the function.

CodePudding user response:

I see in the comments you got your function to work. However, I thought you might want to see a solution using built in R functions rather than applying a grouped function.

library(tibble)
library(dplyr)

df <- tribble(
  ~type,  ~service, ~month_starting,  ~add, ~loss,  ~current_balance,
  "A", "Luxury",   "12/1/21", 2,  1,  20,
  "A", "Luxury",   "11/1/21", 4,  7,  NA,
  "A", "Luxury",   "10/1/21", 0,  0,  NA,
  "B", "Economy",  "12/1/21", 2,  8,  50,
  "B", "Economy",  "11/1/21", 4,  2,  NA,
  "B", "Economy",  "10/1/21", 0,  0,  NA 
)

df %>% 
  # Two temporary columns to calculate with.
  mutate(
    # Replace current balance with 0 to work with cumulative sum.
    c_balance = coalesce(current_balance, 0),
    # Add the loss and subtract the add since we are working backwards.
    monthly = c_balance   loss - add
  ) %>%
  arrange(type, service, desc(month_starting)) %>%
  group_by(type, service) %>%
  # Taking the lag will put NA on the first element (the rows with current_balance)
  # cumsum is a built in cumulative sum
  mutate(monthly = lag(cumsum(monthly))) %>%
  ungroup() %>%
  mutate(month_end_balance = pmax(current_balance, monthly, na.rm = T))  %>%
  select(-c_balance, -monthly)

It makes a difference based off of the current_balance, does a cumulative sum of that difference using cumsum, and then lines up the observation using lag. The output you were after can then be found by the maximum between current_balance and the lagged variable, since all of the values you don't want are NA.

# A tibble: 6 × 7
  type  service month_starting   add  loss current_balance month_end_balance
  <chr> <chr>   <chr>          <dbl> <dbl>           <dbl>             <dbl>
1 A     Luxury  12/1/21            2     1              20                20
2 A     Luxury  11/1/21            4     7              NA                19
3 A     Luxury  10/1/21            0     0              NA                22
4 B     Economy 12/1/21            2     8              50                50
5 B     Economy 11/1/21            4     2              NA                56
6 B     Economy 10/1/21            0     0              NA                54
  • Related