Home > Blockchain >  Grouped diminishing weighted cumsum
Grouped diminishing weighted cumsum

Time:07-17

Suppose I have this data

df <- tibble(
  id = c(rep("ID100", 3), 
             rep("ID200", 3), 
             rep("ID450", 3)), 
  year = 2021, 
  month = c(rep(c(5, 6, 7), 3)), 
  value = 5 
)

# A tibble: 9 × 4
  id     year month value
  <chr> <dbl> <dbl> <dbl>
1 ID100  2021     5     5
2 ID100  2021     6     5
3 ID100  2021     7     5
4 ID200  2021     5     5
5 ID200  2021     6     5
6 ID200  2021     7     5
7 ID450  2021     5     5
8 ID450  2021     6     5
9 ID450  2021     7     5

I would like to mutate a new column with cumsum() of column value, but with weights.

The weights starts from 1 and reduces by its half, as such 1, 0.5, 0.25, 0.125 and so on

Desired output:

# A tibble: 9 × 5
  id     year month value acc_sum
  <chr> <dbl> <dbl> <dbl>   <dbl>
1 ID100  2021     5     5    1.25 # 5 * 0.25
2 ID100  2021     6     5    4    # 5 * 0.5   previous value of 1.25
3 ID100  2021     7     5    9    # 5 * 1   previous value of 4
4 ID200  2021     5     5    1.25
5 ID200  2021     6     5    4   
6 ID200  2021     7     5    9   
7 ID450  2021     5     5    1.25
8 ID450  2021     6     5    4   
9 ID450  2021     7     5    9 

I have tried the following, but I feel like cumsum or accumulate with some modification would do a better job without hardcoding the weights

df %>%
  group_by(id, year) %>%
  arrange(id, year, month) %>%
  mutate(weights = last(value)  
           lag(value) * 0.5  
           lag(value, 2) * 0.25)

CodePudding user response:

A possible solution:

library(tidyverse)

df %>%
  arrange(id, year, month) %>%
  group_by(id, year) %>%
  mutate(acc_sum = cumsum(value * 0.5^(row_number()-1)))

#> # A tibble: 9 × 5
#> # Groups:   id, year [3]
#>   id     year month value acc_sum
#>   <chr> <dbl> <dbl> <dbl>   <dbl>
#> 1 ID100  2021     5     5    5   
#> 2 ID100  2021     6     5    7.5 
#> 3 ID100  2021     7     5    8.75
#> 4 ID200  2021     5     5    5   
#> 5 ID200  2021     6     5    7.5 
#> 6 ID200  2021     7     5    8.75
#> 7 ID450  2021     5     5    5   
#> 8 ID450  2021     6     5    7.5 
#> 9 ID450  2021     7     5    8.75

CodePudding user response:

Maybe you want something like this where you multiply by the (row number)-(row number)*0.75 per group using data.table:

library(tibble)
df <- tibble(
  id = c(rep("ID100", 3), 
         rep("ID200", 3), 
         rep("ID450", 3)), 
  year = 2021, 
  month = c(rep(c(5, 6, 7), 3)), 
  value = 5 
)
library(data.table)
setDT(df)[, acc_sum := cumsum(value*(seq_len(.N)-seq_len(.N)*0.75)), by = "id"]
df
#>       id year month value acc_sum
#> 1: ID100 2021     5     5    1.25
#> 2: ID100 2021     6     5    3.75
#> 3: ID100 2021     7     5    7.50
#> 4: ID200 2021     5     5    1.25
#> 5: ID200 2021     6     5    3.75
#> 6: ID200 2021     7     5    7.50
#> 7: ID450 2021     5     5    1.25
#> 8: ID450 2021     6     5    3.75
#> 9: ID450 2021     7     5    7.50

Created on 2022-07-17 by the reprex package (v2.0.1)

  • Related