Suppose I have this data
df <- tibble(
id = c(rep("ID100", 3),
rep("ID200", 3),
rep("ID450", 3)),
year = 2021,
month = c(rep(c(5, 6, 7), 3)),
value = 5
)
# A tibble: 9 × 4
id year month value
<chr> <dbl> <dbl> <dbl>
1 ID100 2021 5 5
2 ID100 2021 6 5
3 ID100 2021 7 5
4 ID200 2021 5 5
5 ID200 2021 6 5
6 ID200 2021 7 5
7 ID450 2021 5 5
8 ID450 2021 6 5
9 ID450 2021 7 5
I would like to mutate a new column with cumsum()
of column value, but with weights.
The weights starts from 1 and reduces by its half, as such
1, 0.5, 0.25, 0.125
and so on
Desired output:
# A tibble: 9 × 5
id year month value acc_sum
<chr> <dbl> <dbl> <dbl> <dbl>
1 ID100 2021 5 5 1.25 # 5 * 0.25
2 ID100 2021 6 5 4 # 5 * 0.5 previous value of 1.25
3 ID100 2021 7 5 9 # 5 * 1 previous value of 4
4 ID200 2021 5 5 1.25
5 ID200 2021 6 5 4
6 ID200 2021 7 5 9
7 ID450 2021 5 5 1.25
8 ID450 2021 6 5 4
9 ID450 2021 7 5 9
I have tried the following, but I feel like cumsum
or accumulate
with some modification would do a better job without hardcoding the weights
df %>%
group_by(id, year) %>%
arrange(id, year, month) %>%
mutate(weights = last(value)
lag(value) * 0.5
lag(value, 2) * 0.25)
CodePudding user response:
A possible solution:
library(tidyverse)
df %>%
arrange(id, year, month) %>%
group_by(id, year) %>%
mutate(acc_sum = cumsum(value * 0.5^(row_number()-1)))
#> # A tibble: 9 × 5
#> # Groups: id, year [3]
#> id year month value acc_sum
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 ID100 2021 5 5 5
#> 2 ID100 2021 6 5 7.5
#> 3 ID100 2021 7 5 8.75
#> 4 ID200 2021 5 5 5
#> 5 ID200 2021 6 5 7.5
#> 6 ID200 2021 7 5 8.75
#> 7 ID450 2021 5 5 5
#> 8 ID450 2021 6 5 7.5
#> 9 ID450 2021 7 5 8.75
CodePudding user response:
Maybe you want something like this where you multiply by the (row number)-(row number)*0.75 per group using data.table
:
library(tibble)
df <- tibble(
id = c(rep("ID100", 3),
rep("ID200", 3),
rep("ID450", 3)),
year = 2021,
month = c(rep(c(5, 6, 7), 3)),
value = 5
)
library(data.table)
setDT(df)[, acc_sum := cumsum(value*(seq_len(.N)-seq_len(.N)*0.75)), by = "id"]
df
#> id year month value acc_sum
#> 1: ID100 2021 5 5 1.25
#> 2: ID100 2021 6 5 3.75
#> 3: ID100 2021 7 5 7.50
#> 4: ID200 2021 5 5 1.25
#> 5: ID200 2021 6 5 3.75
#> 6: ID200 2021 7 5 7.50
#> 7: ID450 2021 5 5 1.25
#> 8: ID450 2021 6 5 3.75
#> 9: ID450 2021 7 5 7.50
Created on 2022-07-17 by the reprex package (v2.0.1)