Data:
df <- data.frame(year = c(2018, 2019, 2020, 2021),
growth = c(0.05, 0.1, 0.08, 0.06),
size = c(100, NA, NA, NA))
year growth size
1 2018 0.05 100
2 2019 0.10 NA
3 2020 0.08 NA
4 2021 0.06 NA
I have size for year 2018 and growth rates for the subsequent years. My goal is to calculate size for each subsequent year as size[i] = size[i-1] * (1 + growth[i]). I can do it with a for loop:
for (i in (2:nrow(df))) {
  df$size[i] <- df$size[i-1] * (1 + df$growth[i])
}
year growth size
1 2018 0.05 100.000
2 2019 0.10 110.000
3 2020 0.08 118.800
4 2021 0.06 125.928
But I cannot find a dplyr way of doing the same thing, with mutate for example. Hoping to hear your ideas. Thanks!
CodePudding user response:
Since the first value of size is effectively a multiplicative constant for the rest of the column, we can just use cumprod (the cumulative product) of 1 + growth to get the factor by which to multiply size[1] to fill the rest of the size column.
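Written out, this is just the recurrence from the question unrolled. The symbols s_i for size[i] and g_k for growth[k] are shorthand introduced here for illustration, not notation from the original post:

```latex
s_i = s_{i-1}\,(1 + g_i)
\quad\Longrightarrow\quad
s_i = s_1 \prod_{k=2}^{i} (1 + g_k), \qquad i \ge 2
```

The product starts at k = 2, which is why the first growth value never enters the calculation.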
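The slight complication is that your algorithm has to ignore the first value of growth. We can get round this by using a combination of lead and lag: lead drops the first growth value before the cumulative product is taken, and lag shifts the result back down one row, with default = size[1] filling the first row.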
The following therefore works without having to use loops.
library(dplyr)
mutate(df, size = lag(size[1] * cumprod(lead(growth + 1)), default = size[1]))
#> year growth size
#> 1 2018 0.05 100.000
#> 2 2019 0.10 110.000
#> 3 2020 0.08 118.800
#> 4 2021 0.06 125.928
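The same idea can also be sketched in base R without lead and lag, by swapping the first growth factor for 1 before taking the cumulative product. This is only an illustrative variant, assuming the same df as in the question; the helper name factors is made up here:

```r
# Same data as in the question
df <- data.frame(year = c(2018, 2019, 2020, 2021),
                 growth = c(0.05, 0.1, 0.08, 0.06),
                 size = c(100, NA, NA, NA))

# Use a factor of 1 for the first row so the known 2018 size is left
# unchanged, then take the running product of the remaining factors.
factors <- cumprod(c(1, 1 + df$growth[-1]))

# Every size is the starting size scaled by its cumulative factor.
df$size <- df$size[1] * factors
df
```

Whether this reads better than the lead/lag version is a matter of taste; both rely on size[1] being the only known value.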
CodePudding user response:
A solution with purrr::reduce:
library(tidyverse)
df <- data.frame(year = c(2018, 2019, 2020, 2021),
growth = c(0.05, 0.1, 0.08, 0.06),
size = c(100, NA, NA, NA))
reduce(2:nrow(df), function(x, y) {
  x$size[y] <- x$size[y - 1] * (1 + x$growth[y])
  x
}, .init = df)
#> year growth size
#> 1 2018 0.05 100.000
#> 2 2019 0.10 110.000
#> 3 2020 0.08 118.800
#> 4 2021 0.06 125.928
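A related option, sketched here as an assumption rather than as part of the answer above, is purrr::accumulate, which works like reduce but keeps every intermediate value, so it can run on the growth vector directly instead of passing the whole data frame through each step. The argument names prev and g are illustrative:

```r
library(purrr)

df <- data.frame(year = c(2018, 2019, 2020, 2021),
                 growth = c(0.05, 0.1, 0.08, 0.06),
                 size = c(100, NA, NA, NA))

# Start from size[1] and fold in each later growth rate, keeping every
# intermediate result; growth[1] is skipped because size[1] is already known.
df$size <- accumulate(df$growth[-1],
                      function(prev, g) prev * (1 + g),
                      .init = df$size[1])
df
```

Because .init is supplied, the result has one more element than the input, which matches the number of rows in df.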