Problem using lag() within a mutate() function (tidyverse)

Time:10-30

I am trying to add another column to a dataframe where the new column is a function of the previous value in the new column and a current row value. I have tried to strip out irrelevant code and stick in easy numbers so that I might understand answers here. Given the following dataframe:

  x
1 1
2 2
3 3
4 4
5 5

The next column (y) will add 5 to x and also add the previous row's value for y. There's no previous value for y in the first row, so I define it as 0. So the first row value for y would be x + 5 + 0, or 1 + 5 + 0, or 6. The second row would be x + 5 + y (from the 1st row), or 2 + 5 + 6, or 13. The dataframe should look like this:

  x  y
1 1  6
2 2 13
3 3 21
4 4 30
5 5 40

I tried this with case_when() and lag() functions like this:

test_df <- data.frame(x = 1:5)
test_df %>% mutate(y = case_when(x == 1 ~ 6,
                                 x > 1 ~ x + 5 + lag(y)))

Error: Problem with mutate() column y.
ℹ y = case_when(x == 1 ~ 6, x > 1 ~ x + 5 + lag(y)).
x object 'y' not found
Run rlang::last_error() to see where the error occurred.

I had thought y was defined when the first row was calculated. Is there a better way to do this? Thanks!

CodePudding user response:

You don't need lag here at all. Just a cumsum should suffice.

test_df %>% mutate(y = cumsum(x + 5))

#>   x  y
#> 1 1  6
#> 2 2 13
#> 3 3 21
#> 4 4 30
#> 5 5 40
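
To see why cumsum works here: mutate() evaluates the right-hand side on the whole x column before y exists, which is why lag(y) could not find it. With a starting value of 0, the rule y[i] = x[i] + 5 + y[i-1] unrolls into a running total of x + 5. A minimal base R sketch that builds y row by row and compares it with cumsum (the names y_loop, y_cumsum and prev are only illustrative):

```r
# Build y row by row, exactly as the rule in the question states,
# then compare with the vectorised cumsum() version.
x <- 1:5

y_loop <- numeric(length(x))
prev <- 0                        # no previous y for the first row, so start at 0
for (i in seq_along(x)) {
  y_loop[i] <- x[i] + 5 + prev   # current x, plus 5, plus the previous y
  prev <- y_loop[i]
}

y_cumsum <- cumsum(x + 5)

y_loop
y_cumsum
```

Both vectors should match the y column shown above.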

Data

test_df <- data.frame(x = 1:5)

CodePudding user response:

We can also use purrr::accumulate here:

library(dplyr)
library(purrr)

test_df %>% mutate(y = accumulate(x + 5, ~ .x + .y))

  x  y
1 1  6
2 2 13
3 3 21
4 4 30
5 5 40

We can also use accumulate with a regular base R function instead of the formula shorthand:

test_df %>% mutate(y = accumulate(x + 5, function(x, y) {x + y}))
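
If you would rather not load purrr, base R's Reduce() with accumulate = TRUE should give the same running result. This is only a sketch of an alternative, not part of the answers above; the argument names acc and value are illustrative:

```r
# Base R alternative: Reduce() with accumulate = TRUE keeps every
# intermediate result instead of returning only the final one.
test_df <- data.frame(x = 1:5)

test_df$y <- Reduce(function(acc, value) acc + value,
                    test_df$x + 5,
                    accumulate = TRUE)

test_df
```

The first element of x + 5 serves as the starting value, which matches defining the missing previous y as 0.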