Home > other >  How to create an r column that depends on previous value of the column being created?
How to create an r column that depends on previous value of the column being created?

Time:03-09

I have a df with two columns id and value:

structure(list(id = c("a", "a", "a", "a", "b", "b", "b", "b", 
"b", "b", "b", "b", "b", "b", "b", "b"), value = c(8105.292, 
45000, -41065, 45000, 46054.15, 50000, 50000, 50000, -180000, 
50000, 50000, -180000, 50000, 50000, 50000, 50000)), row.names = c(NA, 
-16L), class = "data.frame", sorted = c("x", "y"))

I need to modify the value column and/or create a third column described by the following:

  • if that row is the first occurrence in the table of the id, then return value
  • Otherwise, return the following: lag(value) value

My problem is the correct calculation needs to reference the modified lag(value)... the one that is currently being created. It needs to be applied in sequence, and not reference the original lag(value).

Does this make sense? I have tried applying a function via for loop but am not having luck, and think I am missing something simple. My expected output is:

structure(list(id = c("a", "a", "a", "a", "b", "b", "b", "b", 
"b", "b", "b", "b", "b", "b", "b", "b"), value = c(8105.292, 
45000, -41065, 45000, 46054.15, 50000, 50000, 50000, -180000, 
50000, 50000, -180000, 50000, 50000, 50000, 50000), expectedvalue = c(8105.292, 
53105.292, 12040.292, 57040.292, 46054.15, 96064.15, 146054.15, 
196054.15, 16054.15, 66054.15, 116054.15, -64954.85, -13945.85, 
36054.15, 86054.15, 136054.15)), row.names = c(NA, -16L), sorted = c("x", 
"y"), class = "data.frame")

CodePudding user response:

It sounds like you want the cumulative sum within each id. No need for loops.

library(dplyr)

df %>% group_by(id) %>% mutate(expectedValue = cumsum(value))
#> # A tibble: 16 x 3
#> # Groups:   id [2]
#>    id       value expectedValue
#>    <chr>    <dbl>         <dbl>
#>  1 a        8105.         8105.
#>  2 a       45000         53105.
#>  3 a      -41065         12040.
#>  4 a       45000         57040.
#>  5 b       46054.        46054.
#>  6 b       50000         96054.
#>  7 b       50000        146054.
#>  8 b       50000        196054.
#>  9 b     -180000         16054.
#> 10 b       50000         66054.
#> 11 b       50000        116054.
#> 12 b     -180000        -63946.
#> 13 b       50000        -13946.
#> 14 b       50000         36054.
#> 15 b       50000         86054.
#> 16 b       50000        136054.

Created on 2022-03-08 by the reprex package (v2.0.1)

CodePudding user response:

Is this what you are looking for?

df <- structure(list(id = c("a", "a", "a", "a", "b", "b", "b", "b", 
                      "b", "b", "b", "b", "b", "b", "b", "b"), value = c(8105.292, 
                                                                         45000, -41065, 45000, 46054.15, 50000, 50000, 50000, -180000, 
                                                                         50000, 50000, -180000, 50000, 50000, 50000, 50000)), row.names = c(NA, 
                                                                                                                                            -16L), class = "data.frame", sorted = c("x", "y"))

df %>% group_by(id) %>% mutate(value_ = value   lag(value)) %>% mutate(value2 = ifelse(is.na(value_), value, value_)) %>% select(id, value, value2)


   id       value   value2
   <chr>    <dbl>    <dbl>
 1 a        8105.    8105.
 2 a       45000    53105.
 3 a      -41065     3935 
 4 a       45000     3935 
 5 b       46054.   46054.
 6 b       50000    96054.
 7 b       50000   100000 
 8 b       50000   100000 
 9 b     -180000  -130000 
10 b       50000  -130000 
11 b       50000   100000 
12 b     -180000  -130000 
13 b       50000  -130000 
14 b       50000   100000 
15 b       50000   100000 
16 b       50000   100000 

If you want to sort it before the group_by, function..

df %>% arrange(id, value) %>% group_by(id) %>% mutate(value_ = value   lag(value)) %>% mutate(value2 = ifelse(is.na(value_), value, value_)) %>% select(id, value, value2)


   id       value   value2
   <chr>    <dbl>    <dbl>
 1 a      -41065   -41065 
 2 a        8105.  -32960.
 3 a       45000    53105.
 4 a       45000    90000 
 5 b     -180000  -180000 
 6 b     -180000  -360000 
 7 b       46054. -133946.
 8 b       50000    96054.
 9 b       50000   100000 
10 b       50000   100000 
11 b       50000   100000 
12 b       50000   100000 
13 b       50000   100000 
14 b       50000   100000 
15 b       50000   100000 
16 b       50000   100000 
  •  Tags:  
  • r
  • Related