Home > Software engineering >  How to add new column in R data frame showing sum of a value in a current row and a prior row, if ce
How to add new column in R data frame showing sum of a value in a current row and a prior row, if ce

Time:12-11

Suppose you have a data frame of columns "a" and "b" with the values shown below, generated with df <- data.frame(a=c(0, 1, 2, 2, 3), b=c(1, 3, 8, 9, 4)). Suppose you want to add a column "c", whereby if a value in "a" equals the value in the immediately preceding row in col "a", then the corresponding row values in col "b" are summed; otherwise a 0 value is shown. A column "c" is added to the below to illustrate what I'm trying to do:

   a  b   add col c
1  0  1       0
2  1  3       0
3  2  8       0
4  2  9       17 (since the values in col "a" rows 3 and 4 are equal, add the values in col b rows 3 and 4)
5  3  4       0

Or in this scenario, whereby cols "a" and "b" are generated by df <- data.frame(a=c(0,1,2,2,2,3), b=c(1,2,3,4,5,6)):

   a  b    add col c
1  0  1        0
2  1  2        0
3  2  3        0
4  2  4        7 (3 4 from col "b")
5  2  5        9 (4 5 from col "b")
6  3  6        0 (since 2 from prior row <> 3 from current row)

What is the easiest way to do this in native R?

CodePudding user response:

As we are interested in the adjacent values to be equal, use rleid (from data.table) to create a grouping index, then create the 'c', by adding the 'b' with lag of 'b' and replace the default first value of lag (NA) to 0

library(dplyr)
library(data.table)
library(tidyr)
df %>% 
   group_by(grp = rleid(a)) %>% 
   mutate(c = replace_na(b   lag(b), 0)) %>%
   ungroup %>%
   select(-grp)

-output

# A tibble: 6 × 3
      a     b     c
  <dbl> <dbl> <dbl>
1     0     1     0
2     1     2     0
3     2     3     0
4     2     4     7
5     2     5     9
6     3     6     0

Or using base R - a similar approach is with the rle to create the 'grp', then use ave to do the addition of previous with current value (by removing the first and last) and then append 0 at the beginning

grp <- with(rle(df$a), rep(seq_along(values), lengths))
df$c <- with(df, ave(b, grp, FUN = function(x) c(0, x[-1]   x[-length(x)])))
  • Related