Suppose you have a data frame of columns "a" and "b" with the values shown below, generated with df <- data.frame(a=c(0, 1, 2, 2, 3), b=c(1, 3, 8, 9, 4))
. Suppose you want to add a column "c", whereby if a value in "a" equals the value in the immediately preceding row in col "a", then the corresponding row values in col "b" are summed; otherwise a 0 value is shown. A column "c" is added to the below to illustrate what I'm trying to do:
a b add col c
1 0 1 0
2 1 3 0
3 2 8 0
4 2 9 17 (since the values in col "a" rows 3 and 4 are equal, add the values in col b rows 3 and 4)
5 3 4 0
Or in this scenario, whereby cols "a" and "b" are generated by df <- data.frame(a=c(0,1,2,2,2,3), b=c(1,2,3,4,5,6))
:
a b add col c
1 0 1 0
2 1 2 0
3 2 3 0
4 2 4 7 (3 4 from col "b")
5 2 5 9 (4 5 from col "b")
6 3 6 0 (since 2 from prior row <> 3 from current row)
What is the easiest way to do this in native R?
CodePudding user response:
As we are interested in the adjacent values to be equal, use rleid
(from data.table
) to create a grouping index, then create the 'c', by adding the 'b' with lag
of 'b' and replace the default first value of lag
(NA
) to 0
library(dplyr)
library(data.table)
library(tidyr)
df %>%
group_by(grp = rleid(a)) %>%
mutate(c = replace_na(b lag(b), 0)) %>%
ungroup %>%
select(-grp)
-output
# A tibble: 6 × 3
a b c
<dbl> <dbl> <dbl>
1 0 1 0
2 1 2 0
3 2 3 0
4 2 4 7
5 2 5 9
6 3 6 0
Or using base R
- a similar approach is with the rle
to create the 'grp', then use ave
to do the addition of previous with current value (by removing the first and last) and then append 0 at the beginning
grp <- with(rle(df$a), rep(seq_along(values), lengths))
df$c <- with(df, ave(b, grp, FUN = function(x) c(0, x[-1] x[-length(x)])))