I have a dataset of three variables: identifier, date, and amount. I want to add a fourth column that shows the cumulative amount over time. For example:
identifier | date | amount | cumulative amount |
---|---|---|---|
aaa | 01 | 5 | 5 |
aaa | 02 | 6 | 11 |
aaa | 03 | 8 | 19 |
bbb | 01 | 5 | 5 |
bbb | 02 | 4 | 9 |
bbb | 03 | 0 | 9 |
bbb | 04 | 11 | 20 |
In Excel, I'd use a SUMIF formula to create this fourth column, but I honestly have no clue how to do this in R. Can someone help?
CodePudding user response:
For a dplyr solution you can use mutate to add a column with the cumulative sum for each group.
library(dplyr)
df <- tibble(identifier = c(rep('aaa', 3), rep('bbb', 4)),
date = c(seq(1,3,1),seq(1,4,1)),
amount = c(5,6,8,5,4,0,11))
# A tibble: 7 x 3
identifier date amount
<chr> <dbl> <dbl>
1 aaa 1 5
2 aaa 2 6
3 aaa 3 8
4 bbb 1 5
5 bbb 2 4
6 bbb 3 0
7 bbb 4 11
df %>%
group_by(identifier) %>%
mutate('cumulative amount' = cumsum(amount))
# A tibble: 7 x 4
# Groups: identifier [2]
identifier date amount `cumulative amount`
<chr> <dbl> <dbl> <dbl>
1 aaa 1 5 5
2 aaa 2 6 11
3 aaa 3 8 19
4 bbb 1 5 5
5 bbb 2 4 9
6 bbb 3 0 9
7 bbb 4 11 20