How to Calculate a Continuous Cumulative Sum?

Time:10-29

Suppose I have a tibble:

id  year    X
1   2001   FALSE
1   2002   TRUE
1   2003   TRUE
1   2004   FALSE
1   2005   FALSE
1   2006   TRUE
1   2007   TRUE
1   2008   TRUE

How can I calculate a continuous cumulative sum of variable X, i.e. a running count of consecutive TRUE values that resets to 0 whenever X is FALSE?

id  year    X     cumN
1   2001   FALSE     0
1   2002   TRUE      1
1   2003   TRUE      2
1   2004   FALSE     0
1   2005   FALSE     0
1   2006   TRUE      1
1   2007   TRUE      2
1   2008   TRUE      3

Thanks!

library(tibble)

dt <- tibble(id = rep(1, 8),
             year = 2001:2008,
             X = c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE))

CodePudding user response:

There is probably a dplyr-specific way of doing this, but I think a humble for loop is simpler:

dt$cumsum = 0

for (i in seq_len(nrow(dt))) {
    if (dt$X[[i]]) {
        # extend the current run; a run that starts in row 1 counts from 1
        previous = if (i > 1) dt$cumsum[[i - 1]] else 0
        dt$cumsum[[i]] = previous + 1
    } else {
        dt$cumsum[[i]] = 0
    }
}

> dt
# A tibble: 8 × 4
     id  year X     cumsum
  <dbl> <int> <lgl>  <dbl>
1     1  2001 FALSE      0
2     1  2002 TRUE       1
3     1  2003 TRUE       2
4     1  2004 FALSE      0
5     1  2005 FALSE      0
6     1  2006 TRUE       1
7     1  2007 TRUE       2
8     1  2008 TRUE       3

CodePudding user response:

You can use rle to split X into runs, then Map over the run lengths and values: each run becomes the sequence 1, 2, ..., length, and every run of FALSE is replaced with 0.

runs <- rle(dt$X)
dt$cumN <- unlist(Map(\(x, y) replace(seq_len(x), !y, 0), runs$lengths, runs$values))

     id  year X      cumN
  <dbl> <int> <lgl> <dbl>
1     1  2001 FALSE     0
2     1  2002 TRUE      1
3     1  2003 TRUE      2
4     1  2004 FALSE     0
5     1  2005 FALSE     0
6     1  2006 TRUE      1
7     1  2007 TRUE      2
8     1  2008 TRUE      3
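
To see why this works, it can help to look at what rle returns for X. The sketch below uses base R only. It swaps the Map/replace step for sequence() and ifelse(), so it is a variation on the answer above rather than the same code, and the names runs and counter are made up for illustration.

```r
X <- c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE)

# rle() compresses X into consecutive runs of equal values
runs <- rle(X)
runs$lengths  # 1 2 2 3
runs$values   # FALSE TRUE FALSE TRUE

# sequence() restarts a counter for every run: 1, 1 2, 1 2, 1 2 3
counter <- sequence(runs$lengths)

# keep the counter where X is TRUE, use 0 where X is FALSE
cumN <- ifelse(X, counter, 0L)
cumN          # 0 1 2 0 0 1 2 3
```

The counter is computed for every run, TRUE or FALSE, and the FALSE positions are zeroed afterwards, which is the same idea as replacing the FALSE runs with 0 inside Map.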

CodePudding user response:

Two possible ways to solve your problem (one based only on data.table, and another based on a mix of the dplyr and data.table packages):

way 1
library(data.table)

# group each FALSE run with the TRUE run that follows it, then count within the group
setDT(dt)[, cumN := cumsum(X), fifelse(X, y <- rleid(X), y + 1L)]

      id  year      X  cumN
   <num> <int> <lgcl> <int>
1:     1  2001  FALSE     0
2:     1  2002   TRUE     1
3:     1  2003   TRUE     2
4:     1  2004  FALSE     0
5:     1  2005  FALSE     0
6:     1  2006   TRUE     1
7:     1  2007   TRUE     2
8:     1  2008   TRUE     3

way 2
library(dplyr)

dt %>%
  group_by(tem = data.table::rleid(X)) %>%
  mutate(cumN=cumsum(X)) %>%
  ungroup() %>%
  select(!tem)

# A tibble: 8 x 4
     id  year X      cumN
  <dbl> <int> <lgl> <int>
1     1  2001 FALSE     0
2     1  2002 TRUE      1
3     1  2003 TRUE      2
4     1  2004 FALSE     0
5     1  2005 FALSE     0
6     1  2006 TRUE      1
7     1  2007 TRUE      2
8     1  2008 TRUE      3
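
Both approaches rely on a run id that changes whenever X changes. As a rough illustration without extra packages, the sketch below emulates such a run id in base R and takes a cumulative sum within each run using ave(). The name run_id is made up here, and the emulation is only intended to mirror what rleid gives for a single logical vector; it is not the data.table implementation.

```r
X <- c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE)

# a new run starts at position 1 and wherever X differs from the previous value
run_id <- cumsum(c(TRUE, X[-1] != X[-length(X)]))
run_id  # 1 2 2 3 3 4 4 4

# cumulative sum of X inside each run: FALSE runs stay 0, TRUE runs count up
cumN <- ave(as.integer(X), run_id, FUN = cumsum)
cumN    # 0 1 2 0 0 1 2 3
```

Because every run contains only TRUE or only FALSE, the within-run cumulative sum is either a counter 1, 2, 3, ... or all zeros, which is exactly the requested column.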