Suppose I have a tibble:
id year X
1 2001 FALSE
1 2002 TRUE
1 2003 TRUE
1 2004 FALSE
1 2005 FALSE
1 2006 TRUE
1 2007 TRUE
1 2008 TRUE
How can I calculate a running count of consecutive TRUE values in X, i.e. a cumulative sum that resets to 0 whenever X is FALSE?
id year X cumN
1 2001 FALSE 0
1 2002 TRUE 1
1 2003 TRUE 2
1 2004 FALSE 0
1 2005 FALSE 0
1 2006 TRUE 1
1 2007 TRUE 2
1 2008 TRUE 3
Thanks!
library(tibble)
dt <- tibble(id = rep(1, 8),
             year = 2001:2008,
             X = c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE))
CodePudding user response:
There is probably a dedicated dplyr way of doing this, but I think a humble for loop is simpler:
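dt$cumsum <- 0
for (i in seq_len(nrow(dt))) {
  if (dt$X[[i]]) {
    # Extend the current run; a run starting at row 1 has no predecessor
    previous <- if (i > 1) dt$cumsum[[i - 1]] else 0
    dt$cumsum[[i]] <- previous + 1
  } else {
    # A FALSE resets the counter
    dt$cumsum[[i]] <- 0
  }
}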
> dt
# A tibble: 8 × 4
id year X cumsum
<dbl> <int> <lgl> <dbl>
1 1 2001 FALSE 0
2 1 2002 TRUE 1
3 1 2003 TRUE 2
4 1 2004 FALSE 0
5 1 2005 FALSE 0
6 1 2006 TRUE 1
7 1 2007 TRUE 2
8 1 2008 TRUE 3
CodePudding user response:
You can use rle and Map to pair the run lengths with the run values, replacing every FALSE run with 0.
dt$cumN <- unlist(with(dt, Map(\(x,y) replace(seq(x), !y, 0), rle(X)$lengths, rle(X)$values)))
id year X cumN
<dbl> <int> <lgl> <dbl>
1 1 2001 FALSE 0
2 1 2002 TRUE 1
3 1 2003 TRUE 2
4 1 2004 FALSE 0
5 1 2005 FALSE 0
6 1 2006 TRUE 1
7 1 2007 TRUE 2
8 1 2008 TRUE 3
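To make the rle idea easier to follow, here is a step-by-step sketch in base R. Note that it swaps the Map/replace combination for sequence(), which is my substitution rather than the approach in the answer, and it works on a plain logical vector copied from the question instead of the tibble.

```r
# Same values as the X column in the question (base R only).
X <- c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE)

# rle() splits the vector into runs of identical values.
runs <- rle(X)
runs$lengths  # 1 2 2 3
runs$values   # FALSE TRUE FALSE TRUE

# sequence() counts 1..n inside every run; multiplying by X
# (FALSE = 0, TRUE = 1) zeroes out the FALSE runs.
cumN <- sequence(runs$lengths) * X
cumN          # 0 1 2 0 0 1 2 3
```

Assigning the result with dt$cumN <- cumN should give the same column as the one-liner, assuming the rows are already in the desired order.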
CodePudding user response:
Two possible ways to solve your problem: one based only on data.table, and another that mixes dplyr and data.table:
way 1
library(data.table)
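# Group each FALSE run together with the TRUE run that follows it, then take cumsum within the group
setDT(dt)[, cumN := cumsum(X), by = fifelse(X, y <- rleid(X), y + 1L)]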
id year X cumN
<num> <int> <lgcl> <int>
1: 1 2001 FALSE 0
2: 1 2002 TRUE 1
3: 1 2003 TRUE 2
4: 1 2004 FALSE 0
5: 1 2005 FALSE 0
6: 1 2006 TRUE 1
7: 1 2007 TRUE 2
8: 1 2008 TRUE 3
way 2
library(dplyr)
dt %>%
  # rleid() gives each run of identical X values its own group id
  group_by(tem = data.table::rleid(X)) %>%
  mutate(cumN = cumsum(X)) %>%
  ungroup() %>%
  select(!tem)
# A tibble: 8 x 4
id year X cumN
<dbl> <int> <lgl> <int>
1 1 2001 FALSE 0
2 1 2002 TRUE 1
3 1 2003 TRUE 2
4 1 2004 FALSE 0
5 1 2005 FALSE 0
6 1 2006 TRUE 1
7 1 2007 TRUE 2
8 1 2008 TRUE 3
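The example data has a single id, so none of the approaches above needs to group by it. If the real data holds several ids, the run counter probably should not carry over from one id to the next. A minimal base R sketch of that, assuming a made-up two-id data frame and a hypothetical helper named run_count (neither is from the question):

```r
# Hypothetical data: id 1 ends on TRUE and id 2 starts on TRUE,
# so an ungrouped counter would wrongly continue across the boundary.
df <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  year = rep(2001:2003, 2),
  X    = c(FALSE, TRUE, TRUE, TRUE, TRUE, FALSE)
)

# Hypothetical helper: position within the current TRUE run, 0 on FALSE.
run_count <- function(x) sequence(rle(x)$lengths) * x

# ave() applies the helper separately within each id.
df$cumN <- ave(as.integer(df$X), df$id, FUN = run_count)
df$cumN  # 0 1 2 1 2 0
```

Without the grouping, run_count(df$X) would return 0 1 2 3 4 0, because the TRUE run would span both ids.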