How to determine number of sequential true values in a data frame in R using tidyverse/dplyr?

Time:10-07

I have a data frame and would like to add a column with the number of successive TRUE values using tidyverse.

For example, take the following data frame:

dftmp <- data.frame(order = 1:10,
                    true = c(FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE))

which looks like this:

   order  true
1      1 FALSE
2      2  TRUE
3      3  TRUE
4      4  TRUE
5      5 FALSE
6      6 FALSE
7      7  TRUE
8      8 FALSE
9      9  TRUE
10    10  TRUE

I would like it to look like this:

   order  true count
1      1 FALSE     0
2      2  TRUE     1
3      3  TRUE     2
4      4  TRUE     3
5      5 FALSE     0
6      6 FALSE     0
7      7  TRUE     1
8      8 FALSE     0
9      9  TRUE     1
10    10  TRUE     2

I can do this with a loop (below), but I am not sure whether there is a tidyverse equivalent. I suspect it would use dplyr.

dftmp$count <- NA

for (lpVar in 1:nrow(dftmp)) {
  dftmp$count[lpVar] <- ifelse(test = dftmp$true[lpVar],
                               yes = dftmp$count[lpVar - 1] + 1,
                               no = 0)
}

Does anyone have any ideas?

Answer:

The package hutilscpp has a convenient cumsum_reset function:

library(dplyr)
library(hutilscpp)

dftmp %>%
  mutate(count = cumsum_reset(true))

   order  true count
1      1 FALSE     0
2      2  TRUE     1
3      3  TRUE     2
4      4  TRUE     3
5      5 FALSE     0
6      6 FALSE     0
7      7  TRUE     1
8      8 FALSE     0
9      9  TRUE     1
10    10  TRUE     2

Or with dplyr:

dftmp %>%
  group_by(grp = cumsum(!true)) %>%
  mutate(count = cumsum(true)) %>%
  ungroup() %>%
  select(-grp)

Credit for the second solution goes to the question "Cumsum reset at certain values".
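
To see why the grouped cumsum works, it can help to keep the helper column visible instead of dropping it. The sketch below rebuilds the example data frame from the question. The names steps and count_base are illustrative only, and the base R line using ave() is an added alternative under the same idea, not part of the original answer.

```r
library(dplyr)

dftmp <- data.frame(order = 1:10,
                    true = c(FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE))

# cumsum(!true) goes up by one at every FALSE, so each FALSE opens a new group
# and the TRUE values that follow it stay in that same group.
steps <- dftmp %>%
  mutate(grp = cumsum(!true)) %>%
  group_by(grp) %>%
  # Within a group, cumsum(true) is 0 on the leading FALSE and then counts the TRUEs.
  mutate(count = cumsum(true)) %>%
  ungroup()

# Base R sketch of the same idea: ave() applies cumsum within each group.
count_base <- ave(as.integer(dftmp$true), cumsum(!dftmp$true), FUN = cumsum)
```

Printing steps shows grp next to count, which makes the reset points easy to check. If the first rows are TRUE, they fall into group 0 and are still counted from 1.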
