I have an indicator column cond
. I want to create a new column that equals -1
if anytime before cond == T
, 0
if cond == T
, and counts the number of rows until cond
is true again.
What I am looking for is something conceptually similar to the cumsum(is.na(tmp))
that restarts after every 0.
library(dplyr)
tibble(cond = c(F, F, T, T, F, T, F, F, F, F, T)) %>%
mutate(tmp = case_when(cumsum(cond) == 0 ~ -1,
cond ~ 0,
T ~ NA_real_))
#> # A tibble: 11 × 2
#> cond tmp
#> <lgl> <dbl>
#> 1 FALSE -1
#> 2 FALSE -1
#> 3 TRUE 0
#> 4 TRUE 0
#> 5 FALSE NA
#> 6 TRUE 0
#> 7 FALSE NA
#> 8 FALSE NA
#> 9 FALSE NA
#> 10 FALSE NA
#> 11 TRUE 0
Created on 2021-11-15 by the reprex package (v0.3.0)
Desired output:
#> # A tibble: 11 × 2
#> cond tmp
#> <lgl> <dbl>
#> 1 FALSE -1
#> 2 FALSE -1
#> 3 TRUE 0
#> 4 TRUE 0
#> 5 FALSE 1
#> 6 TRUE 0
#> 7 FALSE 1
#> 8 FALSE 2
#> 9 FALSE 3
#> 10 FALSE 4
#> 11 TRUE 0
CodePudding user response:
We may use rleid
to create a grouping column and then replace the 'tmp' if
all
the values are NA
with the sequence (row_number()
)
library(dplyr)
library(data.table)
dat <- dat %>%
group_by(grp = rleid(is.na(tmp))) %>%
mutate(tmp = if(all(is.na(tmp))) row_number() else tmp) %>%
ungroup %>%
select(-grp)
-output
dat
# A tibble: 11 × 2
cond tmp
<lgl> <dbl>
1 FALSE -1
2 FALSE -1
3 TRUE 0
4 TRUE 0
5 FALSE 1
6 TRUE 0
7 FALSE 1
8 FALSE 2
9 FALSE 3
10 FALSE 4
11 TRUE 0
data
dat <- structure(list(cond = c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE,
FALSE, FALSE, FALSE, FALSE, TRUE), tmp = c(-1, -1, 0, 0, NA,
0, NA, NA, NA, NA, 0)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -11L))