Home > Back-end >  Running count of the number of missing values with reset?
Running count of the number of missing values with reset?

Time:11-16

I have an indicator column cond. I want to create a new column that equals -1 if anytime before cond == T, 0 if cond == T, and counts the number of rows until cond is true again.

What I am looking for is something conceptually similar to the cumsum(is.na(tmp)) that restarts after every 0.

library(dplyr)

tibble(cond = c(F, F, T, T, F, T, F, F, F, F, T)) %>% 
  mutate(tmp = case_when(cumsum(cond) == 0 ~ -1,
                         cond ~ 0,
                         T ~ NA_real_))
#> # A tibble: 11 × 2
#>    cond    tmp
#>    <lgl> <dbl>
#>  1 FALSE    -1
#>  2 FALSE    -1
#>  3 TRUE      0
#>  4 TRUE      0
#>  5 FALSE    NA
#>  6 TRUE      0
#>  7 FALSE    NA
#>  8 FALSE    NA
#>  9 FALSE    NA
#> 10 FALSE    NA
#> 11 TRUE      0

Created on 2021-11-15 by the reprex package (v0.3.0)

Desired output:

#> # A tibble: 11 × 2
#>    cond    tmp
#>    <lgl> <dbl>
#>  1 FALSE    -1
#>  2 FALSE    -1
#>  3 TRUE      0
#>  4 TRUE      0
#>  5 FALSE     1
#>  6 TRUE      0
#>  7 FALSE     1
#>  8 FALSE     2
#>  9 FALSE     3
#> 10 FALSE     4
#> 11 TRUE      0

CodePudding user response:

We may use rleid to create a grouping column and then replace the 'tmp' if all the values are NA with the sequence (row_number())

library(dplyr)
library(data.table)
dat <- dat %>% 
    group_by(grp = rleid(is.na(tmp))) %>% 
    mutate(tmp = if(all(is.na(tmp))) row_number() else tmp) %>%
    ungroup %>%
    select(-grp)

-output

dat
# A tibble: 11 × 2
   cond    tmp
   <lgl> <dbl>
 1 FALSE    -1
 2 FALSE    -1
 3 TRUE      0
 4 TRUE      0
 5 FALSE     1
 6 TRUE      0
 7 FALSE     1
 8 FALSE     2
 9 FALSE     3
10 FALSE     4
11 TRUE      0

data

dat <- structure(list(cond = c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, 
FALSE, FALSE, FALSE, FALSE, TRUE), tmp = c(-1, -1, 0, 0, NA, 
0, NA, NA, NA, NA, 0)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -11L))
  • Related