Home > Enterprise >  Fill the value forward until some condition satisfied
Fill the value forward until some condition satisfied

Time:07-29

      id cate  result
 1     1 yes       1
 2     1 yes      NA
 3     1 no       NA
 4     2 no       NA
 5     2 yes       1
 6     2 yes      NA
 7     2 no       NA
 8     3 no       NA
 9     3 yes      NA
10     3 no       NA
11     3 yes       1
12     3 yes      NA
13     3 no       NA
14     3 yes      NA
15     4 yes       1
16     4 yes      NA
17     4 yes      NA
18     4 no       NA
19     4 no       NA 

I want to fill the value of the result (1) forward up to the first no of the cate variable.

I try it using the code

library(tidyverse)
d2 <- d %>% group_by(id) %>% 
  fill(result, .direction = 'down') 

But the desired output is

      id cate  result
 1     1 yes       1
 2     1 yes       1
 3     1 no        1
 4     2 no       NA
 5     2 yes       1
 6     2 yes       1
 7     2 no        1
 8     3 no       NA
 9     3 yes      NA
10     3 no       NA
11     3 yes       1
12     3 yes       1
13     3 no        1
14     3 yes      NA
15     4 yes       1
16     4 yes       1
17     4 yes       1
18     4 no        1
19     4 no       NA     

CodePudding user response:

My solution uses a cumulative sum and lead-lag logic. The first clause before the central | covers everything except the NA in row 14, and the second clause fills this gap.

library(tidyverse)

df <- structure(list(id = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 
3, 4, 4, 4, 4, 4), cate = c("yes", "yes", "no", "no", "yes", 
"yes", "no", "no", "yes", "no", "yes", "yes", "no", "yes", "yes", 
"yes", "yes", "no", "no"), result = c(1, NA, NA, NA, 1, NA, NA, 
NA, NA, NA, 1, NA, NA, NA, 1, NA, NA, NA, NA)), row.names = c(NA, 
-19L), class = c("tbl_df", "tbl", "data.frame")) 

df %>%
  group_by(key_no = cumsum((cate == "no" & lag(cate) == "no") |
                             (cate == "yes" & row_number() > 2 & lag(cate) == "no"& lag(cate, n = 2) == "yes"))) %>%
  fill(result, .direction = "down") %>%
  ungroup() %>%
  select(-key_no)

The output:


# # A tibble: 19 × 3
#       id cate  result
#    <dbl> <chr>  <dbl>
#  1     1 yes        1
#  2     1 yes        1
#  3     1 no         1
#  4     2 no        NA
#  5     2 yes        1
#  6     2 yes        1
#  7     2 no         1
#  8     3 no        NA
#  9     3 yes       NA
# 10     3 no        NA
# 11     3 yes        1
# 12     3 yes        1
# 13     3 no         1
# 14     3 yes       NA
# 15     4 yes        1
# 16     4 yes        1
# 17     4 yes        1
# 18     4 no         1
# 19     4 no        NA

CodePudding user response:

A not very dplyr-y solution, but it seems to work. I'm match-ing the first 1, then matching the first "no" after a 1, then replacing the values in result within this range with a 1:

dat %>%
    group_by(id) %>%
    mutate(
    result = {
        ones = match(1, result)
        nos  = match("no", tail(cate, -ones))   ones
        replace(result, seq(ones,nos), 1)
    }    
    )

## A tibble: 19 x 3
## Groups:   id [4]
#      id cate  result
#   <int> <chr>  <dbl>
# 1     1 yes        1
# 2     1 yes        1
# 3     1 no         1
# 4     2 no        NA
# 5     2 yes        1
# 6     2 yes        1
# 7     2 no         1
# 8     3 no        NA
# 9     3 yes       NA
#10     3 no        NA
#11     3 yes        1
#12     3 yes        1
#13     3 no         1
#14     3 yes       NA
#15     4 yes        1
#16     4 yes        1
#17     4 yes        1
#18     4 no         1
#19     4 no        NA
  • Related