Replace NA between two values without loop

I have the following data frame:

data <- structure(list(Date = structure(c(-17897, -17896, -17895, -17894, 
                  -17893, -17892, -17891, -17890, -17889, -17888, -17887, -17887, 
                  -17886, -17885, -17884, -17883, -17882, -17881, -17880, -17879, 
                  -17878, -17877, -17876, -17875, -17874, -17873, -17872, -17871, 
                  -17870, -17869, -17868, -17867, -17866, -17865, -17864), class = "Date"), 
                  duration = c(NA, NA, NA, 5, NA, NA, NA, 5, NA, NA, 1, 1, 
                  NA, NA, 3, NA, 3, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
                  NA, NA, 4, NA, NA, 4, NA, NA), name = c(NA, NA, NA, "Date_beg", 
                  NA, NA, NA, "Date_end", NA, NA, "Date_beg", "Date_end", NA, 
                  NA, "Date_beg", NA, "Date_end", NA, NA, NA, NA, NA, NA, NA, 
                  NA, NA, NA, NA, NA, "Date_beg", NA, NA, "Date_end", NA, NA
                  )), row.names = c(NA, -35L), class = c("tbl_df", "tbl", "data.frame"
                ))

It looks like this:

   Date       duration name    
   <date>        <dbl> <chr>   
 1 1921-01-01       NA NA      
 2 1921-01-02       NA NA      
 3 1921-01-03       NA NA      
 4 1921-01-04        5 Date_beg
 5 1921-01-05       NA NA      
 6 1921-01-06       NA NA      
 7 1921-01-07       NA NA      
 8 1921-01-08        5 Date_end
 9 1921-01-09       NA NA      
10 1921-01-10       NA NA   
...

I want to replace the NA values in the name column that lie between a Date_beg row and the following Date_end row with the word "event".

I have tried this:

data %<>% mutate(name = ifelse(((lag(name) == 'Date_beg')|(lag(name) == 'event')) &

But only the first row after Date_beg changes. It is quite easy with a for-loop, but I wanted to use a more R-like method.

CodePudding user response：

There is probably a better way using data.table::nafill, but as you're using tidyverse functions, I would do it by creating an extra event column using tidyr::fill and then pulling it through to the name column where name is NA:

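library(dplyr)  # mutate(), coalesce(), select()
library(tidyr)  # fill()

data %>%
  mutate(
    # Forward-fill name; rows whose most recent label is "Date_beg" lie inside an event
    events = ifelse(fill(data, name)$name == "Date_beg", "event", NA_character_),
    # Keep existing labels and take "event" only where name is still NA
    name = coalesce(name, events)
  ) %>%
  select(-events)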
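
To see why the fill-then-coalesce idea works, it can help to look at the forward-filled column on its own. The following is only a base R sketch on a made-up vector: `x`, `last_seen`, `filled`, `events` and `result` are illustrative names, not part of the original data, and the forward fill is done by hand here rather than with tidyr::fill.

```r
# Toy vector standing in for the name column (illustrative data only)
x <- c(NA, "Date_beg", NA, NA, "Date_end", NA, "Date_beg", "Date_end", NA)

# Position of the most recent non-NA entry at each index (0 if none seen yet)
last_seen <- cummax(ifelse(is.na(x), 0L, seq_along(x)))

# Forward fill: prepend NA so that position 0 maps to NA
filled <- c(NA, x)[last_seen + 1]

# Rows whose most recent label is "Date_beg" are inside an event
events <- ifelse(filled == "Date_beg", "event", NA_character_)

# Keep existing labels, use events only where x is NA
result <- ifelse(is.na(x), events, x)
print(result)
```

The row directly after a Date_end stays NA because its most recent label is "Date_end", not "Date_beg".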

CodePudding user response：

You can do it by finding the rows where, so far, there have been more "Date_beg" than "Date_end" entries:

library(dplyr)  # for lag(); stats::lag() does not shift a plain vector

data$name[lag(cumsum(data$name == "Date_beg" & !is.na(data$name))) - 
          cumsum(data$name == "Date_end" & !is.na(data$name)) > 0] <- "event"

print(data, n=20)
# # A tibble: 35 x 3
#    Date       duration name
#    <date>        <dbl> <chr>
#  1 1921-01-01       NA NA
#  2 1921-01-02       NA NA
#  3 1921-01-03       NA NA
#  4 1921-01-04        5 Date_beg
#  5 1921-01-05       NA event
#  6 1921-01-06       NA event
#  7 1921-01-07       NA event
#  8 1921-01-08        5 Date_end
#  9 1921-01-09       NA NA
# 10 1921-01-10       NA NA
# 11 1921-01-11        1 Date_beg
# 12 1921-01-11        1 Date_end
# 13 1921-01-12       NA NA
# 14 1921-01-13       NA NA
# 15 1921-01-14        3 Date_beg
# 16 1921-01-15       NA event
# 17 1921-01-16        3 Date_end
# 18 1921-01-17       NA NA
# 19 1921-01-18       NA NA
# 20 1921-01-19       NA NA
# # ... with 15 more rows

Lagging the first index by one is required so that you don't overwrite the "Date_beg" at the start of each run.
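
The two running counts can be inspected step by step on a small example. This is only a sketch on a made-up vector; the names `x`, `opened`, `closed`, `opened_before` and `inside` are illustrative, and the lag is written in base R (padding with 0) instead of calling dplyr's lag().

```r
# Toy vector standing in for data$name (illustrative data only)
x <- c(NA, "Date_beg", NA, NA, "Date_end", NA, "Date_beg", "Date_end", NA)

is_beg <- !is.na(x) & x == "Date_beg"
is_end <- !is.na(x) & x == "Date_end"

# Running counts of intervals opened and closed so far
opened <- cumsum(is_beg)
closed <- cumsum(is_end)

# Base R stand-in for a lag by one row, padded with 0 instead of NA
opened_before <- c(0, head(opened, -1))

# TRUE where an interval was opened on an earlier row and is not yet closed
inside <- opened_before - closed > 0

x[inside] <- "event"
print(x)
```

Without the lag, the difference would already be positive on the Date_beg row itself, so that label would be overwritten too.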

CodePudding user response：

Another dplyr approach using the cumsum function.

If the value in the name column is NA, it adds 0 to the cumulative sum; otherwise it adds 1. Because Date_beg and Date_end alternate, the running count in ref is odd from each Date_beg row up to the row before the matching Date_end, and even from each Date_end row up to the row before the next Date_beg. Rows where ref is odd AND name is NA are then replaced with "event".

library(dplyr)

data %>% 
  mutate(ref = cumsum(ifelse(is.na(name), 0, 1)), 
         name = ifelse(ref %% 2 == 1 & is.na(name), "event", name)) %>% 
  select(-ref)
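
The parity argument is easier to follow with the intermediate counter printed out. The following is a minimal base R sketch on a made-up vector (`x`, `ref` and `result` are illustrative names), assuming, as in the question's data, that Date_beg and Date_end strictly alternate starting with Date_beg.

```r
# Toy vector standing in for the name column (illustrative data only)
x <- c(NA, "Date_beg", NA, NA, "Date_end", NA, "Date_beg", "Date_end", NA)

# Count of non-NA labels seen so far: odd inside an interval, even outside
ref <- cumsum(!is.na(x))
print(ref)

# Replace only the NA rows that fall where the counter is odd
result <- ifelse(ref %% 2 == 1 & is.na(x), "event", x)
print(result)
```

If a Date_beg ever appeared without a matching Date_end (or the other way round), the parity would flip for every later row, so this approach relies on the labels being well paired.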