Home > Mobile >  Fill NA until certain date based on different column per group
Fill NA until certain date based on different column per group

Time:01-25

I have the following dataframe df (dput below):

   group      date1      date2 value
1      A 2022-01-01 2022-01-07    NA
2      A 2022-01-02 2022-01-07     1
3      A 2022-01-04 2022-01-07    NA
4      A 2022-01-10 2022-01-07    NA
5      B 2022-01-01 2022-01-06     3
6      B 2022-01-03 2022-01-06    NA
7      B 2022-01-04 2022-01-06    NA
8      B 2022-01-06 2022-01-06    NA
9      C 2022-01-01 2022-01-09    NA
10     C 2022-01-03 2022-01-09    NA
11     C 2022-01-04 2022-01-09     2
12     C 2022-01-11 2022-01-09    NA

I would like to fill the NA values per group until date2. So all NA values after the first row with a value (group A with value 1) should be filled until date2 if date1 is less. This is the desired output:

   group      date1      date2 value
1      A 2022-01-01 2022-01-07    NA
2      A 2022-01-02 2022-01-07     1
3      A 2022-01-04 2022-01-07     1
4      A 2022-01-10 2022-01-07    NA
5      B 2022-01-01 2022-01-06     3
6      B 2022-01-03 2022-01-06     3
7      B 2022-01-04 2022-01-06     3
8      B 2022-01-06 2022-01-06     3
9      C 2022-01-01 2022-01-09    NA
10     C 2022-01-03 2022-01-09    NA
11     C 2022-01-04 2022-01-09     2
12     C 2022-01-11 2022-01-09    NA

As you can see in group A only the next row is filled with 1 because the last date1 is after date2. So I was wondering if anyone knows how to fill the NA based on a certain date per group?


dput df:

df<-structure(list(group = c("A", "A", "A", "A", "B", "B", "B", "B", 
"C", "C", "C", "C"), date1 = c("2022-01-01", "2022-01-02", "2022-01-04", 
"2022-01-10", "2022-01-01", "2022-01-03", "2022-01-04", "2022-01-06", 
"2022-01-01", "2022-01-03", "2022-01-04", "2022-01-11"), date2 = c("2022-01-07", 
"2022-01-07", "2022-01-07", "2022-01-07", "2022-01-06", "2022-01-06", 
"2022-01-06", "2022-01-06", "2022-01-09", "2022-01-09", "2022-01-09", 
"2022-01-09"), value = c(NA, 1, NA, NA, 3, NA, NA, NA, NA, NA, 
2, NA)), class = "data.frame", row.names = c(NA, -12L))

CodePudding user response:

Create an additional group and use fill

library(dplyr)
library(tidyr)
df %>% 
  group_by(group, after = date1 > date2) %>% 
  fill(value) %>% 
  ungroup() %>% 
  select(-after)

# A tibble: 12 × 4
   group date1      date2      value
   <chr> <chr>      <chr>      <dbl>
 1 A     2022-01-01 2022-01-07    NA
 2 A     2022-01-02 2022-01-07     1
 3 A     2022-01-04 2022-01-07     1
 4 A     2022-01-10 2022-01-07    NA
 5 B     2022-01-01 2022-01-06     3
 6 B     2022-01-03 2022-01-06     3
 7 B     2022-01-04 2022-01-06     3
 8 B     2022-01-06 2022-01-06     3
 9 C     2022-01-01 2022-01-09    NA
10 C     2022-01-03 2022-01-09    NA
11 C     2022-01-04 2022-01-09     2
12 C     2022-01-11 2022-01-09    NA

CodePudding user response:

using data.table

library(data.table)                                                                                                                                                    

setDT(df)[as.Date(date1) <= as.Date(date2), value := nafill(value, type = "locf"), group]

results

    group      date1      date2 value
 1:     A 2022-01-01 2022-01-07    NA
 2:     A 2022-01-02 2022-01-07     1
 3:     A 2022-01-04 2022-01-07     1
 4:     A 2022-01-10 2022-01-07    NA
 5:     B 2022-01-01 2022-01-06     3
 6:     B 2022-01-03 2022-01-06     3
 7:     B 2022-01-04 2022-01-06     3
 8:     B 2022-01-06 2022-01-06     3
 9:     C 2022-01-01 2022-01-09    NA
10:     C 2022-01-03 2022-01-09    NA
11:     C 2022-01-04 2022-01-09     2
12:     C 2022-01-11 2022-01-09    NA
  • Related