I have the following dataframe df (dput
below):
group date1 date2 value
1 A 2022-01-01 2022-01-07 NA
2 A 2022-01-02 2022-01-07 1
3 A 2022-01-04 2022-01-07 NA
4 A 2022-01-10 2022-01-07 NA
5 B 2022-01-01 2022-01-06 3
6 B 2022-01-03 2022-01-06 NA
7 B 2022-01-04 2022-01-06 NA
8 B 2022-01-06 2022-01-06 NA
9 C 2022-01-01 2022-01-09 NA
10 C 2022-01-03 2022-01-09 NA
11 C 2022-01-04 2022-01-09 2
12 C 2022-01-11 2022-01-09 NA
I would like to fill the NA values per group until date2. So all NA values after the first row with a value (group A with value 1) should be filled until date2 if date1 is less. This is the desired output:
group date1 date2 value
1 A 2022-01-01 2022-01-07 NA
2 A 2022-01-02 2022-01-07 1
3 A 2022-01-04 2022-01-07 1
4 A 2022-01-10 2022-01-07 NA
5 B 2022-01-01 2022-01-06 3
6 B 2022-01-03 2022-01-06 3
7 B 2022-01-04 2022-01-06 3
8 B 2022-01-06 2022-01-06 3
9 C 2022-01-01 2022-01-09 NA
10 C 2022-01-03 2022-01-09 NA
11 C 2022-01-04 2022-01-09 2
12 C 2022-01-11 2022-01-09 NA
As you can see in group A only the next row is filled with 1 because the last date1 is after date2. So I was wondering if anyone knows how to fill the NA based on a certain date per group?
dput
df:
df<-structure(list(group = c("A", "A", "A", "A", "B", "B", "B", "B",
"C", "C", "C", "C"), date1 = c("2022-01-01", "2022-01-02", "2022-01-04",
"2022-01-10", "2022-01-01", "2022-01-03", "2022-01-04", "2022-01-06",
"2022-01-01", "2022-01-03", "2022-01-04", "2022-01-11"), date2 = c("2022-01-07",
"2022-01-07", "2022-01-07", "2022-01-07", "2022-01-06", "2022-01-06",
"2022-01-06", "2022-01-06", "2022-01-09", "2022-01-09", "2022-01-09",
"2022-01-09"), value = c(NA, 1, NA, NA, 3, NA, NA, NA, NA, NA,
2, NA)), class = "data.frame", row.names = c(NA, -12L))
CodePudding user response:
Create an additional group and use fill
library(dplyr)
library(tidyr)
df %>%
group_by(group, after = date1 > date2) %>%
fill(value) %>%
ungroup() %>%
select(-after)
# A tibble: 12 × 4
group date1 date2 value
<chr> <chr> <chr> <dbl>
1 A 2022-01-01 2022-01-07 NA
2 A 2022-01-02 2022-01-07 1
3 A 2022-01-04 2022-01-07 1
4 A 2022-01-10 2022-01-07 NA
5 B 2022-01-01 2022-01-06 3
6 B 2022-01-03 2022-01-06 3
7 B 2022-01-04 2022-01-06 3
8 B 2022-01-06 2022-01-06 3
9 C 2022-01-01 2022-01-09 NA
10 C 2022-01-03 2022-01-09 NA
11 C 2022-01-04 2022-01-09 2
12 C 2022-01-11 2022-01-09 NA
CodePudding user response:
using data.table
library(data.table)
setDT(df)[as.Date(date1) <= as.Date(date2), value := nafill(value, type = "locf"), group]
results
group date1 date2 value
1: A 2022-01-01 2022-01-07 NA
2: A 2022-01-02 2022-01-07 1
3: A 2022-01-04 2022-01-07 1
4: A 2022-01-10 2022-01-07 NA
5: B 2022-01-01 2022-01-06 3
6: B 2022-01-03 2022-01-06 3
7: B 2022-01-04 2022-01-06 3
8: B 2022-01-06 2022-01-06 3
9: C 2022-01-01 2022-01-09 NA
10: C 2022-01-03 2022-01-09 NA
11: C 2022-01-04 2022-01-09 2
12: C 2022-01-11 2022-01-09 NA