df1: has all dates from 2021-04-20 to 2022-05-15
Date |
---|
2021-04-20 |
2021-04-21 |
2021-04-22 |
2021-04-23 |
2021-04-24 |
2021-04-25 |
2021-04-26 |
... |
2022-04-12 |
2022-04-13 |
2022-04-14 |
2022-05-15 |
df2: Random and incomplete list of dates between 2021-04-20 to 2022-05-15
Date | Group |
---|---|
2021-04-20 | 1001 |
2021-04-21 | 1001 |
2021-04-25 | 1001 |
2021-04-27 | 1001 |
2021-04-30 | 1001 |
2021-05-05 | 1001 |
2021-06-06 | 1001 |
2021-04-20 | 1002 |
2021-04-21 | 1002 |
2021-04-22 | 1002 |
2021-04-23 | 1002 |
2021-05-24 | 1002 |
2021-05-22 | 1002 |
2021-06-23 | 1002 |
2021-06-24 | 1002 |
2021-07-01 | 1002 |
2021-07-12 | 1002 |
2021-08-09 | 1002 |
dfdesired: Want to fill in incomplete dates in df2 with dates in df1. I want to do this by group. This means 1001 should start at 2021-04-20 and end at 2022-05-15. 1002 should also start at 2021-04-20 and end at 2022-05-15.
Date | Group |
---|---|
2021-04-20 | 1001 |
2021-04-21 | 1001 |
2021-04-22 | 1001 |
2021-04-23 | 1001 |
2021-04-24 | 1001 |
2021-04-25 | 1001 |
2021-04-26 | 1001 |
... | ... |
2022-04-12 | 1001 |
2022-04-13 | 1001 |
2022-04-14 | 1001 |
2022-05-15 | 1001 |
2021-04-20 | 1002 |
2021-04-21 | 1002 |
2021-04-22 | 1002 |
2021-04-23 | 1002 |
2021-04-24 | 1002 |
2021-04-25 | 1002 |
2021-04-26 | 1002 |
... | ... |
2022-04-12 | 1002 |
2022-04-13 | 1002 |
2022-04-14 | 1002 |
2022-05-15 | 1002 |
Not sure how to do this. Any help would be appreciated.
CodePudding user response:
You can do a "blind merge" (cartesian expansion) with base::merge(..., by=NULL)
. Here, I'm starting with a frame of just unique Group
values on a new frame of the full date sequence.
dates <- data.frame(Date = seq(as.Date("2021-04-20"), as.Date("2022-05-15"), by="day"))
out <- merge(unique(dat[,c("Group"), drop = FALSE]), dates, by = NULL)
head(out)
# Group Date
# 1 1001 2021-04-20
# 2 1002 2021-04-20
# 3 1001 2021-04-21
# 4 1002 2021-04-21
# 5 1001 2021-04-22
# 6 1002 2021-04-22
tail(out)
# Group Date
# 777 1001 2022-05-13
# 778 1002 2022-05-13
# 779 1001 2022-05-14
# 780 1002 2022-05-14
# 781 1001 2022-05-15
# 782 1002 2022-05-15
Data:
dat <- structure(list(Date = c("2021-04-20", "2021-04-21", "2021-04-25", "2021-04-27", "2021-04-30", "2021-05-05", "2021-06-06", "2021-04-20", "2021-04-21", "2021-04-22", "2021-04-23", "2021-05-24", "2021-05-22", "2021-06-23", "2021-06-24", "2021-07-01", "2021-07-12", "2021-08-09"), Group = c(1001L, 1001L, 1001L, 1001L, 1001L, 1001L, 1001L, 1002L, 1002L, 1002L, 1002L, 1002L, 1002L, 1002L, 1002L, 1002L, 1002L, 1002L)), class = "data.frame", row.names = c(NA, -18L))
CodePudding user response:
Here is a dplyr solution
:
We use
anti_join
to get thedate
that are not indf2
.then we use
bind_rows
to bind these values todf2
after getting date format with
ymd
we arrange foryear
and fillGroup
column.some tweaking.
library(dplyr)
library(lubridate)
anti_join(df1, df2) %>%
bind_rows(df2) %>%
mutate(Date = ymd(Date)) %>%
arrange(year(Date)) %>%
fill(Group, .direction = "updown") %>%
select(1,2) %>%
arrange(Group, Date)
Date Group
1 2021-04-20 1001
2 2021-04-21 1001
3 2021-04-24 1001
4 2021-04-25 1001
5 2021-04-26 1001
6 2021-04-27 1001
7 2021-04-30 1001
8 2021-05-05 1001
9 2021-06-06 1001
10 2021-04-20 1002
11 2021-04-21 1002
12 2021-04-22 1002
13 2021-04-23 1002
14 2021-05-22 1002
15 2021-05-24 1002
16 2021-06-23 1002
17 2021-06-24 1002
18 2021-07-01 1002
19 2021-07-12 1002
20 2021-08-09 1002
21 2022-04-12 1002
22 2022-04-13 1002
23 2022-04-14 1002
24 2022-05-15 1002