I have two dataframes:
df_1 <- data.frame(DATES_1=c("12-30-2022", "03-02-2023","05-05-2023"),
DV01_1=c(1,2,3))
df_2 <- data.frame(DATES_2=c("01-01-2023", "02-01-2023", "05-01-2023","06-01-2023"),
DV01_2=c(1,2,3,4))
I want to join the DV01_1 values from df_1 into df_2 respecting the range of dates in DATES_2. the final answer should be this dataframe:
df_3 <- data.frame(DATES_2=c("01-01-2023", "02-01-2023", "05-01-2023","06-01-2023"),
DV01_2=c(1,2,3,4), DV01_1=c(0,0,2,3))
So for example, the last element from df_1 should consider that
01/01/23,02/01/23,05/01/23 < DATES_1 < 06/01/23.
This process can be easily replicated using excel formula SUMIFS, for example.
CodePudding user response:
We may use a non-equi join - Convert the dates to date class in each of the datasets, then shift
the DATES_2
to get the previous date as starting date in the range, do a join with the df_1 using a non-equi join to create the DV01_1 column in the df_2
library(data.table)
df_2$DATES_2 <- as.Date(df_2$DATES_2, "%m-%d-%Y")
df_1$DATES_1 <- as.Date(df_1$DATES_1, "%m-%d-%Y")
setDT(df_2)[, st := shift(DATES_2, fill = first(DATES_2))]
df_2[df_1, DV01_1 := DV01_1,
on = .(st <= DATES_1, DATES_2 > DATES_1), mult = "last"]
df_2[, st := NULL][is.na(DV01_1), DV01_1 := 0]
-output
> df_2
DATES_2 DV01_2 DV01_1
1: 2023-01-01 1 0
2: 2023-02-01 2 0
3: 2023-05-01 3 2
4: 2023-06-01 4 3
OP's expected output
> df_3
DATES_2 DV01_2 DV01_1
1 01-01-2023 1 0
2 02-01-2023 2 0
3 05-01-2023 3 2
4 06-01-2023 4 3
Or this can be done with the devel
version of dplyr
library(dplyr)
library(lubridate)
library(tidyr)
df_1 %>%
mutate(DATES_1 = mdy(DATES_1)) %>%
left_join(df_2 %>%
mutate(DATES_2 = mdy(DATES_2),
st = lag(DATES_2, default = first(DATES_2))), .,
by = join_by(closest(st <= DATES_1),
closest(DATES_2 > DATES_1))) %>%
select(names(df_2), DV01_1) %>%
mutate(DV01_1 = replace_na(DV01_1, 0))
-output
DATES_2 DV01_2 DV01_1
1 2023-01-01 1 0
2 2023-02-01 2 0
3 2023-05-01 3 2
4 2023-06-01 4 3