I have the following dataframes df and df_dates (dput
below):
> df
group start end
1 A 2022-12-01 2022-12-04
2 A 2022-12-04 2022-12-07
3 A 2022-12-07 2022-12-10
4 A 2022-12-10 2022-12-13
5 A 2022-12-13 2022-12-16
6 A 2022-12-16 2022-12-19
7 B 2022-12-01 2022-12-04
8 B 2022-12-04 2022-12-07
9 B 2022-12-07 2022-12-10
10 B 2022-12-10 2022-12-13
11 B 2022-12-13 2022-12-16
12 B 2022-12-16 2022-12-19
> df_dates
group date value
1 A 2022-12-02 1
2 A 2022-12-14 3
3 B 2022-12-06 2
4 B 2022-12-13 4
I would like to join the rows of df_dates by group where the date column is between the column start and end of df. When I join these two dataframes, the dates of the date column return the same as the dates from the start column of df. Here is the code with output:
df <- data.frame(group = rep(c('A', 'B'), each = 6),
start = c(seq.Date(as.Date('2022-12-01'), as.Date('2022-12-16'), '3 days')),
end = c(seq.Date(as.Date('2022-12-04'), as.Date('2022-12-19'), '3 days')))
df_dates <- data.frame(group = c('A', 'A', 'B', 'B'),
date = as.Date(c('2022-12-02', '2022-12-14', '2022-12-06', '2022-12-13')),
value = c(1,3,2,4))
library(data.table)
setDT(df)
setDT(df_dates)
df_dates[df,
.(group, date, start, end, value),
on = .(group, date >= start, date <= end)]
#> group date start end value
#> 1: A 2022-12-01 2022-12-01 2022-12-04 1
#> 2: A 2022-12-04 2022-12-04 2022-12-07 NA
#> 3: A 2022-12-07 2022-12-07 2022-12-10 NA
#> 4: A 2022-12-10 2022-12-10 2022-12-13 NA
#> 5: A 2022-12-13 2022-12-13 2022-12-16 3
#> 6: A 2022-12-16 2022-12-16 2022-12-19 NA
#> 7: B 2022-12-01 2022-12-01 2022-12-04 NA
#> 8: B 2022-12-04 2022-12-04 2022-12-07 2
#> 9: B 2022-12-07 2022-12-07 2022-12-10 NA
#> 10: B 2022-12-10 2022-12-10 2022-12-13 4
#> 11: B 2022-12-13 2022-12-13 2022-12-16 4
#> 12: B 2022-12-16 2022-12-16 2022-12-19 NA
Created on 2022-12-12 with reprex v2.0.2
As you can see, the dates of the date column are now changed to the dates of the start column, while I want them to be the same as in df_dates dataframe. The desired output should look like this:
#> group date start end value
#> 1: A 2022-12-02 2022-12-01 2022-12-04 1
#> 2: A NA 2022-12-04 2022-12-07 NA
#> 3: A NA 2022-12-07 2022-12-10 NA
#> 4: A NA 2022-12-10 2022-12-13 NA
#> 5: A 2022-12-14 2022-12-13 2022-12-16 3
#> 6: A NA 2022-12-16 2022-12-19 NA
#> 7: B NA 2022-12-01 2022-12-04 NA
#> 8: B 2022-12-06 2022-12-04 2022-12-07 2
#> 9: B NA 2022-12-07 2022-12-10 NA
#> 10: B 2022-12-13 2022-12-10 2022-12-13 4
#> 11: B 2022-12-13 2022-12-13 2022-12-16 4
#> 12: B NA 2022-12-16 2022-12-19 NA
So I was wondering if anyone knows how to join these two dataframes in the right way using data.table
?
dput
of df and df_dates:
df <- structure(list(group = c("A", "A", "A", "A", "A", "A", "B", "B",
"B", "B", "B", "B"), start = structure(c(19327, 19330, 19333,
19336, 19339, 19342, 19327, 19330, 19333, 19336, 19339, 19342
), class = "Date"), end = structure(c(19330, 19333, 19336, 19339,
19342, 19345, 19330, 19333, 19336, 19339, 19342, 19345), class = "Date")), class = "data.frame", row.names = c(NA,
-12L))
df_dates <- structure(list(group = c("A", "A", "B", "B"), date = structure(c(19328,
19340, 19332, 19339), class = "Date"), value = c(1, 3, 2, 4)), class = "data.frame", row.names = c(NA,
-4L))
CodePudding user response:
library(data.table)
setDT(df)
setDT(df_dates)
Just specifying you want to the original date
(you can use prefix x.
or i.
(to reference df
in this case) )
df_dates[df,
.(group, x.date, start, end, value),
on = .(group, date >= start, date <= end)]
Or modifying the original:
df[, c("date", "value") :=
df_dates[.SD, on = .(group, date >= start, date <= end), .(x.date, value)]]
# group start end date value
# <char> <Date> <Date> <Date> <num>
# 1: A 2022-12-01 2022-12-04 2022-12-02 1
# 2: A 2022-12-04 2022-12-07 <NA> NA
# 3: A 2022-12-07 2022-12-10 <NA> NA
# 4: A 2022-12-10 2022-12-13 <NA> NA
# 5: A 2022-12-13 2022-12-16 2022-12-14 3
# 6: A 2022-12-16 2022-12-19 <NA> NA
# 7: B 2022-12-01 2022-12-04 <NA> NA
# 8: B 2022-12-04 2022-12-07 2022-12-06 2
# 9: B 2022-12-07 2022-12-10 <NA> NA
# 10: B 2022-12-10 2022-12-13 2022-12-13 4
# 11: B 2022-12-13 2022-12-16 2022-12-13 4
# 12: B 2022-12-16 2022-12-19 <NA> NA