I am asking for assistance on a particular merging problem using data.table.
Here is the example data:
library(data.table)
# Create example dataset
DT_A = data.table(
Store = "A",
Date = as.Date(sprintf("10-%d-%02d", c(22:25, 26:28), rep(1:2, 4:3)),
'%m-%d-%y')
)
DT_B = data.table(
Store = "B",
Date = as.Date(sprintf("10-%d-%02d", c(22:25, 26:28), rep(1:2, 4:3)),
'%m-%d-%y')
)
DT <- rbindlist(list(DT_A, DT_B))
DT
Store Date
1: A 2001-10-22
2: A 2001-10-23
3: A 2001-10-24
4: A 2001-10-25
5: A 2002-10-26
6: A 2002-10-27
7: A 2002-10-28
8: B 2001-10-22
9: B 2001-10-23
10: B 2001-10-24
11: B 2001-10-25
12: B 2002-10-26
13: B 2002-10-27
14: B 2002-10-28
So DT has observations of stores A and B across multiple dates.
I have another dataset, say manager_DT, that has the start and end dates of each manager's tenure:
manager_DT <- data.table(Manager = c("John", "David", "Steve"),
Store = c("A", "A","B"),
min_date = c(as.Date("2001-10-22"),
as.Date("2001-10-26"),
as.Date("2001-10-22")),
max_date = c(as.Date("2001-10-27"),
as.Date("2001-10-28"),
as.Date("2002-10-28")))
manager_DT
Manager Store min_date max_date
1: John A 2001-10-22 2001-10-27
2: David A 2001-10-26 2001-10-28
3: Steve B 2001-10-22 2002-10-28
It's possible for a store to have more than one manager at a given time. Here, John and David have overlapping tenures at store A (specifically on 2001-10-26 and 2001-10-27), but Steve is the only manager at store B.
Using data.table methods, I would like to merge manager_DT onto DT so that the desired output is:
DT
Store Date Manager
1: A 2001-10-22 John
2: A 2001-10-23 John
3: A 2001-10-24 John
4: A 2001-10-25 John
5: A 2002-10-26 John
6: A 2002-10-26 David
7: A 2002-10-27 John
8: A 2002-10-27 David
9: A 2002-10-28 David
10: B 2001-10-22 Steve
11: B 2001-10-23 Steve
12: B 2001-10-24 Steve
13: B 2001-10-25 Steve
14: B 2002-10-26 Steve
15: B 2002-10-27 Steve
16: B 2002-10-28 Steve
Note that there is only one Manager column, and whenever tenures overlap, the row is repeated (here, two dates are repeated, 2001-10-26 and 2001-10-27, where both John and David are managers at store A).
In other words, I want unique observations at the Date x Store x Manager level.
Thank you!
CodePudding user response:
If you only need the merged data frame, one solution is to expand each manager's tenure into one row per day:
library(tidyverse)
library(lubridate)
library(data.table)
manager_DT <- structure(list(Manager = c("John", "David", "Steve"), Store = c("A",
"A", "B"), min_date = c("2001-10-22", "2001-10-26", "2001-10-22"
), max_date = c("2001-10-27", "2001-10-28", "2002-10-28")), row.names = c(NA,
-3L), class = "data.frame")
# For each manager, build one row per day of tenure, then row-bind the pieces
n <- nrow(manager_DT)
map_dfr(seq_len(n), ~ data.table(
  Store = manager_DT$Store[.x],
  Date = seq(ymd(manager_DT$min_date[.x]), ymd(manager_DT$max_date[.x]), by = "days"),
  Manager = manager_DT$Manager[.x]
))
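Since the question asks for data.table methods, here is a minimal sketch of the same idea as a non-equi join, which matches each observation in DT to every manager whose tenure covers that date without expanding the tenures first. It assumes, for illustration only, that all observation dates and tenures fall in 2001 (the DT printed in the question has some 2002 dates, which would lie outside John's and David's tenures); the name res is just a placeholder.

```r
library(data.table)

# Assumed toy data: one observation per store per day, all in 2001
dates <- seq(as.Date("2001-10-22"), as.Date("2001-10-28"), by = "day")
DT <- data.table(Store = rep(c("A", "B"), each = length(dates)),
                 Date  = rep(dates, times = 2))

manager_DT <- data.table(
  Manager  = c("John", "David", "Steve"),
  Store    = c("A", "A", "B"),
  min_date = as.Date(c("2001-10-22", "2001-10-26", "2001-10-22")),
  max_date = as.Date(c("2001-10-27", "2001-10-28", "2001-10-28"))
)

# Non-equi join: same Store, and min_date <= Date <= max_date.
# i.Date is the observation date from DT; rows of DT with several
# matching managers are repeated once per manager.
res <- manager_DT[DT,
                  on = .(Store, min_date <= Date, max_date >= Date),
                  .(Store, Date = i.Date, Manager),
                  allow.cartesian = TRUE]
res
```

With this toy data, the two overlapping dates at store A appear twice (once for John, once for David), and every other store and date appears once. Observations with no matching manager would come back with Manager set to NA under the default nomatch setting.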
CodePudding user response:
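A possible solution, starting from the long-format table, if you want each row to list all managers for that store and date: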
library(dplyr)
DT <- structure(list(Store = c("A", "A", "A", "A", "A", "A", "A", "A",
"A", "B", "B", "B", "B", "B", "B", "B"), Date = c("2001-10-22",
"2001-10-23", "2001-10-24", "2001-10-25", "2002-10-26", "2002-10-26",
"2002-10-27", "2002-10-27", "2002-10-28", "2001-10-22", "2001-10-23",
"2001-10-24", "2001-10-25", "2002-10-26", "2002-10-27", "2002-10-28"
), Manager = c("John", "John", "John", "John", "John", "David",
"John", "David", "David", "Steve", "Steve", "Steve", "Steve",
"Steve", "Steve", "Steve")), row.names = c(NA, -16L), class = "data.frame")
DT %>%
group_by(Store,Date) %>%
mutate(Manager = paste(Manager, collapse = " and "))
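Note that mutate keeps every input row, so dates with two managers still appear twice, each carrying the combined label. If a single row per store and date is preferred, a hedged data.table sketch of the same collapsing step could look like the following; the tables long and collapsed are small illustrative stand-ins, not the data from the question.

```r
library(data.table)

# Assumed long-format input: one row per Store x Date x Manager
long <- data.table(
  Store   = c("A", "A", "A", "B"),
  Date    = as.Date(c("2001-10-25", "2001-10-26", "2001-10-26", "2001-10-25")),
  Manager = c("John", "John", "David", "Steve")
)

# Aggregate instead of mutate: one row per Store and Date,
# with co-managers pasted into a single string
collapsed <- long[, .(Manager = paste(Manager, collapse = " and ")),
                  by = .(Store, Date)]
collapsed
```

This trades the Date x Store x Manager uniqueness the question asks for against a more compact table, so it is only appropriate when one row per store and date is the goal.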