Merging by group between dates with conditional repetition

Time:10-19

I am asking for assistance on a particular merging problem using data.table.

Here is the example data:

library(data.table)


# Create example dataset
DT_A = data.table(
  Store = "A",
  Date = as.Date(sprintf("10-%d-%02d", c(22:25, 26:28), rep(1:2, 4:3)),
    '%m-%d-%y')
)

DT_B = data.table(
  Store = "B",
  Date = as.Date(sprintf("10-%d-%02d", c(22:25, 26:28), rep(1:2, 4:3)),
                 '%m-%d-%y')
)

DT <- rbindlist(list(DT_A, DT_B))


DT
    Store       Date
 1:     A 2001-10-22
 2:     A 2001-10-23
 3:     A 2001-10-24
 4:     A 2001-10-25
 5:     A 2002-10-26
 6:     A 2002-10-27
 7:     A 2002-10-28
 8:     B 2001-10-22
 9:     B 2001-10-23
10:     B 2001-10-24
11:     B 2001-10-25
12:     B 2002-10-26
13:     B 2002-10-27
14:     B 2002-10-28

So DT has observations of stores A and B across multiple dates.

I have another dataset, say manager_DT, that holds each manager's start and end dates:


manager_DT <- data.table(Manager = c("John", "David", "Steve"),
                         Store = c("A", "A","B"),
                         min_date = c(as.Date("2001-10-22"),
                                      as.Date("2002-10-26"),
                                      as.Date("2001-10-22")),
                         max_date = c(as.Date("2002-10-27"),
                                      as.Date("2002-10-28"),
                                      as.Date("2002-10-28")))

manager_DT

   Manager Store   min_date   max_date
1:    John     A 2001-10-22 2002-10-27
2:   David     A 2002-10-26 2002-10-28
3:   Steve     B 2001-10-22 2002-10-28

It's possible for a store to have more than one manager at a given time. Here, John and David have overlapping tenures at store A (specifically on 2002-10-26 and 2002-10-27), while Steve is the only manager at store B.

Using data.table methods, I would like to merge manager_DT onto DT so that the desired output is:

DT
    Store       Date Manager
 1:     A 2001-10-22    John
 2:     A 2001-10-23    John
 3:     A 2001-10-24    John
 4:     A 2001-10-25    John
 5:     A 2002-10-26    John
 6:     A 2002-10-26   David
 7:     A 2002-10-27    John
 8:     A 2002-10-27   David
 9:     A 2002-10-28   David
10:     B 2001-10-22   Steve
11:     B 2001-10-23   Steve
12:     B 2001-10-24   Steve
13:     B 2001-10-25   Steve
14:     B 2002-10-26   Steve
15:     B 2002-10-27   Steve
16:     B 2002-10-28   Steve

Note that there is only one Manager column, and whenever tenures overlap the row is repeated (here, two dates are repeated, 2002-10-26 and 2002-10-27, when both John and David are managers at store A).

The idea here is that I want unique observations at the Date x Store x Manager level.

Thank you!

CodePudding user response:

If you are interested in creating the merged dataframe, a solution can be:

library(tidyverse)
library(lubridate)
library(data.table)

manager_DT <- structure(list(Manager = c("John", "David", "Steve"), Store = c("A", 
"A", "B"), min_date = c("2001-10-22", "2002-10-26", "2001-10-22"
), max_date = c("2002-10-27", "2002-10-28", "2002-10-28")), row.names = c(NA, 
-3L), class = "data.frame")

n <- nrow(manager_DT)

# For each manager, build one row per day of their tenure, then row-bind the pieces
map_dfr(seq_len(n), ~ data.table(
  Store = manager_DT$Store[.x],
  Date = seq(ymd(manager_DT$min_date[.x]), ymd(manager_DT$max_date[.x]), by = "days"),
  Manager = manager_DT$Manager[.x]
))
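
The code above yields one row per manager per day of tenure, but it does not yet restrict the result to the dates present in DT. A minimal sketch of that remaining step follows, using data.table only. The name manager_days is hypothetical, and the tenure dates are an assumption chosen so that John and David overlap on 2002-10-26 and 2002-10-27, as in the question's desired output.

```r
library(data.table)

# Store-by-date observations, same values as the question's DT
DT <- data.table(
  Store = rep(c("A", "B"), each = 7),
  Date  = rep(as.Date(c("2001-10-22", "2001-10-23", "2001-10-24", "2001-10-25",
                        "2002-10-26", "2002-10-27", "2002-10-28")), times = 2)
)

# ASSUMPTION: tenure dates picked so John and David overlap on 2002-10-26/27
manager_DT <- data.table(
  Manager  = c("John", "David", "Steve"),
  Store    = c("A", "A", "B"),
  min_date = as.Date(c("2001-10-22", "2002-10-26", "2001-10-22")),
  max_date = as.Date(c("2002-10-27", "2002-10-28", "2002-10-28"))
)

# Expand each tenure into one row per day (manager_days is a hypothetical name)
manager_days <- manager_DT[, .(Date = seq(min_date, max_date, by = "day")),
                           by = .(Store, Manager)]

# Inner join on Store and Date: a date with two managers yields two rows
res <- merge(DT, manager_days, by = c("Store", "Date"), allow.cartesian = TRUE)
res
```

This approach is simple but materialises every day of every tenure, so it may become memory-hungry when tenures span many years.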

CodePudding user response:

A possible solution:

library(dplyr)

DT <- structure(list(Store = c("A", "A", "A", "A", "A", "A", "A", "A", 
"A", "B", "B", "B", "B", "B", "B", "B"), Date = c("2001-10-22", 
"2001-10-23", "2001-10-24", "2001-10-25", "2002-10-26", "2002-10-26", 
"2002-10-27", "2002-10-27", "2002-10-28", "2001-10-22", "2001-10-23", 
"2001-10-24", "2001-10-25", "2002-10-26", "2002-10-27", "2002-10-28"
), Manager = c("John", "John", "John", "John", "John", "David", 
"John", "David", "David", "Steve", "Steve", "Steve", "Steve", 
"Steve", "Steve", "Steve")), row.names = c(NA, -16L), class = "data.frame")

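# Collapse all managers sharing a Store and Date into one string, e.g. "John and David"
DT %>%
  group_by(Store, Date) %>%
  mutate(Manager = paste(Manager, collapse = " and ")) %>%
  ungroup()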
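
Since the question asks specifically for data.table methods, a non-equi join may be worth considering: it matches each Store and Date in DT against every tenure interval that covers it, without expanding the intervals first. The sketch below is one possible way to do this, not a definitive solution. The tenure dates are an assumption chosen so that John and David overlap on 2002-10-26 and 2002-10-27, as in the desired output, and res is just an illustrative name.

```r
library(data.table)

# Store-by-date observations, same values as the question's DT
DT <- data.table(
  Store = rep(c("A", "B"), each = 7),
  Date  = rep(as.Date(c("2001-10-22", "2001-10-23", "2001-10-24", "2001-10-25",
                        "2002-10-26", "2002-10-27", "2002-10-28")), times = 2)
)

# ASSUMPTION: tenure dates picked so John and David overlap on 2002-10-26/27
manager_DT <- data.table(
  Manager  = c("John", "David", "Steve"),
  Store    = c("A", "A", "B"),
  min_date = as.Date(c("2001-10-22", "2002-10-26", "2001-10-22")),
  max_date = as.Date(c("2002-10-27", "2002-10-28", "2002-10-28"))
)

# Non-equi join: for every row of DT, keep each manager of the same store
# whose tenure satisfies min_date <= Date <= max_date.
# i.Date is DT's own Date; after a non-equi join the min_date/max_date
# columns carry the joined value, so Date is selected explicitly.
res <- manager_DT[DT,
                  on = .(Store, min_date <= Date, max_date >= Date),
                  .(Store, Date = i.Date, Manager),
                  allow.cartesian = TRUE]
res

# Each Date x Store x Manager combination should appear only once
stopifnot(anyDuplicated(res) == 0L)
```

With the default nomatch, a Store and Date with no covering tenure would still appear, with Manager set to NA; passing nomatch = NULL would drop such rows instead.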