Home > Enterprise >  Time data: Wide to Long
Time data: Wide to Long

Time:05-24

I am not sure if I am even using the term wide/long correctly, but I am trying to reformat shift data in a way that I can see how much labor is being used in hourly increments.

Suppose I have the following dataset:

library(data.table)
sample_DT <- data.table(
  store = c("A", "A", "A", "A", "B", "B"),
  date = ymd(c("2019-03-24", "2019-03-24", "2019-03-24", "2019-03-24", "2019-03-24", "2019-03-24")),
  start_hr = c(23,0,2,7,4,2),
  duration_hr = c(8,4,4,12,6,10)
  )

which looks like:

     store     date    start_hr duration_hr
   <char>     <Date>    <num>       <num>
1:      A 2019-03-24       23           8
2:      A 2019-03-24        0           4
3:      A 2019-03-24        2           4
4:      A 2019-03-24        7          12
5:      B 2019-03-24        4           6
6:      B 2019-03-24        2          10

I am trying to see how much labor stores use during each hourly interval (0-1, 1-2, 2-3, 3-4,...) on each date. So the data should look something like:


store    date    time_hr usage
A   2019-03-24    0       1
A   2019-03-24    1       1
A   2019-03-24    2       2
A   2019-03-24    3       2
A   2019-03-24    4       1
A   2019-03-24    5       1 
B   ...
B   ...

In the above, time_hr represents the time interval (e.g., time_hr = 0 means the interval from midnight to 1 AM). Please note that sometimes the shifts can run multiple days (e.g. starts at 23:00 and has a duration of 8 hours).

Thanks!

CodePudding user response:

This can be done a few ways, here's a way using pivoted data so that each shift start is turned into a 1 and each shift end into a -1.

library(dplyr); library(tidyr); library(lubridate)
sample_df %>%
  transmute(store, 
            start_hr = ymd_h(paste(date, start_hr)),
            end_hr = start_hr   dhours(duration_hr)) %>%
  pivot_longer(-store, 
               names_to = "change",
               values_to = "timestamp") %>%
  mutate(change = if_else(change == "start_hr", 1, -1)) %>%
  count(store, timestamp, wt = change, name = "change") %>%  # total change/hr
  arrange(store, timestamp) %>%
  complete(store, 
           timestamp = seq.POSIXt(min(timestamp), max(timestamp), 
                                  by = "hour"), 
           fill = list(change = 0)) %>%
  mutate(usage = cumsum(change))

Result

# A tibble: 64 × 4
   store timestamp           change usage
   <chr> <dttm>               <dbl> <dbl>
 1 A     2019-03-24 00:00:00      1     1
 2 A     2019-03-24 01:00:00      0     1
 3 A     2019-03-24 02:00:00      1     2
 4 A     2019-03-24 03:00:00      0     2
 5 A     2019-03-24 04:00:00     -1     1
 6 A     2019-03-24 05:00:00      0     1
 7 A     2019-03-24 06:00:00     -1     0
 8 A     2019-03-24 07:00:00      1     1
 9 A     2019-03-24 08:00:00      0     1
10 A     2019-03-24 09:00:00      0     1
# … with 54 more rows
  • Related