I am not sure if I am even using the term wide/long correctly, but I am trying to reformat shift data in a way that I can see how much labor is being used in hourly increments.
Suppose I have the following dataset:
library(data.table)
sample_DT <- data.table(
store = c("A", "A", "A", "A", "B", "B"),
date = ymd(c("2019-03-24", "2019-03-24", "2019-03-24", "2019-03-24", "2019-03-24", "2019-03-24")),
start_hr = c(23,0,2,7,4,2),
duration_hr = c(8,4,4,12,6,10)
)
which looks like:
store date start_hr duration_hr
<char> <Date> <num> <num>
1: A 2019-03-24 23 8
2: A 2019-03-24 0 4
3: A 2019-03-24 2 4
4: A 2019-03-24 7 12
5: B 2019-03-24 4 6
6: B 2019-03-24 2 10
I am trying to see how much labor stores use during each hourly interval (0-1, 1-2, 2-3, 3-4,...) on each date. So the data should look something like:
store date time_hr usage
A 2019-03-24 0 1
A 2019-03-24 1 1
A 2019-03-24 2 2
A 2019-03-24 3 2
A 2019-03-24 4 1
A 2019-03-24 5 1
B ...
B ...
In the above, time_hr
represents the time interval (e.g., time_hr = 0
means the interval from midnight to 1 AM). Please note that sometimes the shifts can run multiple days (e.g. starts at 23:00 and has a duration of 8 hours).
Thanks!
CodePudding user response:
This can be done a few ways, here's a way using pivoted data so that each shift start is turned into a 1
and each shift end into a -1
.
library(dplyr); library(tidyr); library(lubridate)
sample_df %>%
transmute(store,
start_hr = ymd_h(paste(date, start_hr)),
end_hr = start_hr dhours(duration_hr)) %>%
pivot_longer(-store,
names_to = "change",
values_to = "timestamp") %>%
mutate(change = if_else(change == "start_hr", 1, -1)) %>%
count(store, timestamp, wt = change, name = "change") %>% # total change/hr
arrange(store, timestamp) %>%
complete(store,
timestamp = seq.POSIXt(min(timestamp), max(timestamp),
by = "hour"),
fill = list(change = 0)) %>%
mutate(usage = cumsum(change))
Result
# A tibble: 64 × 4
store timestamp change usage
<chr> <dttm> <dbl> <dbl>
1 A 2019-03-24 00:00:00 1 1
2 A 2019-03-24 01:00:00 0 1
3 A 2019-03-24 02:00:00 1 2
4 A 2019-03-24 03:00:00 0 2
5 A 2019-03-24 04:00:00 -1 1
6 A 2019-03-24 05:00:00 0 1
7 A 2019-03-24 06:00:00 -1 0
8 A 2019-03-24 07:00:00 1 1
9 A 2019-03-24 08:00:00 0 1
10 A 2019-03-24 09:00:00 0 1
# … with 54 more rows