let's say I have multiple lines within my callcenter. For each line I know the timestamps a call was taken (t_Calltaken) and when it was finished (t_Hangup).
library(tidyverse)
library(lubridate)
Callcenter <- data.frame(line=c("A","B","C","D","A","B","D","A","C","D"),
t_Calltaken=c("2019-01-01 00:10:50", "2019-01-01 00:12:30","2019-01-01 00:17:00","2019-01-01 00:20:50","2019-01-01 00:35:20","2019-01-01 00:42:50","2019-01-01 00:48:50","2019-01-01 01:03:20","2019-01-01 01:10:50","2019-01-01 01:23:50"),
t_Hangup=c("2019-01-01 00:33:10", "2019-01-01 00:35:10","2019-01-01 01:07:33","2019-01-01 00:38:50","2019-01-01 00:49:27","2019-01-01 01:22:40","2019-01-01 01:10:41","2019-01-01 01:26:10","2019-01-01 01:47:44","2019-01-01 01:51:15"))
I now want to analyze the maximum number of occupied lines at the same time for a year. As resolution "minutes" is fine. So I calculated the differences in minutes to the beginning of the year (e.g. 2019-01-01 00:00:00) from t_Calltaken and t_Hangup to get something like a minute-id.
For each call I can get the blocked minute-ids with seq(t_Calltaken,t_Hangup,by=1)
.
Callcenter %>%
mutate(start_minute_id=round(as.numeric(difftime(t_Calltaken,"2019-01-01 00:00:00",unit="mins"))),
end_minute_id=round(as.numeric(difftime(t_Hangup,"2019-01-01 00:00:00",unit="mins")))) %>%
rowwise() %>%
mutate(blocked_minutes=list(seq(start_minute_id,end_minute_id,by=1)))
# A tibble: 10 × 6
# Rowwise:
line t_Calltaken t_Hangup start_minute_id end_minute_id blocked_minutes
<chr> <chr> <chr> <dbl> <dbl> <list>
1 A 2019-01-01 00:10:50 2019-01-01 00:33:10 11 33 <dbl [23]>
2 B 2019-01-01 00:12:30 2019-01-01 00:35:10 12 35 <dbl [24]>
3 C 2019-01-01 00:17:00 2019-01-01 01:07:33 17 68 <dbl [52]>
4 D 2019-01-01 00:20:50 2019-01-01 00:38:50 21 39 <dbl [19]>
5 A 2019-01-01 00:35:20 2019-01-01 00:49:27 35 49 <dbl [15]>
6 B 2019-01-01 00:42:50 2019-01-01 01:22:40 43 83 <dbl [41]>
7 D 2019-01-01 00:48:50 2019-01-01 01:10:41 49 71 <dbl [23]>
8 A 2019-01-01 01:03:20 2019-01-01 01:26:10 63 86 <dbl [24]>
9 C 2019-01-01 01:10:50 2019-01-01 01:47:44 71 108 <dbl [38]>
10 D 2019-01-01 01:23:50 2019-01-01 01:51:15 84 111 <dbl [28]>
I now would like to group by line and join all lists with blocked minute-ids together.
How can I do this?
In the next step I want to analyze the number of occurences for blocked-minute-ids to get the maximum number of lines blocked in parallel. Is there another, more efficient way?
Edit:
I would expect an output e.g. like this:
line
1 A
2 B
3 C
4 D
blocked_minutes
1 c(11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86)
2 c(12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83)
3 c(17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108)
4 c(21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111)
CodePudding user response:
Not totally sure if this is what you're going for, but:
Callcenter %>%
mutate(start_minute_id=round(as.numeric(difftime(t_Calltaken,"2019-01-01 00:00:00",unit="mins"))),
end_minute_id=round(as.numeric(difftime(t_Hangup,"2019-01-01 00:00:00",unit="mins")))) %>%
rowwise() %>%
mutate(blocked_minutes=list(seq(start_minute_id,end_minute_id,by=1))) %>%
unnest_longer(blocked_minutes) %>%
group_by(line) %>%
nest() %>%
unnest_wider(col = data)
# A tibble: 4 x 6
# Groups: line [4]
line t_Calltaken t_Hangup start_minute_id end_minute_id blocked_minutes
<chr> <list> <list> <list> <list> <list>
1 A <chr [62]> <chr [62]> <dbl [62]> <dbl [62]> <dbl [62]>
2 B <chr [65]> <chr [65]> <dbl [65]> <dbl [65]> <dbl [65]>
3 C <chr [90]> <chr [90]> <dbl [90]> <dbl [90]> <dbl [90]>
4 D <chr [70]> <chr [70]> <dbl [70]> <dbl [70]> <dbl [70]>
CodePudding user response:
Thanks to the answer from @Ben G I was able to get to the solution for my longterm goal to find the maximum number of blocked lines at a time.
With unnest_longer()
the lists within blocked_minutes
can be unnested and then I'm able to get to the result just by reforming the DF.
Callcenter %>%
mutate(start_minute_id=round(as.numeric(difftime(t_Calltaken,"2019-01-01 00:00:00",unit="mins"))),
end_minute_id=round(as.numeric(difftime(t_Hangup,"2019-01-01 00:00:00",unit="mins")))) %>%
rowwise() %>%
mutate(blocked_minutes=list(seq(start_minute_id,end_minute_id,by=1))) %>%
unnest_longer(blocked_minutes) %>%
mutate(value=1) %>%
pivot_wider(id_cols=blocked_minutes, names_from="line",values_from="value") %>%
mutate(sum_blocked_lines=rowSums(.[,2:ncol(.)],na.rm=TRUE)) %>%
summarize(max_blocked_lines=max(sum_blocked_lines))
Results in:
# A tibble: 1 × 1
max_blocked_lines
<dbl>
1 4