R: Join lists with group_by and summarize-CodePudding

let's say I have multiple lines within my callcenter. For each line I know the timestamps a call was taken (t_Calltaken) and when it was finished (t_Hangup).

library(tidyverse)
library(lubridate)

Callcenter <- data.frame(line=c("A","B","C","D","A","B","D","A","C","D"),
           t_Calltaken=c("2019-01-01 00:10:50", "2019-01-01 00:12:30","2019-01-01 00:17:00","2019-01-01 00:20:50","2019-01-01 00:35:20","2019-01-01 00:42:50","2019-01-01 00:48:50","2019-01-01 01:03:20","2019-01-01 01:10:50","2019-01-01 01:23:50"),
           t_Hangup=c("2019-01-01 00:33:10", "2019-01-01 00:35:10","2019-01-01 01:07:33","2019-01-01 00:38:50","2019-01-01 00:49:27","2019-01-01 01:22:40","2019-01-01 01:10:41","2019-01-01 01:26:10","2019-01-01 01:47:44","2019-01-01 01:51:15"))

I now want to analyze the maximum number of occupied lines at the same time for a year. As resolution "minutes" is fine. So I calculated the differences in minutes to the beginning of the year (e.g. 2019-01-01 00:00:00) from t_Calltaken and t_Hangup to get something like a minute-id.

For each call I can get the blocked minute-ids with seq(t_Calltaken,t_Hangup,by=1).

Callcenter %>% 
  mutate(start_minute_id=round(as.numeric(difftime(t_Calltaken,"2019-01-01 00:00:00",unit="mins"))),
         end_minute_id=round(as.numeric(difftime(t_Hangup,"2019-01-01 00:00:00",unit="mins")))) %>% 
  rowwise() %>% 
  mutate(blocked_minutes=list(seq(start_minute_id,end_minute_id,by=1)))

# A tibble: 10 × 6
# Rowwise: 
   line  t_Calltaken         t_Hangup            start_minute_id end_minute_id blocked_minutes
   <chr> <chr>               <chr>                         <dbl>         <dbl> <list>         
 1 A     2019-01-01 00:10:50 2019-01-01 00:33:10              11            33 <dbl [23]>     
 2 B     2019-01-01 00:12:30 2019-01-01 00:35:10              12            35 <dbl [24]>     
 3 C     2019-01-01 00:17:00 2019-01-01 01:07:33              17            68 <dbl [52]>     
 4 D     2019-01-01 00:20:50 2019-01-01 00:38:50              21            39 <dbl [19]>     
 5 A     2019-01-01 00:35:20 2019-01-01 00:49:27              35            49 <dbl [15]>     
 6 B     2019-01-01 00:42:50 2019-01-01 01:22:40              43            83 <dbl [41]>     
 7 D     2019-01-01 00:48:50 2019-01-01 01:10:41              49            71 <dbl [23]>     
 8 A     2019-01-01 01:03:20 2019-01-01 01:26:10              63            86 <dbl [24]>     
 9 C     2019-01-01 01:10:50 2019-01-01 01:47:44              71           108 <dbl [38]>     
10 D     2019-01-01 01:23:50 2019-01-01 01:51:15              84           111 <dbl [28]>

I now would like to group by line and join all lists with blocked minute-ids together.

How can I do this?

In the next step I want to analyze the number of occurences for blocked-minute-ids to get the maximum number of lines blocked in parallel. Is there another, more efficient way?

Edit:

I would expect an output e.g. like this:

  line
1    A
2    B
3    C
4    D
                                                                                                                                                                                                                                                                                                                                                                     blocked_minutes
1                                                                                                                          c(11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86)
2                                                                                                              c(12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83)
3 c(17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108)
4      c(21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111)

CodePudding user response：

Not totally sure if this is what you're going for, but:

Callcenter %>% 
  mutate(start_minute_id=round(as.numeric(difftime(t_Calltaken,"2019-01-01 00:00:00",unit="mins"))),
         end_minute_id=round(as.numeric(difftime(t_Hangup,"2019-01-01 00:00:00",unit="mins")))) %>% 
  rowwise() %>% 
  mutate(blocked_minutes=list(seq(start_minute_id,end_minute_id,by=1))) %>% 

  unnest_longer(blocked_minutes) %>% 
  group_by(line) %>% 
  nest() %>% 
  unnest_wider(col = data)

# A tibble: 4 x 6
# Groups:   line [4]
  line  t_Calltaken t_Hangup   start_minute_id end_minute_id blocked_minutes
  <chr> <list>      <list>     <list>          <list>        <list>         
1 A     <chr [62]>  <chr [62]> <dbl [62]>      <dbl [62]>    <dbl [62]>     
2 B     <chr [65]>  <chr [65]> <dbl [65]>      <dbl [65]>    <dbl [65]>     
3 C     <chr [90]>  <chr [90]> <dbl [90]>      <dbl [90]>    <dbl [90]>     
4 D     <chr [70]>  <chr [70]> <dbl [70]>      <dbl [70]>    <dbl [70]>

CodePudding user response：

Thanks to the answer from @Ben G I was able to get to the solution for my longterm goal to find the maximum number of blocked lines at a time.

With unnest_longer() the lists within blocked_minutes can be unnested and then I'm able to get to the result just by reforming the DF.

Callcenter %>% 
  mutate(start_minute_id=round(as.numeric(difftime(t_Calltaken,"2019-01-01 00:00:00",unit="mins"))),
         end_minute_id=round(as.numeric(difftime(t_Hangup,"2019-01-01 00:00:00",unit="mins")))) %>% 
  rowwise() %>% 
  mutate(blocked_minutes=list(seq(start_minute_id,end_minute_id,by=1))) %>% 
  unnest_longer(blocked_minutes) %>% 
  mutate(value=1) %>% 
  pivot_wider(id_cols=blocked_minutes, names_from="line",values_from="value") %>% 
  mutate(sum_blocked_lines=rowSums(.[,2:ncol(.)],na.rm=TRUE)) %>% 
  summarize(max_blocked_lines=max(sum_blocked_lines))

Results in:

# A tibble: 1 × 1
  max_blocked_lines
              <dbl>
1                 4