Can I plot the number of attendees throughout a meeting using R and ggplot?-CodePudding

I am trying to use R and ggplot to visualize how many users were online at at the same time throughout a zoom meeting, but I don't really know where to start. I have the 'join time', the 'leave time' and the duration (which is of course just the difference between the two) for each user. What I imagine is a histogram-ish plot (or maybe a line chart?) with the time on the x-axis and the number of attendees at the y-axis. Does anyone have an idea of how to achieve this with the type of data I have – if it is even possible?

The entire dataframe is pretty long, but here is a snippet of the first 20 rows:

             join.time          leave.time duration
3  2021-11-03 08:51:00 2021-11-03 16:56:00      486
4  2021-11-03 08:53:00 2021-11-03 13:02:00      249
5  2021-11-03 09:00:00 2021-11-03 13:03:00      244
6  2021-11-03 09:00:00 2021-11-03 09:10:00       10
7  2021-11-03 09:00:00 2021-11-03 10:12:00       72
8  2021-11-03 09:01:00 2021-11-03 12:14:00      193
9  2021-11-03 09:02:00 2021-11-03 13:02:00      240
11 2021-11-03 09:14:00 2021-11-03 09:27:00       13
12 2021-11-03 09:14:00 2021-11-03 15:39:00      385
13 2021-11-03 09:21:00 2021-11-03 11:53:00      152
14 2021-11-03 09:25:00 2021-11-03 10:31:00       67
15 2021-11-03 09:26:00 2021-11-03 14:48:00      323
16 2021-11-03 09:27:00 2021-11-03 11:39:00      133
17 2021-11-03 09:27:00 2021-11-03 10:31:00       65
18 2021-11-03 09:27:00 2021-11-03 11:27:00      120
19 2021-11-03 09:28:00 2021-11-03 13:02:00      214
20 2021-11-03 09:29:00 2021-11-03 17:00:00      452
21 2021-11-03 09:29:00 2021-11-03 13:02:00      213
22 2021-11-03 09:29:00 2021-11-03 16:49:00      441
23 2021-11-03 09:30:00 2021-11-03 09:54:00       25

This is the output I got from dput(). I hope it makes it reproducible:

DF <- structure(list(join.time = structure(c(1635925860, 1635925980, 
1635926400, 1635926400, 1635926400, 1635926460, 1635926520, 1635927240, 
1635927240, 1635927660, 1635927900, 1635927960, 1635928020, 1635928020, 
1635928020, 1635928080, 1635928140, 1635928140, 1635928140, 1635928200
), class = c("POSIXct", "POSIXt"), tzone = ""), leave.time = structure(c(1635954960, 
1635940920, 1635940980, 1635927000, 1635930720, 1635938040, 1635940920, 
1635928020, 1635950340, 1635936780, 1635931860, 1635947280, 1635935940, 
1635931860, 1635935220, 1635940920, 1635955200, 1635940920, 1635954540, 
1635929640), class = c("POSIXct", "POSIXt"), tzone = ""), duration = c(486L, 
249L, 244L, 10L, 72L, 193L, 240L, 13L, 385L, 152L, 67L, 323L, 
133L, 65L, 120L, 214L, 452L, 213L, 441L, 25L)), row.names = c(3L, 
4L, 5L, 6L, 7L, 8L, 9L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 
19L, 20L, 21L, 22L, 23L), class = "data.frame")

CodePudding user response：

I think something like below could work, maybe not the most elegant but it's how I would do something like this. What I did first was make a set of intervals. I took all of the times, removed duplicates, ordered them, and then took the leading time, so each row would have a start and end time. Then I did a for loop to see how many rows of the original data frame fell within the timeframe of the interval. Then I did a simple line plot of the result. Hopefully this is along the lines of what you are looking for!

    DF <- structure(list(join.time = structure(c(1635925860, 1635925980, 
                                                 1635926400, 1635926400, 1635926400, 1635926460, 1635926520, 1635927240, 
                                                 1635927240, 1635927660, 1635927900, 1635927960, 1635928020, 1635928020, 
                                                 1635928020, 1635928080, 1635928140, 1635928140, 1635928140, 1635928200
    ), class = c("POSIXct", "POSIXt"), tzone = ""), leave.time = structure(c(1635954960, 
                                                                             1635940920, 1635940980, 1635927000, 1635930720, 1635938040, 1635940920, 
                                                                             1635928020, 1635950340, 1635936780, 1635931860, 1635947280, 1635935940, 
                                                                             1635931860, 1635935220, 1635940920, 1635955200, 1635940920, 1635954540, 
                                                                             1635929640), class = c("POSIXct", "POSIXt"), tzone = ""), duration = c(486L, 
                                                                                                                                                    249L, 244L, 10L, 72L, 193L, 240L, 13L, 385L, 152L, 67L, 323L, 
                                                                                                                                                    133L, 65L, 120L, 214L, 452L, 213L, 441L, 25L)), row.names = c(3L, 
                                                                                                                                                                                                                  4L, 5L, 6L, 7L, 8L, 9L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 
                                                                                                                                                                                                                  19L, 20L, 21L, 22L, 23L), class = "data.frame")
    library(dplyr)
    library(tidyr)
    library(ggplot2)
    
DFintervals<-DF%>%
  select(-duration)%>% #Not needed for me
  tidyr::gather()%>% #Gathered the times together
  arrange(value)%>% #Arranged by date
  distinct(value, .keep_all = T)%>% #Only keep unique times
  mutate(value2 = lead(value, 1))%>% #Mutated the next time, so we have an interval, a start and stop stop time for each grouping
  select(-key)%>% #Key is unneeded here
  na.omit() #Last one isn't an interval, removed

DF2<-DFintervals%>% #Created a new data frame and made a balnk column for count
  mutate(Count = NA)
for(i in 1:nrow(DFintervals)) { #For loop, how many times did they join before at at the start of the interval, and leave at or beyond the end of the interval
  DF2$Count[i] <- sum(DF$join.time <= DFintervals$value[i] & DF$leave.time >= DFintervals$value2[i])
}

ggplot(DF2) aes(value, Count) geom_line() #Plotted the data in a line graph

CodePudding user response：

Here is a base R approach. Create a sequence of times (in minutes) from the first join.time to the last leave.time. Then, for each of these minutes, count how many rows have that minute within their time ranges.

library(ggplot2)

min_times <- seq(min(DF$join.time), max(DF$leave.time), by = "min")

min_count <- data.frame(time = min_times,
                        count = sapply(min_times, 
                                       function(x) {
                                         sum(x <= DF$leave.time & x >= DF$join.time)
                                       }))

ggplot(min_count, aes(x = time, y = count))  
  geom_line()  
  scale_x_datetime(date_breaks = "1 hour", date_labels = "%H:%M")

Output