I am trying to use R and ggplot to visualize how many users were online at at the same time throughout a zoom meeting, but I don't really know where to start. I have the 'join time', the 'leave time' and the duration (which is of course just the difference between the two) for each user. What I imagine is a histogram-ish plot (or maybe a line chart?) with the time on the x-axis and the number of attendees at the y-axis. Does anyone have an idea of how to achieve this with the type of data I have – if it is even possible?
The entire dataframe is pretty long, but here is a snippet of the first 20 rows:
join.time leave.time duration
3 2021-11-03 08:51:00 2021-11-03 16:56:00 486
4 2021-11-03 08:53:00 2021-11-03 13:02:00 249
5 2021-11-03 09:00:00 2021-11-03 13:03:00 244
6 2021-11-03 09:00:00 2021-11-03 09:10:00 10
7 2021-11-03 09:00:00 2021-11-03 10:12:00 72
8 2021-11-03 09:01:00 2021-11-03 12:14:00 193
9 2021-11-03 09:02:00 2021-11-03 13:02:00 240
11 2021-11-03 09:14:00 2021-11-03 09:27:00 13
12 2021-11-03 09:14:00 2021-11-03 15:39:00 385
13 2021-11-03 09:21:00 2021-11-03 11:53:00 152
14 2021-11-03 09:25:00 2021-11-03 10:31:00 67
15 2021-11-03 09:26:00 2021-11-03 14:48:00 323
16 2021-11-03 09:27:00 2021-11-03 11:39:00 133
17 2021-11-03 09:27:00 2021-11-03 10:31:00 65
18 2021-11-03 09:27:00 2021-11-03 11:27:00 120
19 2021-11-03 09:28:00 2021-11-03 13:02:00 214
20 2021-11-03 09:29:00 2021-11-03 17:00:00 452
21 2021-11-03 09:29:00 2021-11-03 13:02:00 213
22 2021-11-03 09:29:00 2021-11-03 16:49:00 441
23 2021-11-03 09:30:00 2021-11-03 09:54:00 25
This is the output I got from dput(). I hope it makes it reproducible:
DF <- structure(list(join.time = structure(c(1635925860, 1635925980,
1635926400, 1635926400, 1635926400, 1635926460, 1635926520, 1635927240,
1635927240, 1635927660, 1635927900, 1635927960, 1635928020, 1635928020,
1635928020, 1635928080, 1635928140, 1635928140, 1635928140, 1635928200
), class = c("POSIXct", "POSIXt"), tzone = ""), leave.time = structure(c(1635954960,
1635940920, 1635940980, 1635927000, 1635930720, 1635938040, 1635940920,
1635928020, 1635950340, 1635936780, 1635931860, 1635947280, 1635935940,
1635931860, 1635935220, 1635940920, 1635955200, 1635940920, 1635954540,
1635929640), class = c("POSIXct", "POSIXt"), tzone = ""), duration = c(486L,
249L, 244L, 10L, 72L, 193L, 240L, 13L, 385L, 152L, 67L, 323L,
133L, 65L, 120L, 214L, 452L, 213L, 441L, 25L)), row.names = c(3L,
4L, 5L, 6L, 7L, 8L, 9L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L,
19L, 20L, 21L, 22L, 23L), class = "data.frame")
CodePudding user response:
I think something like below could work, maybe not the most elegant but it's how I would do something like this. What I did first was make a set of intervals. I took all of the times, removed duplicates, ordered them, and then took the leading time, so each row would have a start and end time. Then I did a for loop to see how many rows of the original data frame fell within the timeframe of the interval. Then I did a simple line plot of the result. Hopefully this is along the lines of what you are looking for!
DF <- structure(list(join.time = structure(c(1635925860, 1635925980,
1635926400, 1635926400, 1635926400, 1635926460, 1635926520, 1635927240,
1635927240, 1635927660, 1635927900, 1635927960, 1635928020, 1635928020,
1635928020, 1635928080, 1635928140, 1635928140, 1635928140, 1635928200
), class = c("POSIXct", "POSIXt"), tzone = ""), leave.time = structure(c(1635954960,
1635940920, 1635940980, 1635927000, 1635930720, 1635938040, 1635940920,
1635928020, 1635950340, 1635936780, 1635931860, 1635947280, 1635935940,
1635931860, 1635935220, 1635940920, 1635955200, 1635940920, 1635954540,
1635929640), class = c("POSIXct", "POSIXt"), tzone = ""), duration = c(486L,
249L, 244L, 10L, 72L, 193L, 240L, 13L, 385L, 152L, 67L, 323L,
133L, 65L, 120L, 214L, 452L, 213L, 441L, 25L)), row.names = c(3L,
4L, 5L, 6L, 7L, 8L, 9L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L,
19L, 20L, 21L, 22L, 23L), class = "data.frame")
library(dplyr)
library(tidyr)
library(ggplot2)
DFintervals<-DF%>%
select(-duration)%>% #Not needed for me
tidyr::gather()%>% #Gathered the times together
arrange(value)%>% #Arranged by date
distinct(value, .keep_all = T)%>% #Only keep unique times
mutate(value2 = lead(value, 1))%>% #Mutated the next time, so we have an interval, a start and stop stop time for each grouping
select(-key)%>% #Key is unneeded here
na.omit() #Last one isn't an interval, removed
DF2<-DFintervals%>% #Created a new data frame and made a balnk column for count
mutate(Count = NA)
for(i in 1:nrow(DFintervals)) { #For loop, how many times did they join before at at the start of the interval, and leave at or beyond the end of the interval
DF2$Count[i] <- sum(DF$join.time <= DFintervals$value[i] & DF$leave.time >= DFintervals$value2[i])
}
ggplot(DF2) aes(value, Count) geom_line() #Plotted the data in a line graph
CodePudding user response:
Here is a base R approach. Create a sequence of times (in minutes) from the first join.time
to the last leave.time
. Then, for each of these minutes, count how many rows have that minute within their time ranges.
library(ggplot2)
min_times <- seq(min(DF$join.time), max(DF$leave.time), by = "min")
min_count <- data.frame(time = min_times,
count = sapply(min_times,
function(x) {
sum(x <= DF$leave.time & x >= DF$join.time)
}))
ggplot(min_count, aes(x = time, y = count))
geom_line()
scale_x_datetime(date_breaks = "1 hour", date_labels = "%H:%M")
Output