Count timepoints in R

Time:09-27

I'm trying to plot occurrence of measurement timepoints on a 24h x-axis.
However I struggle to get the correct counts related to time. I want to end up with a table that says how many occurrences of a value x in df1 I have for each value in df2. The results so far are not giving me this. Any input?

This is what I was trying now:

hours <- c("00:05","00:10","00:15","00:20","00:25","00:30","00:35","00:40","00:45","00:50","00:55","01:00","01:05","01:10","01:15","01:20","01:25","01:30","01:35","01:40","01:45","01:50","01:55","02:00","02:05","02:10","02:15","02:20","02:25","02:30","02:35","02:40","02:45","02:50","02:55","03:00","03:05","03:10","03:15","03:20","03:25","03:30","03:35","03:40","03:45","03:50","03:55","04:00","04:05","04:10","04:15","04:20","04:25","04:30","04:35","04:40","04:45","04:50","04:55","05:00","05:05","05:10","05:15","05:20","05:25","05:30","05:35","05:40","05:45","05:50","05:55","06:00","06:05","06:10","06:15","06:20","06:25","06:30","06:35","06:40","06:45","06:50","06:55","07:00","07:05","07:10","07:15","07:20","07:25","07:30","07:35","07:40","07:45","07:50","07:55","08:00","08:05","08:10","08:15","08:20","08:25","08:30","08:35","08:40","08:45","08:50","08:55","09:00","09:05","09:10","09:15","09:20","09:25","09:30","09:35","09:40","09:45","09:50","09:55","10:00","10:05","10:10","10:15","10:20","10:25","10:30","10:35","10:40","10:45","10:50","10:55","11:00","11:05","11:10","11:15","11:20","11:25","11:30","11:35","11:40","11:45","11:50","11:55","12:00","12:05","12:10","12:15","12:20","12:25","12:30","12:35","12:40","12:45","12:50","12:55","13:00","13:05","13:10","13:15","13:20","13:25","13:30","13:35","13:40","13:45","13:50","13:55","14:00","14:05","14:10","14:15","14:20","14:25","14:30","14:35","14:40","14:45","14:50","14:55","15:00","15:05","15:10","15:15","15:20","15:25","15:30","15:35","15:40","15:45","15:50","15:55","16:00","16:05","16:10","16:15","16:20","16:25","16:30","16:35","16:40","16:45","16:50","16:55","17:00","17:05","17:10","17:15","17:20","17:25","17:30","17:35","17:40","17:45","17:50","17:55","18:00","18:05","18:10","18:15","18:20","18:25","18:30","18:35","18:40","18:45","18:50","18:55","19:00","19:05","19:10","19:15","19:20","19:25","19:30","19:35","19:40","19:45","19:50","19:55","20:00","20:05","20:10","20:15","20:20","20:25","20:30","20:35","20:40","20:45","20:50","20:55","21:00","21:05","21:10","21:15","21:20","21:25","21:30","21:35","21:40","21:45","21:50","21:55","22:00","22:05","22:10","22:15","22:20","22:25","22:30","22:35","22:40","22:45","22:50","22:55","23:00","23:05","23:10","23:15","23:20","23:25","23:30","23:35","23:40","23:45","23:50","23:55","00:00")

timepoints <- c("01:00","04:15","07:15","10:30","12:45","16:30","17:15","21:30","22:50","02:20","07:00","10:20","11:50","13:50","15:00","19:00","20:30","22:00","02:50","07:00","10:15","11:30","15:00","18:45","21:30","01:40","05:20","08:30","11:30","12:30","13:50","15:40","17:40","18:40","20:15","01:30","06:00","09:30","11:00","13:00","15:45","18:00","19:00","20:00","21:20","21:50")

df1 <- timepoints
df2 <- hours

colnames(df2) <- "timepoints"
df2$freq <- rbindlist(mget(paste0("df", 1:2)))[, .N, timepoints]

Thanks

Edited for clarification

CodePudding user response:

You could represent these breaks as a lollipop chart. It is best to convert timepoints to actual date-time objects in a data frame first:

library(ggplot2)

ggplot(data.frame(time = as.POSIXct(paste("2000-01-01", timepoints)))) +
  geom_segment(aes(x = time, xend = time, y = 0, yend = 1), color = "gray50",
               linetype = 2) +
  geom_point(aes(x = time, y = 1), shape = 21, fill = "steelblue", size = 4) +
  geom_hline(yintercept = 0, size = 3, color = "red4") +
  scale_x_datetime(date_labels = "%H:%M", date_breaks = "hour") +
  theme_minimal(base_size = 16) +
  scale_y_continuous(breaks = NULL, name = "", limits = c(0, 1.5))


CodePudding user response:

First of all, you are assigning vectors, not data frames, to the variables df1 and df2. This is also what causes the error in the colnames(df2) <- "timepoints" call: a vector has no columns, so assigning column names to it fails.
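If the aim is the count table described in the question, one possible sketch (which may not be exactly what the asker intended) is to treat hours as the set of allowed levels and let table() count how often each one occurs in timepoints. The names counts and df2 are only illustrative, and hours is rebuilt here with seq() on the assumption that it is the 5-minute grid from the question:

```r
# Assumption: `hours` is the 5-minute grid "00:05", "00:10", ..., "23:55", "00:00"
hours <- format(seq(as.POSIXct("2000-01-01 00:05", tz = "UTC"),
                    by = "5 min", length.out = 288), "%H:%M")

timepoints <- c("01:00","04:15","07:15","10:30","12:45","16:30","17:15","21:30",
                "22:50","02:20","07:00","10:20","11:50","13:50","15:00","19:00",
                "20:30","22:00","02:50","07:00","10:15","11:30","15:00","18:45",
                "21:30","01:40","05:20","08:30","11:30","12:30","13:50","15:40",
                "17:40","18:40","20:15","01:30","06:00","09:30","11:00","13:00",
                "15:45","18:00","19:00","20:00","21:20","21:50")

# Using `hours` as factor levels keeps grid values that never occur (count 0)
counts <- table(factor(timepoints, levels = hours))

# Put the result in an actual data frame: one row per value in `hours`
df2 <- data.frame(timepoints = names(counts), freq = as.vector(counts))

# Show only the grid values that occur at least once
df2[df2$freq > 0, ]
```

Any timepoint that is not on the 5-minute grid would become NA in the factor and be dropped from the counts, so this only works as long as every value in timepoints also appears in hours.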

I don't know exactly what you mean, but I guess you want something like a histogram. To do this, you could first convert the timepoints to numeric values.

library(tidyverse)

First you load the tidyverse package bundle. It contains a lot of useful packages that make it easier to manipulate and visualize data. If you haven't installed the tidyverse packages yet, just run install.packages("tidyverse").

timepoints <- c("01:00","04:15","07:15","10:30","12:45","16:30","17:15","21:30",
                "22:50","02:20","07:00","10:20","11:50","13:50","15:00","19:00",
                "20:30","22:00","02:50","07:00","10:15","11:30","15:00","18:45",
                "21:30","01:40","05:20","08:30","11:30","12:30","13:50","15:40",
                "17:40","18:40","20:15","01:30","06:00","09:30","11:00","13:00",
                "15:45","18:00","19:00","20:00","21:20","21:50")

Df <- data.frame(timepoints = timepoints)

First I take the timepoints vector and put it in a data frame with the column name "timepoints".

Df_new <- Df %>% 
  # We separate the column into hours and minutes
  separate(timepoints, into = c("hh", "mm"), sep = ":") %>% 
  # We convert the minutes into fractions of an hour
  mutate(hh = as.numeric(hh),
         mm = as.numeric(mm) / 60) %>% 
  # Lastly we reassemble the times as decimal hours
  mutate(timepoints = hh + mm)

Then I convert the timepoint values to numeric values. First I split the hour and minute values using separate. Then I change the minute values to decimal hours using the mutate function. Lastly, I use the mutate function again to recombine the hour and decimal hour values. This bit of code has mostly been copied from @Seb's comment answer in How to create histogram in R with CSV time data?, so credit goes to them.
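For readers who prefer to check the conversion without the tidyverse, the same arithmetic can be sketched in base R. The helper name to_decimal_hours is made up for this illustration, and it assumes every input is a well-formed "HH:MM" string:

```r
# Hypothetical helper: convert "HH:MM" strings to decimal hours
to_decimal_hours <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  hh <- as.numeric(vapply(parts, `[`, character(1), 1))  # hour part
  mm <- as.numeric(vapply(parts, `[`, character(1), 2))  # minute part
  hh + mm / 60
}

# Gives 1.00, 4.25 and 12.75
to_decimal_hours(c("01:00", "04:15", "12:45"))
```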

hist(x = Df_new$timepoints, breaks = 12)

Next, you can plot the timepoints in a histogram like this. The breaks argument controls the number of bars; note that hist treats a single number as a suggestion only, so the result may not have exactly 12 bars.
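If exact bin edges matter, breaks also accepts a vector of break points. The sketch below uses a handful of decimal-hour values chosen for illustration (the first eight timepoints from the question, already converted) and 2-hour bins; both the variable name decimal_hours and the bin width are assumptions:

```r
# Illustrative input: a few timepoints already converted to decimal hours
decimal_hours <- c(1, 4.25, 7.25, 10.5, 12.75, 16.5, 17.25, 21.5)

# Explicit break points give exactly twelve 2-hour bins covering 0 to 24
h <- hist(decimal_hours, breaks = seq(0, 24, by = 2), plot = FALSE)

h$counts  # one count per 2-hour bin

# Draw the histogram from the precomputed object
plot(h, xlab = "Hour of day", main = "Measurement timepoints")
```

With a break vector the bins are fixed by you rather than chosen by hist, which makes plots of different datasets directly comparable.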

ggplot(Df_new, aes(x = timepoints)) + geom_histogram(bins = 12)

Alternatively, you can use the ggplot function from the ggplot2 package, which is part of the tidyverse. This looks nicer, but might be harder to follow for people not familiar with ggplot2.

If you want to read up on tidyverse commands and packages, you could go to https://r4ds.had.co.nz/index.html, which offers nice and understandable explanations.

Hope this was useful
