Home > Net >  How to group per time interval in R
How to group per time interval in R

Time:12-14

I have a data set below

Date Status Value
05/12/2021 23:59 Failed 500
05/12/2021 23:59 Successful 1500
05/12/2021 23:59 Successful 500
05/12/2021 23:59 Successful 1500
05/12/2021 23:59 Successful 1500
05/12/2021 23:59 Failed 1500
05/12/2021 23:59 Failed 1500
05/12/2021 23:59 Successful 500
05/12/2021 23:59 Successful 1500
05/12/2021 23:59 Failed 1500
05/12/2021 23:59 Successful 500
05/12/2021 23:59 Failed 500
05/12/2021 23:59 Failed 1500
05/12/2021 23:59 Successful 1500
05/12/2021 23:59 Failed 1500
05/12/2021 23:59 Successful 1500
05/12/2021 23:59 Successful 1500
05/12/2021 23:59 Successful 500
05/12/2021 23:59 Successful 500
05/12/2021 23:59 Successful 1500
05/12/2021 23:59 Successful 1500
05/12/2021 23:59 Successful 1500
05/12/2021 23:59 Failed 1500
05/12/2021 23:59 Successful 500
05/12/2021 23:59 Failed 500
05/12/2021 23:59 Failed 500
05/12/2021 23:59 Successful 1500
05/12/2021 23:59 Successful 500
05/12/2021 23:59 Successful 500
05/12/2021 23:59 Successful 1500
05/12/2021 23:59 Successful 500
05/12/2021 23:59 Successful 1500
05/12/2021 23:59 Successful 500

I want to be able to split the Date time column to have my time

then group the time per an hour interval

Then Summarise to get the below column

I want to get how many count of transactions processed within an hour

then the value within an hour

then have a column that state how many were successful, then another column that state how many failed within an hour

see the output of the desired summarised table below

Interval Value Count Successful Failed
00:00 am - 00:59 am 32,000 54 40 15
00:59 am - 01:00 am 42,000 55 41 14
01:00 am - 02:59 am 21,400 56 42 14
03:00 am - 03:59 am 4,00 57 43 14
04:00 am - 04:59 am 543,000 58 2 56
05:00 am - 05:59 am 411,000 59 6 53

CodePudding user response:

How about this:

library(tidyverse)
library(lubridate)
library(glue)

df <- tribble(~Date,    ~Status,    ~Value,
              "05/12/2021 23:59",   "Failed",   500,
              "05/12/2021 23:59",   "Successful",   1500,
              "05/12/2021 23:59",   "Successful",   500,
              "05/12/2021 23:59",   "Successful",   1500,
              "05/12/2021 23:59",   "Successful",   1500,
              "05/12/2021 23:59",   "Failed",   1500)

df2 <- df %>% 
    mutate(Datetime = dmy_hms(Date), # convert to datetime format
           Date = as.Date(Datetime), # extract date, if you need it later
           Hour = hour(Datetime))    # extract hour

hourly_value <- df2 %>% 
    group_by(Hour) %>% 
    summarize(Value = sum(Value),
              .groups = "drop")

hourly_count <- df2 %>% 
    count(Hour, Status) %>% 
    pivot_wider(names_from = "Status", values_from = "n")

interval_helper <- tibble(Hour = 0:23,
                          display_hour = str_pad(Hour %% 12, 2, pad = '0'),
                          ampm = if_else(Hour < 12, "am", "pm"),
                          Interval = glue("{display_hour}:00 {ampm} - {display_hour}:59 {ampm}"))

full_join(hourly_value, hourly_count, by = "Hour") %>% 
    replace_na(list(Successful = 0L, Failed = 0L, Value = 0)) %>% 
    left_join(interval_helper, by = "Hour") %>% 
    mutate(Count = Successful   Failed) %>% 
    select(Interval, Value, Count, Successful, Failed)

I wasn't exactly sure how your original Date column is formatted. Here I assume it's a string. Because the exact format of the Interval column is important to you, it seemed easier to make and join a separate tibble with the strings you want displayed.

It's important to replace the missing NA values with zeros, otherwise the Count = Successful Failed will secretly fail when only one of them is present.

  • Related