How to group per time interval in R-CodePudding

I have a data set below

Date	Status	Value
05/12/2021 23:59	Failed	500
05/12/2021 23:59	Successful	1500
05/12/2021 23:59	Successful	500
05/12/2021 23:59	Successful	1500
05/12/2021 23:59	Successful	1500
05/12/2021 23:59	Failed	1500
05/12/2021 23:59	Failed	1500
05/12/2021 23:59	Successful	500
05/12/2021 23:59	Successful	1500
05/12/2021 23:59	Failed	1500
05/12/2021 23:59	Successful	500
05/12/2021 23:59	Failed	500
05/12/2021 23:59	Failed	1500
05/12/2021 23:59	Successful	1500
05/12/2021 23:59	Failed	1500
05/12/2021 23:59	Successful	1500
05/12/2021 23:59	Successful	1500
05/12/2021 23:59	Successful	500
05/12/2021 23:59	Successful	500
05/12/2021 23:59	Successful	1500
05/12/2021 23:59	Successful	1500
05/12/2021 23:59	Successful	1500
05/12/2021 23:59	Failed	1500
05/12/2021 23:59	Successful	500
05/12/2021 23:59	Failed	500
05/12/2021 23:59	Failed	500
05/12/2021 23:59	Successful	1500
05/12/2021 23:59	Successful	500
05/12/2021 23:59	Successful	500
05/12/2021 23:59	Successful	1500
05/12/2021 23:59	Successful	500
05/12/2021 23:59	Successful	1500
05/12/2021 23:59	Successful	500

I want to be able to split the Date time column to have my time

then group the time per an hour interval

Then Summarise to get the below column

I want to get how many count of transactions processed within an hour

then the value within an hour

then have a column that state how many were successful, then another column that state how many failed within an hour

see the output of the desired summarised table below

Interval	Value	Count	Successful	Failed
00:00 am - 00:59 am	32,000	54	40	15
00:59 am - 01:00 am	42,000	55	41	14
01:00 am - 02:59 am	21,400	56	42	14
03:00 am - 03:59 am	4,00	57	43	14
04:00 am - 04:59 am	543,000	58	2	56
05:00 am - 05:59 am	411,000	59	6	53

CodePudding user response：

How about this:

library(tidyverse)
library(lubridate)
library(glue)

df <- tribble(~Date,    ~Status,    ~Value,
              "05/12/2021 23:59",   "Failed",   500,
              "05/12/2021 23:59",   "Successful",   1500,
              "05/12/2021 23:59",   "Successful",   500,
              "05/12/2021 23:59",   "Successful",   1500,
              "05/12/2021 23:59",   "Successful",   1500,
              "05/12/2021 23:59",   "Failed",   1500)

df2 <- df %>% 
    mutate(Datetime = dmy_hms(Date), # convert to datetime format
           Date = as.Date(Datetime), # extract date, if you need it later
           Hour = hour(Datetime))    # extract hour

hourly_value <- df2 %>% 
    group_by(Hour) %>% 
    summarize(Value = sum(Value),
              .groups = "drop")

hourly_count <- df2 %>% 
    count(Hour, Status) %>% 
    pivot_wider(names_from = "Status", values_from = "n")

interval_helper <- tibble(Hour = 0:23,
                          display_hour = str_pad(Hour %% 12, 2, pad = '0'),
                          ampm = if_else(Hour < 12, "am", "pm"),
                          Interval = glue("{display_hour}:00 {ampm} - {display_hour}:59 {ampm}"))

full_join(hourly_value, hourly_count, by = "Hour") %>% 
    replace_na(list(Successful = 0L, Failed = 0L, Value = 0)) %>% 
    left_join(interval_helper, by = "Hour") %>% 
    mutate(Count = Successful   Failed) %>% 
    select(Interval, Value, Count, Successful, Failed)

I wasn't exactly sure how your original Date column is formatted. Here I assume it's a string. Because the exact format of the Interval column is important to you, it seemed easier to make and join a separate tibble with the strings you want displayed.

It's important to replace the missing NA values with zeros, otherwise the Count = Successful Failed will secretly fail when only one of them is present.