I have a data set below
Date | Status | Value |
---|---|---|
05/12/2021 23:59 | Failed | 500 |
05/12/2021 23:59 | Successful | 1500 |
05/12/2021 23:59 | Successful | 500 |
05/12/2021 23:59 | Successful | 1500 |
05/12/2021 23:59 | Successful | 1500 |
05/12/2021 23:59 | Failed | 1500 |
05/12/2021 23:59 | Failed | 1500 |
05/12/2021 23:59 | Successful | 500 |
05/12/2021 23:59 | Successful | 1500 |
05/12/2021 23:59 | Failed | 1500 |
05/12/2021 23:59 | Successful | 500 |
05/12/2021 23:59 | Failed | 500 |
05/12/2021 23:59 | Failed | 1500 |
05/12/2021 23:59 | Successful | 1500 |
05/12/2021 23:59 | Failed | 1500 |
05/12/2021 23:59 | Successful | 1500 |
05/12/2021 23:59 | Successful | 1500 |
05/12/2021 23:59 | Successful | 500 |
05/12/2021 23:59 | Successful | 500 |
05/12/2021 23:59 | Successful | 1500 |
05/12/2021 23:59 | Successful | 1500 |
05/12/2021 23:59 | Successful | 1500 |
05/12/2021 23:59 | Failed | 1500 |
05/12/2021 23:59 | Successful | 500 |
05/12/2021 23:59 | Failed | 500 |
05/12/2021 23:59 | Failed | 500 |
05/12/2021 23:59 | Successful | 1500 |
05/12/2021 23:59 | Successful | 500 |
05/12/2021 23:59 | Successful | 500 |
05/12/2021 23:59 | Successful | 1500 |
05/12/2021 23:59 | Successful | 500 |
05/12/2021 23:59 | Successful | 1500 |
05/12/2021 23:59 | Successful | 500 |
I want to be able to split the Date time column to have my time
then group the time per an hour interval
Then Summarise to get the below column
I want to get how many count of transactions processed within an hour
then the value within an hour
then have a column that state how many were successful, then another column that state how many failed within an hour
see the output of the desired summarised table below
Interval | Value | Count | Successful | Failed |
---|---|---|---|---|
00:00 am - 00:59 am | 32,000 | 54 | 40 | 15 |
00:59 am - 01:00 am | 42,000 | 55 | 41 | 14 |
01:00 am - 02:59 am | 21,400 | 56 | 42 | 14 |
03:00 am - 03:59 am | 4,00 | 57 | 43 | 14 |
04:00 am - 04:59 am | 543,000 | 58 | 2 | 56 |
05:00 am - 05:59 am | 411,000 | 59 | 6 | 53 |
CodePudding user response:
How about this:
library(tidyverse)
library(lubridate)
library(glue)
df <- tribble(~Date, ~Status, ~Value,
"05/12/2021 23:59", "Failed", 500,
"05/12/2021 23:59", "Successful", 1500,
"05/12/2021 23:59", "Successful", 500,
"05/12/2021 23:59", "Successful", 1500,
"05/12/2021 23:59", "Successful", 1500,
"05/12/2021 23:59", "Failed", 1500)
df2 <- df %>%
mutate(Datetime = dmy_hms(Date), # convert to datetime format
Date = as.Date(Datetime), # extract date, if you need it later
Hour = hour(Datetime)) # extract hour
hourly_value <- df2 %>%
group_by(Hour) %>%
summarize(Value = sum(Value),
.groups = "drop")
hourly_count <- df2 %>%
count(Hour, Status) %>%
pivot_wider(names_from = "Status", values_from = "n")
interval_helper <- tibble(Hour = 0:23,
display_hour = str_pad(Hour %% 12, 2, pad = '0'),
ampm = if_else(Hour < 12, "am", "pm"),
Interval = glue("{display_hour}:00 {ampm} - {display_hour}:59 {ampm}"))
full_join(hourly_value, hourly_count, by = "Hour") %>%
replace_na(list(Successful = 0L, Failed = 0L, Value = 0)) %>%
left_join(interval_helper, by = "Hour") %>%
mutate(Count = Successful Failed) %>%
select(Interval, Value, Count, Successful, Failed)
I wasn't exactly sure how your original Date
column is formatted. Here I assume it's a string. Because the exact format of the Interval
column is important to you, it seemed easier to make and join a separate tibble with the strings you want displayed.
It's important to replace the missing NA
values with zeros, otherwise the Count = Successful Failed
will secretly fail when only one of them is present.