I have a dataframe that looks like this:
Arrival_DateTime = c("2009-01-01 08:35:00", "2009-01-01 10:00:00", "2009-01-01 10:25:00",
"2009-01-02 07:45:00", "2009-01-02 15:32:00", "2009-01-02 11:15:00",
"2009-01-02 12:35:00")
Cust_ID = c("1214", "2643", "31231", "41244", "1214", "15317", "51591")
Wait_Time_Mins = c("54","43","88","94","12","130", "170")
df_have = data.frame(Arrival_DateTime, Cust_ID, Wait_Time_Mins)
and want to convert it so I get the number of customer visits per day, and also their average wait time per day, so it looks something like this:
dates = c("2009-01-01", "2009-01-02")
num_visits = c("3", "4")
avg_wait_time = c("61.7","101.5")
df_want = data.frame(dates, num_visits, avg_wait_time)
How would I go about doing this?
Similarly, is there a way to do monthly aggregations as well?
CodePudding user response:
You can use -
library(dplyr)
df_have %>%
mutate(Arrival_DateTime = lubridate::ymd_hms(Arrival_DateTime),
Date = as.Date(Arrival_DateTime),
#For monthly aggregation -
#Date = format(Arrival_DateTime, '%Y-%m'),
Wait_Time_Mins = as.numeric(Wait_Time_Mins)) %>%
group_by(Date) %>%
summarise(num_visits = n_distinct(Cust_ID),
avg_wait_time = mean(Wait_Time_Mins))
# Date num_visits avg_wait_time
#1 2009-01-01 3 61.66667
#2 2009-01-02 4 101.50000
CodePudding user response:
Using aggregate()
.
aggregate(as.double(Wait_Time_Mins) ~ as.Date(Arrival_DateTime), df_have,
\(x) c(length(x), mean(x))) |>
do.call(what=data.frame) |>
setNames(c('date', 'num_visits', 'avg_wait_time'))
# date num_visits avg_wait_time
# 1 2009-01-01 3 61.66667
# 2 2009-01-02 4 101.50000
Note: R >= 4.1 used.
Data:
df_have <- structure(list(Arrival_DateTime = c("2009-01-01 08:35:00", "2009-01-01 10:00:00",
"2009-01-01 10:25:00", "2009-01-02 07:45:00", "2009-01-02 15:32:00",
"2009-01-02 11:15:00", "2009-01-02 12:35:00"), Cust_ID = c("1214",
"2643", "31231", "41244", "1214", "15317", "51591"), Wait_Time_Mins = c("54",
"43", "88", "94", "12", "130", "170")), class = "data.frame", row.names = c(NA,
-7L))