How to convert dataframe with datetimes to daily time series in mean aggregation R-CodePudding

I have a dataframe that looks like this:

Arrival_DateTime = c("2009-01-01 08:35:00", "2009-01-01 10:00:00", "2009-01-01 10:25:00",
                     "2009-01-02 07:45:00", "2009-01-02 15:32:00", "2009-01-02 11:15:00",
                     "2009-01-02 12:35:00")
Cust_ID = c("1214", "2643", "31231", "41244", "1214", "15317", "51591")
Wait_Time_Mins = c("54","43","88","94","12","130", "170") 
df_have = data.frame(Arrival_DateTime, Cust_ID, Wait_Time_Mins)

and want to convert it so I get the number of customer visits per day, and also their average wait time per day, so it looks something like this:

dates = c("2009-01-01", "2009-01-02")
num_visits = c("3", "4")
avg_wait_time = c("61.7","101.5")
df_want = data.frame(dates, num_visits, avg_wait_time)

How would I go about doing this?

Similarly, is there a way to do monthly aggregations as well?

CodePudding user response：

You can use -

library(dplyr)

df_have %>%
  mutate(Arrival_DateTime = lubridate::ymd_hms(Arrival_DateTime), 
         Date = as.Date(Arrival_DateTime), 
         #For monthly aggregation -
         #Date = format(Arrival_DateTime, '%Y-%m'), 
         Wait_Time_Mins = as.numeric(Wait_Time_Mins)) %>%
  group_by(Date) %>%
  summarise(num_visits = n_distinct(Cust_ID), 
            avg_wait_time = mean(Wait_Time_Mins))

#        Date num_visits avg_wait_time
#1 2009-01-01          3      61.66667
#2 2009-01-02          4     101.50000

CodePudding user response：

Using aggregate().

aggregate(as.double(Wait_Time_Mins) ~ as.Date(Arrival_DateTime), df_have, 
          \(x) c(length(x), mean(x))) |>
  do.call(what=data.frame) |>
  setNames(c('date', 'num_visits', 'avg_wait_time'))
#         date num_visits avg_wait_time
# 1 2009-01-01          3      61.66667
# 2 2009-01-02          4     101.50000

Note: R >= 4.1 used.

Data:

df_have <- structure(list(Arrival_DateTime = c("2009-01-01 08:35:00", "2009-01-01 10:00:00", 
"2009-01-01 10:25:00", "2009-01-02 07:45:00", "2009-01-02 15:32:00", 
"2009-01-02 11:15:00", "2009-01-02 12:35:00"), Cust_ID = c("1214", 
"2643", "31231", "41244", "1214", "15317", "51591"), Wait_Time_Mins = c("54", 
"43", "88", "94", "12", "130", "170")), class = "data.frame", row.names = c(NA, 
-7L))