For example, let's consider the following dataframe:
Restaurant_ID Floor Cust_Arrival_Datetime
0 100 1 2021-11-17 17:20:00
1 100 1 2021-11-17 17:22:00
2 100 1 2021-11-17 17:25:00
3 100 1 2021-11-17 17:30:00
4 100 1 2021-11-17 17:50:00
5 100 1 2021-11-17 17:51:00
6 100 2 2021-11-17 17:25:00
7 100 2 2021-11-17 18:00:00
8 100 2 2021-11-17 18:50:00
9 100 2 2021-11-17 18:56:00
For the above toy example we can consider that the Cust_Arrival_Datetime
is sorted as well as grouped by store and floor (as seen above). How could we, now, calculate things such as the median time interval that passes for a customer arrival for each unique store and floor group?
The desired output would be:
Restaurant_ID Floor Median Arrival Interval(in minutes)
0 100 1 3
1 100 2 35
The Median Arrival Interval is calculated as follows: for the first floor of the store we can see that by the time the second customer arrives 2 minutes have already passed since the first one arrived. Similarly, 3 minutes have elapsed between the 2nd and the 3rd customer and 5 minutes for the 3rd and 4th customer etc. The median for floor 1 and restaurant 100 would be 3.
I have tried something like this:
df.groupby(['Restaurant_ID', 'Floor'].apply(lambda row: row['Customer_Arrival_Datetime'].shift() - row['Customer_Arrival_Datetime']).apply(np.median)
but this does not work!
Any help is welcome!
CodePudding user response:
IIUC, you can do
(df.groupby(['Restaurant_ID', 'Floor'])['Cust_Arrival_Datetime']
.agg(lambda x: x.diff().dt.total_seconds().median()/60))
and you get
Restaurant_ID Floor
100 1 3.0
2 35.0
Name: Cust_Arrival_Datetime, dtype: float64
you can chain with reset_index
if needed
CodePudding user response:
Consider the following data frame:
df = pd.DataFrame({
'group': [1,1,1,2,2,2],
'time': pd.to_datetime(
['14:14', '14:17', '14:25', '17:29', '17:40','17:43']
)
})
Suppose, you'd like to apply a range of transformations:
def stats(group):
diffs = group.diff().dt.total_seconds()/60
return {
'min': diffs.min(),
'mean': diffs.mean(),
'median': diffs.median(),
'max': diffs.max()
}
Then you simply have to apply these:
>>> df.groupby('group')['time'].agg(stats).apply(pd.Series)
min mean median max
group
1 3.0 5.5 5.5 8.0
2 3.0 7.0 7.0 11.0