How to groupby and make calculations on consecutive rows of the group?


For example, let's consider the following dataframe:

   Restaurant_ID Floor Cust_Arrival_Datetime
0       100        1    2021-11-17 17:20:00
1       100        1    2021-11-17 17:22:00
2       100        1    2021-11-17 17:25:00
3       100        1    2021-11-17 17:30:00
4       100        1    2021-11-17 17:50:00
5       100        1    2021-11-17 17:51:00
6       100        2    2021-11-17 17:25:00
7       100        2    2021-11-17 18:00:00
8       100        2    2021-11-17 18:50:00
9       100        2    2021-11-17 18:56:00

For this toy example, assume that Cust_Arrival_Datetime is already sorted within each restaurant and floor group, as shown above. How can we calculate statistics such as the median time between consecutive customer arrivals for each unique restaurant and floor group?

The desired output would be:

   Restaurant_ID Floor  Median Arrival Interval(in minutes)
0       100        1    3
1       100        2    35

The Median Arrival Interval is calculated as follows. On the first floor of the restaurant, the second customer arrives 2 minutes after the first. Similarly, 3 minutes elapse between the 2nd and 3rd customers, 5 minutes between the 3rd and 4th, and so on. The gaps for floor 1 of restaurant 100 are 2, 3, 5, 20 and 1 minutes, so the median is 3.
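The arithmetic above can be checked with a small sketch. It hard-codes the six floor 1 arrival times from the example table; the names arrivals and gaps_minutes are illustrative only and do not come from the question.

```python
import pandas as pd

# Floor 1 arrivals for restaurant 100, copied from the example table.
arrivals = pd.Series(pd.to_datetime([
    "2021-11-17 17:20:00",
    "2021-11-17 17:22:00",
    "2021-11-17 17:25:00",
    "2021-11-17 17:30:00",
    "2021-11-17 17:50:00",
    "2021-11-17 17:51:00",
]))

# diff() subtracts the previous row from each row. The first row has no
# predecessor, so its gap is NaT and is dropped here.
gaps_minutes = arrivals.diff().dropna().dt.total_seconds() / 60

print(gaps_minutes.tolist())   # [2.0, 3.0, 5.0, 20.0, 1.0]
print(gaps_minutes.median())   # 3.0
```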

I have tried something like this:

df.groupby(['Restaurant_ID', 'Floor'].apply(lambda row: row['Customer_Arrival_Datetime'].shift() - row['Customer_Arrival_Datetime']).apply(np.median)

but this does not work!

Any help is welcome!

CodePudding user response:

IIUC, you can do

(df.groupby(['Restaurant_ID', 'Floor'])['Cust_Arrival_Datetime']
   .agg(lambda x: x.diff().dt.total_seconds().median()/60))

and you get

Restaurant_ID  Floor
100            1         3.0
               2        35.0
Name: Cust_Arrival_Datetime, dtype: float64

You can chain reset_index() if you need the group keys back as regular columns.
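For reference, here is a self-contained sketch of that approach on the data from the question, including the reset_index step. The output column name is an assumption chosen to resemble the desired output, and the frame is rebuilt by hand from the table in the question.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Restaurant_ID": [100] * 10,
    "Floor": [1] * 6 + [2] * 4,
    "Cust_Arrival_Datetime": pd.to_datetime([
        "2021-11-17 17:20:00", "2021-11-17 17:22:00", "2021-11-17 17:25:00",
        "2021-11-17 17:30:00", "2021-11-17 17:50:00", "2021-11-17 17:51:00",
        "2021-11-17 17:25:00", "2021-11-17 18:00:00", "2021-11-17 18:50:00",
        "2021-11-17 18:56:00",
    ]),
})

result = (
    df.groupby(["Restaurant_ID", "Floor"])["Cust_Arrival_Datetime"]
      # Median gap between consecutive arrivals, converted to minutes.
      .agg(lambda x: x.diff().dt.total_seconds().median() / 60)
      # Turn the group keys into columns and name the value column
      # (the column name is an illustrative choice).
      .reset_index(name="Median Arrival Interval (minutes)")
)
print(result)
```

This assumes the rows are already sorted by arrival time within each group, as stated in the question. If they are not, sort by the group keys and the datetime column first.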

CodePudding user response:

Consider the following data frame:

import pandas as pd

df = pd.DataFrame({
  'group': [1,1,1,2,2,2],
  'time': pd.to_datetime(
    ['14:14', '14:17', '14:25', '17:29', '17:40','17:43'])
})

Suppose you'd like to compute several statistics on the time differences within each group:

def stats(group):
  # Gaps between consecutive rows, in minutes (the first gap is NaN and is skipped)
  diffs = group.diff().dt.total_seconds()/60
  return {
      'min': diffs.min(),
      'mean': diffs.mean(),
      'median': diffs.median(),
      'max': diffs.max()
  }

Then apply the function to each group:

>>> df.groupby('group')['time'].agg(stats).apply(pd.Series)

       min  mean  median   max
1      3.0   5.5     5.5   8.0
2      3.0   7.0     7.0  11.0
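As an alternative sketch (not what this answer does), the same table can be produced by computing the per-group gaps first with groupby(...).diff() and then using the built-in aggregation names, which avoids returning a dict from a custom function. The column name gap_min and the explicit format argument are assumptions added for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "group": [1, 1, 1, 2, 2, 2],
    # An explicit format is passed here so the parsing does not depend on inference.
    "time": pd.to_datetime(
        ["14:14", "14:17", "14:25", "17:29", "17:40", "17:43"], format="%H:%M"
    ),
})

# Gap to the previous row within each group, in minutes.
# The first row of every group has no predecessor, so its gap is NaN.
df["gap_min"] = df.groupby("group")["time"].diff().dt.total_seconds() / 60

# Built-in reducers skip NaN by default.
summary = df.groupby("group")["gap_min"].agg(["min", "mean", "median", "max"])
print(summary)
```

Using groupby(...).diff() keeps the gaps from leaking across group boundaries, which a plain df["time"].diff() would not.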