Home > Net >  Calculating time difference in hours between two different rows per gorup with a 'lag' in
Calculating time difference in hours between two different rows per gorup with a 'lag' in

Time:11-19

I have a dataset similar to this.

df = pd.DataFrame({'CODE':["PNR1", "PNR1", "PNR1", "PNR1","PNR2","PNR2","PNR3" ],
        'FROM': ["RDM", "DEN", "FLL", "IAH", "AEP", "COR", "DEL"],
            'TO' : [ "DEN", "FLL", "IAH","SFO", "COR", "AEP", "BOM"],
        'DEPART':["13-07-2021 13:00:00","13-07-2021 17:50:00","15-07-2021 17:15:00","15-07-2021 19:40:00", "22-09-2021 09:05:00", "22-09-2021 17:05:00", "22-09-2021 06:05:00"],
        'ARRIVAL':["13-07-2021 16:26:00","13-07-2021 23:39:00","15-07-2021 18:52:00", "15-07-2021 21:27:00","22-09-2021 10:30:00","22-09-2021 18:20:00","22-09-2021 08:05:00"]})
df['NUMSTOPS'] = df.groupby('CODE')['CODE'].transform('count')
df['DEPART'] = pd.to_datetime(df['DEPART'], format='%d-%m-%Y %H:%M:%S')
df['ARRIVAL'] = pd.to_datetime(df['ARRIVAL'], format='%d-%m-%Y %H:%M:%S')

I want to find the time difference for each group to find out the time difference of 'DEPART' from the previous arrival and show it in hours. I have applied the following:

df['LAYOVER']=df.groupby('CODE').apply(lambda x:x['DEPART']- x['ARRIVAL']).shift(1).fillna('0').reset_index(drop=True)

But that gives me the following output:

    CODE       FROM TO   DEPART           ARRIVAL               NUMSTOPS        LAYOVER
0   PNR1    RDM DEN 2021-07-13 13:00:00 2021-07-13 16:26:00 4   00:00:00
1   PNR1    DEN FLL 2021-07-13 17:50:00 2021-07-13 23:39:00 4   -1 days  20:34:00
2   PNR1    FLL IAH 2021-07-15 17:15:00 2021-07-15 18:52:00 4   -1 days  18:11:00
3   PNR1    IAH SFO 2021-07-15 19:40:00 2021-07-15 21:27:00 4   -1 days  22:23:00
4   PNR2    AEP COR 2021-09-22 09:05:00 2021-09-22 10:30:00 2   -1 days  22:13:00
5   PNR2    COR AEP 2021-09-22 17:05:00 2021-09-22 18:20:00 2   -1 days  22:35:00
6   PNR3    DEL BOM 2021-09-22 06:05:00 2021-09-22 08:05:00 1   -1 days  22:45:00

I don't understand what am I doing wrong and how to proceed with calculating the departure time and the previous arrival for each group.

Expected output:

Expected Output

CodePudding user response:

You have a parentheses out of place. .shift() was outside your parenthesis. fixed here.

df['LAYOVER']=df.groupby('CODE').apply(lambda x:x['DEPART']- x['ARRIVAL'].shift(1)).fillna('0').reset_index(drop=True)
df['LAYOVER'].apply(lambda x: pd.Timedelta(x).seconds)

0        0
1     5040
2    63360
3     2880
4        0
5    23700
6        0
Name: LAYOVER, dtype: int64

this is more concise:

df.groupby('CODE').apply(lambda x:x['DEPART']- x['ARRIVAL'].shift(1)).fillna('0').reset_index(drop=True).dt.seconds
  • Related