I have a dataframe,
df_corp
ID arrival_date leaving_date
1 01/02/20 05/02/20
2 01/03/20 07/03/20
1 12/02/20 20/02/20
1 07/03/20 10/03/20
2 10/03/20 15/03/20
I would like to find the difference between leaving_date
of a row and arrival date
of the next entry with respect to ID. Basically I want to know how long before they book again.
So it'll look something like this.
df_corp
ID arrival_date leaving_date time_between
1 01/02/20 05/02/20 NaN
2 01/03/20 07/03/20 NaN
1 12/02/20 20/02/20 7
1 07/03/20 10/03/20 15
2 10/03/20 15/03/20 3
I've tried grouping by ID to do the sum but I'm seriously lost on how to get the value from the next row and a different column in one.
Thanks for any help.
CodePudding user response:
You need to convert to_datetime
and to perform a GroupBy.shift
to get the previous departure date:
# arrival
a = pd.to_datetime(df_corp['arrival_date'], dayfirst=True)
# previous departure per ID
l = pd.to_datetime(df_corp['leaving_date'], dayfirst=True).groupby(df_corp['ID']).shift()
# difference in days
df_corp['time_between'] = (a-l).dt.days
output:
ID arrival_date leaving_date time_between
0 1 01/02/20 05/02/20 NaN
1 2 01/03/20 07/03/20 NaN
2 1 12/02/20 20/02/20 7.0
3 1 07/03/20 10/03/20 16.0
4 2 10/03/20 15/03/20 3.0