I am trying to calculate the time difference(in days) between the customer's previous visit out time and the customer's latest visit in time.
time difference = latest in time - previous out time
Here is a sample of input data
sample output table
The approach I have tried so far groupby based on customer ID and rank
temp['RANK'] = temp.groupby('customer ID')['in time'].rank(ascending=True)
but I am unsure that how to calculate the difference.
CodePudding user response:
You can use GroupBy.shift()
to get the previous out time
within the group. Substracted by current in time
. Then, use dt.days
to get the number of days of the timedelta between in time
and out time
within the group, as follows:
# convert date strings to datetime format
df['out time'] = pd.to_datetime(df['out time'], dayfirst=True)
df['in time'] = pd.to_datetime(df['in time'], dayfirst=True)
df['Visit diff (in days)'] = (df['in time'] - df['out time'].groupby(df['customer ID']).shift()).dt.days
Data Input:
print(df)
customer ID out time in time
0 1 05-12-1999 15:20:07 05-12-1999 14:23:31
1 1 21-12-1999 09:59:34 21-12-1999 09:41:09
2 2 05-12-1999 11:53:34 05-12-1999 11:05:37
3 2 08-12-1999 19:55:00 08-12-1999 19:40:10
4 3 01-12-1999 15:15:26 01-12-1999 13:08:11
5 3 16-12-1999 17:10:09 16-12-1999 16:34:10
Result:
print(df)
customer ID out time in time Visit diff (in days)
0 1 1999-12-05 15:20:07 1999-12-05 14:23:31 NaN
1 1 1999-12-21 09:59:34 1999-12-21 09:41:09 15.0
2 2 1999-12-05 11:53:34 1999-12-05 11:05:37 NaN
3 2 1999-12-08 19:55:00 1999-12-08 19:40:10 3.0
4 3 1999-12-01 15:15:26 1999-12-01 13:08:11 NaN
5 3 1999-12-16 17:10:09 1999-12-16 16:34:10 15.0
CodePudding user response:
You may try the following:
temp.groupby('customer ID').apply(lambda x: (x['in time'].max() - x['out time'].min()).days )