Home > database >  Calculating the days difference between previous visit time to current visit time for same customer
Calculating the days difference between previous visit time to current visit time for same customer

Time:10-04

I am trying to calculate the time difference(in days) between the customer's previous visit out time and the customer's latest visit in time.

time difference = latest in time - previous out time

Here is a sample of input data

enter image description here

sample output table

enter image description here

The approach I have tried so far groupby based on customer ID and rank

temp['RANK'] = temp.groupby('customer ID')['in time'].rank(ascending=True)

but I am unsure that how to calculate the difference.

CodePudding user response:

You can use GroupBy.shift() to get the previous out time within the group. Substracted by current in time. Then, use dt.days to get the number of days of the timedelta between in time and out time within the group, as follows:

# convert date strings to datetime format
df['out time'] = pd.to_datetime(df['out time'], dayfirst=True)
df['in time'] = pd.to_datetime(df['in time'], dayfirst=True)

df['Visit diff (in days)'] = (df['in time'] - df['out time'].groupby(df['customer ID']).shift()).dt.days

Data Input:

print(df)

   customer ID             out time              in time
0            1  05-12-1999 15:20:07  05-12-1999 14:23:31
1            1  21-12-1999 09:59:34  21-12-1999 09:41:09
2            2  05-12-1999 11:53:34  05-12-1999 11:05:37
3            2  08-12-1999 19:55:00  08-12-1999 19:40:10
4            3  01-12-1999 15:15:26  01-12-1999 13:08:11
5            3  16-12-1999 17:10:09  16-12-1999 16:34:10

Result:

print(df)

   customer ID            out time             in time  Visit diff (in days)
0            1 1999-12-05 15:20:07 1999-12-05 14:23:31                   NaN
1            1 1999-12-21 09:59:34 1999-12-21 09:41:09                  15.0
2            2 1999-12-05 11:53:34 1999-12-05 11:05:37                   NaN
3            2 1999-12-08 19:55:00 1999-12-08 19:40:10                   3.0
4            3 1999-12-01 15:15:26 1999-12-01 13:08:11                   NaN
5            3 1999-12-16 17:10:09 1999-12-16 16:34:10                  15.0

CodePudding user response:

You may try the following:

temp.groupby('customer ID').apply(lambda x: (x['in time'].max() - x['out time'].min()).days )
  • Related