Home > Net >  Calculating the difference between dates with certain parameters from data Frame
Calculating the difference between dates with certain parameters from data Frame

Time:12-14

I am interested in calculating the time difference (in days) of time between orders from the same provider

sorce data frame

OrderNumber CatID CustomerID CreateDate
6543 4901 1111 2008-01-01
6543 4901 2222 2008-01-03
1113 4910 3333 2008-01-03
1234 5436 1111 2008-01-05
1245 6434 2222 2008-01-10
1456 2453 1111 2008-01-12
1567 64355 3333 2008-01-13

** the OrderNumber and CatID not interesting **

i want calc the diffrence between date in same costumerID

TARGET DateFrame

OrderNumber CatID CustomerID CreateDate calc days
6543 4901 1111 2008-01-01 4 days
6543 4901 2222 2008-01-03 7 days
1113 4910 3333 2008-01-03 10 days
1234 5436 1111 2008-01-05 7 days
1245 6434 2222 2008-01-10 5 days
1456 2453 1111 2008-01-12 1 days
1567 64355 3333 2008-01-13 -
6436 64355 1111 2008-01-13 -
1253 64355 2222 2008-01-15 -

** the OrderNumber and CatID not interesting **

orginal data

       OrderNumber  CustomerID CreateDate
0            27986   712841200 2008-01-01
1            28917   712841200 2008-04-16
2            28451   712630100 2008-02-27
3            29836   712630100 2008-08-21
4            30158   712630100 2008-10-27
...            ...         ...        ...
21621        44318   712841600 2014-12-03
21622        44322   712841600 2014-12-03
21623        42829   712847200 2014-03-05
21624        43547   712847200 2014-06-26
21625        42830   712847200 2014-03-05

CodePudding user response:

You must first convert your CreateDate column to datetime format:

df['CreateDate'] = pd.to_datetime(df['CreateDate'])

Then, you can find the location of the same CustomerID using the .loc method:

customer_ID = df.loc[df['CustomerID'] == '1111']

Finally, you can compute the difference days between two row:

abs((customer_ID['CreateDate'].loc[0] - customer_ID['CreateDate'][1]).days)

The abs is for ensuring to return the positive value.

CodePudding user response:

As @Mohammadreza Riahi said, it would be best if you convert your CreateDate column to datetime format (but save the old date):

df['CreateDate_dt'] = pd.to_datetime(df['CreateDate'])

(At this point, it's recommended to confirm the dates were parsed correctly).

Then I would order by customer id and date:

df = df.sort_values([CustomerID,CreateDate_dt])

Then add this column:

df['date_diff'] = np.where(df['CustomerID']==df['CustomerID'].shift(),
                          df['CreateDate_dt'] df['CreateDate_dt'].shift(),
                          np.nan)

The cool thing here is that there's NaN next to the first order of the customer and values in days between all the rest.

if you want days in INT:

df['date_diff_days']=df['date_diff'].astype('timedelta64[D]')

Now you can do this for example:

df.groupby('CustomerID').agg(purchases=('OrderNumber', 'nunique'),
                             first_purchase=('CreateDate_dt', 'min'),
                             mean_days_between_purchases=('date_diff_days', 'mean'))

CodePudding user response:

the answer is

data = data.sort_values(by=['CustomerID', 'CreateDate'], ignore_index=True)

data['DaysFromPrevCustOrder'] = np.where(
    data['CustomerID'].shift(1) == data['CustomerID'],
    (data['CreateDate'] - data['CreateDate'].shift(1)).dt.days,np.nan)

** ## output ## **

       CustomerID CreateDate  OrderNumber  DaysFromPrevCustOrder
0       712013005 2008-02-13        28327                    NaN
1       712013005 2008-02-13        28327                    0.0
2       712013005 2008-02-13        28327                    0.0
3       712013005 2009-03-22        31051                  403.0
4       712013005 2009-03-22        31051                    0.0
...           ...        ...          ...                    ...
21621   712970100 2009-06-23        31662                    9.0
21622   712970100 2010-04-08        33420                  289.0
21623   712970100 2011-01-06        35184                  273.0
21624   712970100 2012-05-28        38712                  508.0
21625  7122973100 2008-09-28        30020                    NaN
  • Related