Home > front end >  Finding difference between dates in two rows where one can be null in Python
Finding difference between dates in two rows where one can be null in Python

Time:05-26

I have a dataset with two datetime columns (let's say Call_Date & Transaction_Date). Call_Date is always populated, as we are only looking for instances where we were able to reach a consumer. Then, if the consumer makes a transaction, we populate Transaction_Date. Therefore, Transaction_date is not always populated.

What I'm trying to get to is when both dates are populated, how many days after the call the consumer made a transaction, without removing the cases where the consumer has not made a transaction.

Any way to do that?

Minimal Example

df = pd.DataFrame({'Customer': ['ABC','XYZ','PQR'],
                   'Call_Date': ['12/8/2021 2:31:07 PM','20/8/2021 5:27:02 AM','5/8/2021 7:29:40 PM'],
                   'Transaction_Date': ['16/8/2021 9:21:58 PM', pd.NaT, pd.NaT]})

CodePudding user response:

https://img.codepudding.com/202205/55a31f11873444c28e49da3bc4eadcdb.jpg

following working for me. please check the attached image to see output

from dateutil.parser import *
import pandas as pd
import numpy as np

df = pd.DataFrame({'Customer': ['ABC','XYZ','PQR'],
                   'Call_Date': ['12/8/2021 2:31:07 PM','20/8/2021 5:27:02 AM','5/8/2021 7:29:40 PM'],
                   'Transaction_Date': ['16/8/2021 9:21:58 PM','','']})

df['DUR'] = df.apply(lambda x : parse(x.Call_Date,fuzzy=True) - parse(x.Transaction_Date,fuzzy=True) if \
                     len(x.Transaction_Date) != 0 else np.nan, axis=1)

Another approach can be used as following, include a function diff() to implement exception handling so now it would not be a problem for any type Null/String/datetime.

please check library docs: https://dateutil.readthedocs.io/en/stable/parser.html

def diff(datetime_old, datetime_new):
    try: return abs(parse(str(datetime_new), fuzzy=True) - parse(str(datetime_old), fuzzy=True))
    except: return np.nan

df['DUR'] = df.apply(lambda x : diff(x.Call_Date, x.Transaction_Date) if pd.notnull(x.Transaction_Date) else np.nan, axis=1)

  • Related