Calculate the difference in days between two date fields

Time:05-13

I have two date fields, fromDate and toDate. toDate also contains a time component, e.g. 2021-03-22T18:59:59Z. I want to calculate the difference between the two values in days: toDate - fromDate = difference in days. When I try this, however, I get the error TypeError: unsupported operand type(s) for -: 'Timestamp' and 'datetime.date'. I have already converted toDate to a plain date without the time. Note that both fields can contain empty values.

How can I calculate the difference in days between the two date fields?

Dataframe

    id  toDate                  fromDate
0   1   2021-03-22T18:59:59Z    2021-02-22
1   2   None                    2021-03-18
2   3   2021-04-22T18:59:59Z    2021-03-22
3   4   2021-02-15T18:59:59Z    2021-02-10
4   5   2021-09-15T18:59:59Z    2021-09-07
5   6   2020-01-12T18:59:59Z    None
6   7   2022-02-22T18:59:59Z    2022-01-18

Code

import pandas as pd
d = {'id': [1, 2, 3, 4, 5, 6, 7],
     'toDate': ['2021-03-22T18:59:59Z', None, '2021-04-22T18:59:59Z', 
'2021-02-15T18:59:59Z', '2021-09-15T18:59:59Z', '2020-01-12T18:59:59Z', '2022-02-22T18:59:59Z'],
     'fromDate': ['2021-02-22', '2021-03-18', '2021-03-22', 
'2021-02-10', '2021-09-07', None, '2022-01-18']
    }
df = pd.DataFrame(data=d)
display(df)
df['toDate']  = pd.to_datetime(df['toDate'], errors='coerce').dt.date
df['fromDate']  = pd.to_datetime(df['fromDate'], errors='coerce')
display(df)

#df['days']  = df['fromDate'].subtract(df['toDate'])
df['days'] = (df['fromDate'] - df['toDate']).dt.days

[OUT] TypeError: unsupported operand type(s) for -: 'Timestamp' and 'datetime.date'

What I want

    id  toDate               fromDate     days
0   1   2021-03-22           2021-02-22   30
1   2   NaT                  2021-03-18   NaT
2   3   2021-04-22           2021-03-22   30
3   4   2021-02-15           2021-02-10    5
4   5   2021-09-15           2021-09-07    8
5   6   2020-01-12           NaT          NaT
6   7   2022-02-22           2022-01-18   34

CodePudding user response:

To subtract, the toDate column must also hold datetimes rather than date objects, so set the times to 00:00:00 with Series.dt.normalize:

df['toDate']  = pd.to_datetime(df['toDate'], errors='coerce').dt.normalize()

Or Series.dt.floor:

df['toDate']  = pd.to_datetime(df['toDate'], errors='coerce').dt.floor('D')
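
One caveat with the two datetime variants above: the trailing Z in values such as 2021-03-22T18:59:59Z normally makes pandas parse toDate as timezone-aware (UTC), while fromDate stays timezone-naive, and pandas refuses to subtract one from the other. The following is only a sketch on a shortened copy of the sample data; passing utc=True and dropping the timezone with tz_localize(None) are assumptions added here, not part of the original answer.

```python
import pandas as pd

# Shortened version of the question's sample data.
df = pd.DataFrame({
    'id': [1, 2, 3],
    'toDate': ['2021-03-22T18:59:59Z', None, '2021-02-15T18:59:59Z'],
    'fromDate': ['2021-02-22', '2021-03-18', None],
})

# Parse as UTC, drop the timezone (keeps the UTC wall time),
# then reset the time of day to 00:00:00.
to_date = pd.to_datetime(df['toDate'], errors='coerce', utc=True)
to_date = to_date.dt.tz_localize(None).dt.normalize()

# fromDate has no time or timezone, so it parses as timezone-naive.
from_date = pd.to_datetime(df['fromDate'], errors='coerce')

# Both sides are now naive datetimes; rows with a missing date give NaN.
df['days'] = (to_date - from_date).dt.days
print(df)
```

Because the timezone is simply dropped, the day boundaries are those of UTC; if the dates should be interpreted in another timezone, convert with tz_convert before dropping it.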

Another idea is to convert both columns to dates; this may fail in some older pandas versions:

df['toDate']  = pd.to_datetime(df['toDate'], errors='coerce').dt.date
df['fromDate']  = pd.to_datetime(df['fromDate'], errors='coerce').dt.date

df['days'] = (df['toDate'] - df['fromDate']).dt.days
print (df)
   id      toDate    fromDate  days
0   1  2021-03-22  2021-02-22  28.0
1   2         NaT  2021-03-18   NaN
2   3  2021-04-22  2021-03-22  31.0
3   4  2021-02-15  2021-02-10   5.0
4   5  2021-09-15  2021-09-07   8.0
5   6  2020-01-12         NaT   NaN
6   7  2022-02-22  2022-01-18  35.0
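
The days column prints as floats (28.0, NaN) because missing values force a float column. If whole numbers are preferred, as in the desired output of the question, one option is pandas' nullable integer dtype. This is a minimal sketch on a stand-in Series, not something the original answer does; the names days and days_int are illustrative.

```python
import pandas as pd

# Stand-in for the days column above: floats, with NaN where a date was missing.
days = pd.Series([28.0, float('nan'), 31.0, 5.0])

# 'Int64' (capital I) is the nullable integer dtype; NaN becomes <NA>.
days_int = days.astype('Int64')
print(days_int)
```

On the real frame the same cast would be df['days'] = df['days'].astype('Int64').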