I have a problem. I have two date fields, fromDate and toDate. The toDate field also contains a timestamp, e.g. 2021-03-22T18:59:59Z.
I want to calculate the difference between the two values in days, i.e. toDate - fromDate = difference in days.
When I do this, however, I get the following error: TypeError: unsupported operand type(s) for -: 'Timestamp' and 'datetime.date'. I have already converted the toDate field to a date without the time part. Note that both fields can contain empty values.
How can I calculate the difference in days between the two date fields?
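A minimal sketch of the intended arithmetic, assuming both columns already hold pandas datetime64 values (the two sample rows are taken from the table below): subtracting one datetime Series from another gives timedeltas, .dt.days turns them into whole days, and an empty value (NaT) on either side simply produces NaN.

```python
import pandas as pd

# Two naive datetime Series; pd.NaT stands in for an empty value.
to_date = pd.Series([pd.Timestamp("2021-03-22"), pd.NaT])
from_date = pd.Series([pd.Timestamp("2021-02-22"), pd.Timestamp("2021-03-18")])

# datetime64 - datetime64 -> timedelta64; NaT propagates through the subtraction.
delta = to_date - from_date

# .dt.days is float64 here, because the missing row has to be NaN.
days = delta.dt.days
print(days.tolist())
```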
Dataframe
id toDate fromDate
0 1 2021-03-22T18:59:59Z 2021-02-22
1 2 None 2021-03-18
2 3 2021-04-22T18:59:59Z 2021-03-22
3 4 2021-02-15T18:59:59Z 2021-02-10
4 5 2021-09-15T18:59:59Z 2021-09-07
5 6 2020-01-12T18:59:59Z None
6 7 2022-02-22T18:59:59Z 2022-01-18
Code
import pandas as pd
d = {'id': [1, 2, 3, 4, 5, 6, 7],
'toDate': ['2021-03-22T18:59:59Z', None, '2021-04-22T18:59:59Z',
'2021-02-15T18:59:59Z', '2021-09-15T18:59:59Z', '2020-01-12T18:59:59Z', '2022-02-22T18:59:59Z'],
'fromDate': ['2021-02-22', '2021-03-18', '2021-03-22',
'2021-02-10', '2021-09-07', None, '2022-01-18']
}
df = pd.DataFrame(data=d)
display(df)
df['toDate'] = pd.to_datetime(df['toDate'], errors='coerce').dt.date
df['fromDate'] = pd.to_datetime(df['fromDate'], errors='coerce')
display(df)
#df['days'] = df['fromDate'].subtract(df['toDate'])
df['days'] = (df['fromDate'] - df['toDate']).dt.days
[OUT] TypeError: unsupported operand type(s) for -: 'Timestamp' and 'datetime.date'
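The error comes from mixing element types: .dt.date turns toDate into an object column of Python datetime.date values, while fromDate stays a datetime64 column whose elements are pandas Timestamps, and subtraction between a Timestamp and a plain datetime.date is not defined. A small sketch isolating that mismatch with two illustrative scalar values:

```python
import datetime

import pandas as pd

ts = pd.Timestamp("2021-02-22")    # the kind of value fromDate holds after pd.to_datetime
d = datetime.date(2021, 3, 22)     # the kind of value toDate holds after .dt.date

try:
    ts - d
    raised = False
except TypeError as exc:
    # Timestamp - datetime.date has no defined result, so Python raises TypeError.
    raised = True
    print(f"TypeError: {exc}")
```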
What I want
id toDate fromDate days
0 1 2021-03-22 2021-02-22 28
1 2 NaT 2021-03-18 NaN
2 3 2021-04-22 2021-03-22 31
3 4 2021-02-15 2021-02-10 5
4 5 2021-09-15 2021-09-07 8
5 6 2020-01-12 NaT NaN
6 7 2022-02-22 2022-01-18 35
Answer
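To subtract, the toDate column also needs to hold datetimes rather than dates. Because the toDate strings end in Z, pd.to_datetime parses them as UTC-aware, so first drop the timezone with Series.dt.tz_localize(None) (otherwise pandas raises a TypeError when subtracting tz-naive and tz-aware values), then set the times to 00:00:00 with Series.dt.normalize:
df['toDate'] = pd.to_datetime(df['toDate'], errors='coerce').dt.tz_localize(None).dt.normalize()
Or with Series.dt.floor:
df['toDate'] = pd.to_datetime(df['toDate'], errors='coerce').dt.tz_localize(None).dt.floor('D')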
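Putting the datetime approach together end to end on the sample data from the question, as a sketch: toDate is parsed, made timezone-naive and normalized, fromDate is parsed as in the question, and the days are then toDate - fromDate. Swapping .dt.normalize() for .dt.floor('D') in the same chain should give the same midnight values here.

```python
import pandas as pd

d = {'id': [1, 2, 3, 4, 5, 6, 7],
     'toDate': ['2021-03-22T18:59:59Z', None, '2021-04-22T18:59:59Z',
                '2021-02-15T18:59:59Z', '2021-09-15T18:59:59Z',
                '2020-01-12T18:59:59Z', '2022-02-22T18:59:59Z'],
     'fromDate': ['2021-02-22', '2021-03-18', '2021-03-22',
                  '2021-02-10', '2021-09-07', None, '2022-01-18']}
df = pd.DataFrame(data=d)

# The trailing 'Z' makes pandas parse toDate as UTC-aware; drop the timezone
# so it matches the naive fromDate, then set the time of day to 00:00:00.
df['toDate'] = (pd.to_datetime(df['toDate'], errors='coerce')
                  .dt.tz_localize(None)
                  .dt.normalize())
df['fromDate'] = pd.to_datetime(df['fromDate'], errors='coerce')

# Both columns are now datetime64; NaT on either side yields NaN days.
df['days'] = (df['toDate'] - df['fromDate']).dt.days
print(df)
```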
Another idea is to convert both columns to dates; this may fail in some older pandas versions:
df['toDate'] = pd.to_datetime(df['toDate'], errors='coerce').dt.date
df['fromDate'] = pd.to_datetime(df['fromDate'], errors='coerce').dt.date
df['days'] = (df['toDate'] - df['fromDate']).dt.days
print(df)
id toDate fromDate days
0 1 2021-03-22 2021-02-22 28.0
1 2 NaT 2021-03-18 NaN
2 3 2021-04-22 2021-03-22 31.0
3 4 2021-02-15 2021-02-10 5.0
4 5 2021-09-15 2021-09-07 8.0
5 6 2020-01-12 NaT NaN
6 7 2022-02-22 2022-01-18 35.0
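The days column comes out as float (28.0, NaN) because NaN is itself a float. If whole integers with a missing marker are preferred, one option, assuming a pandas version with the nullable integer dtype "Int64" (0.24 or later), is to cast the column. A sketch using a stand-in Series with the values printed above:

```python
import pandas as pd

# Stand-in for df['days'] as printed above (float64 because of the NaN rows).
days = pd.Series([28.0, None, 31.0, 5.0, 8.0, None, 35.0])

# Nullable integer dtype: whole numbers become integers, missing values become <NA>.
days_int = days.astype('Int64')
print(days_int)
```

On the real frame the same cast would be applied to df['days'] after the subtraction.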