I am trying to convert columns of data in date format to an integer using a referenced date, 01/10/2021 (0). For example, 01/07/2021 would equal 92 and 10/10/2021 equals -9.
It can be done in excel but pandas generate a TypeError when trying to subtract the dates from the reference date.
TypeError: unsupported operand type(s) for -: 'str' and 'str'
CodePudding user response:
You need to convert date column from str
to datetime
first with pandas.to_datetime.
Here is an example:
import pandas as pd
d = pd.DataFrame({'date': ['01/10/2021', '01/07/2021', '10/10/2021']})
d['date'] = pd.to_datetime(d['date'], dayfirst=True)
d['date'][0] - d['date']
Output:
0 0 days
1 92 days
2 -9 days
CodePudding user response:
IIUC, convert to_datetime
and perform a subtraction with the reference timestamp. Then extract the days with dt.days
:
df = pd.DataFrame({'date': ['01/07/2021', '01/10/2021', '10/10/2021']})
df['delta'] = (pd.to_datetime(df['date'], dayfirst=True)
.rsub(pd.to_datetime('01/10/2021', dayfirst=True))
.dt.days
)
Output:
date delta
0 01/07/2021 92
1 01/10/2021 0
2 10/10/2021 -9