I have a bunch of dates in the following format
LastUpdated
2021-06-19 12:41:38
2021-06-20 15:09:05
2021-06-20 15:09:39
2021-06-28 08:59:29
2021-08-03 14:02:03
Last updated being the name of that column in my dataframe. How can I add an extra column to the dataframe which will be the difference in days between now and the date of each row?
CodePudding user response:
I hope your LastUpdated
column is already of type datetime64
. If not, convert it using pd.to_datetime
. That makes datetime calculation a lot easier. The trick to get the difference in days, hours, minutes, etc. is to divide the Timedelta object with pd.Timedelta(<unit>=1)
:
df["LastUpdated"] = pd.to_datetime(df["LastUpdated"])
df["Diff"] = (pd.Timestamp.now() - df["LastUpdated"]) / pd.Timedelta(days=1)
# If you want to strip out the time
df["Diff"] = (pd.Timestamp.now().normalize() - df["LastUpdated"].dt.normalize()) / pd.Timedelta(days=1)