I have some data that will looks like this:
Dates Delta
0 2022-10-01 10
1 2022-10-01 21
2 2022-10-01 34
I am trying to add a new column, where I can subtract the number in the Delta
column from the date in the Dates
column. Ideally, the output will look like this (i did this by hand so if the dates are wrong, please excuse me).
Dates Delta CalculatedDate
0 2022-10-01 10 2022-09-21
1 2022-10-01 21 2022-09-10
2 2022-10-01 34 2022-08-23
I've tried various versions of this and I'm not having any luck.
# importing libraries to create and manipulate toy data
import pandas as pd
from datetime import datetime, timedelta
# create toy data
df = pd.DataFrame({'Dates': ['2022-10-01', '2022-10-01', '2022-10-01'],
'Delta': [10, 21, 34]})
# cast the `Dates` column as dates
df['Dates'] = pd.to_datetime(df['Dates'])
##### Need help here
# Create a new column, showing the calculated date
df['CalculatedDate'] = df['Dates'] - timedelta(days=df['Delta'])
CodePudding user response:
df['CalculatedDate'] = pd.to_datetime(df['Dates']) - pd.TimedeltaIndex(df['Delta'], unit='D')
df
Dates Delta CalculatedDate
0 2022-10-01 10 2022-09-21
1 2022-10-01 21 2022-09-10
2 2022-10-01 34 2022-08-28
CodePudding user response:
Here is one way to do it
# for each row subtract the delta from the date in the row
# using Day offset
df['calculatedDate']= df.apply(lambda x: x['Dates'] - pd.offsets.Day(x['Delta']), axis=1)
df
Dates Delta calculatedDate
0 2022-10-01 10 2022-09-21
1 2022-10-01 21 2022-09-10
2 2022-10-01 34 2022-08-28
CodePudding user response:
I see Naveed and Panda has a fix that works great, suggesting the one I came up with as well:
for x in range(len(df)):
df.loc[x,'CalculatedDate'] = df.loc[x, 'Dates'] - timedelta(days=int(df.loc[x, 'Delta']))
print(df)
Put it in a for loop so that you can index through each row and do each row individually. Also make df['CalculatedDate'] into df.loc[x,'CalculatedDate']. This way you will do each row individually. Hope this helps