I want to create a function that counts the days as an integer between a date and the date shifted back a number of periods (e.g. df['new_col'] = (df['date'].shift(#periods)-df['date']). The date variable is datetime64[D]. As an example: df['report_date'].shift(39) = '2008-09-26' and df['report_date'] = '2008-08-18' and df['delta'] = 39.
import pandas as pd
from datetime import datetime
from datetime import timedelta
import datetime as dt
dates =pd.Series(np.tile(['2012-08-01','2012-08-15','2012-09-01','2012-08-15'],4)).astype('datetime64[D]')
dates2 =pd.Series(np.tile(['2012-08-01','2012-09-01','2012-10-01','2012-11-01'],4)).astype('datetime64[D]')
stocks = ['A','A','A','A','G','G','G','G','B','B','B','B','F','F','F','F']
stocks = pd.Series(stocks)
df = pd.DataFrame(dict(stocks = stocks, dates = dates,report_date = dates2)).reset_index()
df.head()
print('df info:',df.info())
The code below is my latest attempt to create this variable, but the code produces incorrect results.
df['delta'] = df.groupby(['stocks','dates'])['report_date'].transform(lambda x: (x.shift(1).rsub(x).dt.days))
CodePudding user response:
I came up with the solution of using a for loop and zip function, to simply subtract each pair like so...
from datetime import datetime
import pandas as pd
dates = ['2012-08-01', '2012-08-15', '2012-09-01', '2012-08-15']
dates2 = ['2012-08-01', '2012-09-01', '2012-10-01', '2012-11-01']
diff = []
for i, x in zip(dates, dates2):
i = datetime.strptime(i, '%Y-%m-%d')
x = datetime.strptime(x, '%Y-%m-%d')
diff.append(i - x)
df = {'--col1--': dates, '--col2--': dates2, '--difference--': diff}
df = pd.DataFrame(df)
print(df)
Ouput:
--col1-- --col2-- --difference--
0 2012-08-01 2012-08-01 0 days
1 2012-08-15 2012-09-01 -17 days
2 2012-09-01 2012-10-01 -30 days
3 2012-08-15 2012-11-01 -78 days
Process finished with exit code 0
I hope that solves your problem.