I have the following code snippet:
ls3['REP'] = pd.to_datetime(ls3['REP']).dt.to_period('M')
ls3['month'] = pd.to_datetime(ls3['month']).dt.to_period('M')
ls3['MonthsBetween'] = ls3['REP']-ls3['month']
REP is a column of values like 2022-05, and month is also in YYYY-MM format. I want a column that gives me the difference as an integer number of months.
For example, if my REP column is 2022-05 and my month column is 2022-01, the difference should be 4. The method above gives me an output <month-end> in every row instead of an integer.
I have also tried
ls3["MonthsBetween"] = (ls3['REP']-ls3["month"])/np.timedelta64(1, "M")
Here, instead of truncating full dates such as 2022-05-01 to 2022-05, I subtract the full dates (say 2022-05-01 - 2022-01-20), convert the result to months, and then round it. The rounding causes issues: I don't care how many days are between the two dates, I only want to subtract the two month values.
Answer:
If I understand correctly, you don't have to overwrite your existing columns; perform the computation and assign the result directly:
ls3['MonthsBetween'] = (pd.to_datetime(ls3['REP']).dt.to_period('M')
                        - pd.to_datetime(ls3['month']).dt.to_period('M')
                        )
If you want integers:
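# astype(int) turns each monthly period into its integer ordinal (months since 1970-01),
# so subtracting the ordinals gives a plain integer month count
ls3['MonthsBetween'] = (pd.to_datetime(ls3['REP']).dt.to_period('M').astype(int)
                        - pd.to_datetime(ls3['month']).dt.to_period('M').astype(int)
                        )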
Output:
          REP    month  MonthsBetween
0  2022-05-01  2022-01              4
Used input:
ls3 = pd.DataFrame({'REP': ['2022-05-01'], 'month': ['2022-01']})
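If the integer cast on a period column misbehaves in your pandas version (its behaviour may depend on the version and platform), the same whole-month difference can be computed from the year and month components directly. This is only a sketch of an alternative, not part of the approach above; the intermediate names rep and month are illustrative, and it assumes both columns parse cleanly with pd.to_datetime.

```python
import pandas as pd

# Same sample input as used above
ls3 = pd.DataFrame({'REP': ['2022-05-01'], 'month': ['2022-01']})

# Illustrative intermediate names (assumption: both columns are parseable dates)
rep = pd.to_datetime(ls3['REP'])
month = pd.to_datetime(ls3['month'])

# Count whole calendar months only; the day of the month is ignored
ls3['MonthsBetween'] = ((rep.dt.year - month.dt.year) * 12
                        + (rep.dt.month - month.dt.month))

print(ls3)
```

Because only the year and month parts enter the calculation, 2022-05-01 minus 2022-01-20 still counts as 4 months, with no rounding involved.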