Get the N previous month of a column of months based on another column which contains N (pandas)-CodePudding

I've got a dataframe with two columns MONTH and N. MONTH is in yyyy-mm format and N is an integer that specifies how far back the MONTH needs to go. I'm looking for a computationally efficient way to do this as my dataset is large (over 50M rows)

here's an example:

df = pd.DataFrame({'MONTH':['2020-01', '2020-02', '2020-03', '2020-04'], 
            'N': [10,8,5,2]})

desired_output:

     MONTH   N    N_PREV_MONTH
0  2020-01  10      2019-03
1  2020-02   8      2019-06
2  2020-03   5      2019-10
3  2020-04   2      2020-02

I understand that MONTH can to be changed to yyyy-mm-01 format, before subtracting N months from it. what's the best way to get N_PREV_MONTH in vector format to make it as fast as possible?

CodePudding user response：

If convert values to month periods simply subtract column N:

df['MONTH'] = pd.PeriodIndex(df['MONTH'], freq='m')
df['N_PREV_MONTH'] = df['MONTH'].sub(df['N']).astype(str)
print(df)
     MONTH   N N_PREV_MONTH
0  2020-01  10      2019-03
1  2020-02   8      2019-06
2  2020-03   5      2019-10
3  2020-04   2      2020-02

Slowier alternative:

df['MONTH'] = pd.to_datetime(df['MONTH'])
df['N_PREV_MONTH'] = df.apply(lambda x: x['MONTH'] - pd.offsets.DateOffset(months=x['N']), axis=1).dt.strftime('%Y-%m')