pandas: add new column based on datetime index lookup of same dataframe

Time:12-21

I have the following data, and I would like to add a new column containing the month-over-month percent change. The date is the index of my dataframe.

    date    close
1/26/1990   421.2999878
1/29/1990   418.1000061
1/30/1990   410.7000122
1/31/1990   415.7999878
2/23/1990   419.5
2/26/1990   421
2/27/1990   422.6000061
2/28/1990   425.7999878
3/26/1990   438.7999878
3/27/1990   439.5
3/28/1990   436.7000122
3/29/1990   435.3999939
3/30/1990   435.5

The simplest way I could think of to do this is to add a column containing the previous month-end date and, for convenience, another containing the previous month-end 'close' value. From those I can calculate the month-over-month change. So in the end, I would have a table that looks like this:

[Image of the desired table omitted]

I was able to add the previous month end just fine, but I am having trouble looking up the previous month-end close based on the previous month-end date. In the code below, the first line works fine and adds the previous month-end date as a new column. The second line does not work: the idea is to use the prev_month_end date to look up the month-end close value and add that as a column.

df['prev_month_end'] = df.index + pd.offsets.BMonthEnd(-1)
df['prev_month_close'] = df[df.index == df['prev_month_end']]

Any help or suggestions on how to accomplish this would be appreciated.
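
A note on why the second line cannot do the lookup: `df.index == df['prev_month_end']` compares each row's date with that same row's previous month end, and a date is never equal to its own previous month end, so the mask selects nothing. The sketch below illustrates this on a few rows of the sample data; the variable names `mask` and `all_false` are only for illustration, and the index is assumed to be a DatetimeIndex.

```python
import pandas as pd

# A few rows from the question's sample data, indexed by date.
dates = pd.to_datetime(["1990-01-31", "1990-02-23", "1990-02-28", "1990-03-30"])
df = pd.DataFrame(
    {"close": [415.7999878, 419.5, 425.7999878, 435.5]},
    index=pd.Index(dates, name="date"),
)

# Previous business month end for every row (first line of the question).
df["prev_month_end"] = df.index + pd.offsets.BMonthEnd(-1)

# Same elementwise comparison as in the question: row i's date against
# row i's prev_month_end. It is a row-by-row test, not a lookup.
mask = df.index.to_numpy() == df["prev_month_end"].to_numpy()
all_false = not mask.any()
print(mask)
```

What is needed instead is a lookup of each prev_month_end value in the index, which is what a merge or a map does.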

CodePudding user response:

You can get prev_month_close by merging the dataframe with itself on prev_month_end:

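# Move the date index into a regular 'date' column.
df.reset_index(inplace=True)
# Self-merge: relabel date/close as prev_month_end/prev_month_close, then match on prev_month_end.
df = df[['date', 'close', 'prev_month_end']].merge(
    df[['date', 'close']].rename(columns={'close': 'prev_month_close', 'date': 'prev_month_end'}),
    how='left', on='prev_month_end')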

OUTPUT

             date       close prev_month_end  prev_month_close
    0  1990-01-26  421.299988     1989-12-29               NaN
    1  1990-01-29  418.100006     1989-12-29               NaN
    2  1990-01-30  410.700012     1989-12-29               NaN
    3  1990-01-31  415.799988     1989-12-29               NaN
    4  1990-02-23  419.500000     1990-01-31        415.799988
    5  1990-02-26  421.000000     1990-01-31        415.799988
    6  1990-02-27  422.600006     1990-01-31        415.799988
    7  1990-02-28  425.799988     1990-01-31        415.799988
    8  1990-03-26  438.799988     1990-02-28        425.799988
    9  1990-03-27  439.500000     1990-02-28        425.799988
    10 1990-03-28  436.700012     1990-02-28        425.799988
    11 1990-03-29  435.399994     1990-02-28        425.799988
    12 1990-03-30  435.500000     1990-02-28        425.799988

Or, without using reset_index, merge prev_month_end against the index directly:

# Match the prev_month_end column on the left against the date index on the right.
df = df[['close', 'prev_month_end']].merge(
    df[['close']].rename(columns={'close': 'prev_month_close'}),
    how='left', left_on='prev_month_end', right_index=True)

OUTPUT

                 close prev_month_end  prev_month_close
date                                                   
1990-01-26  421.299988     1989-12-29               NaN
1990-01-29  418.100006     1989-12-29               NaN
1990-01-30  410.700012     1989-12-29               NaN
1990-01-31  415.799988     1989-12-29               NaN
1990-02-23  419.500000     1990-01-31        415.799988
1990-02-26  421.000000     1990-01-31        415.799988
1990-02-27  422.600006     1990-01-31        415.799988
1990-02-28  425.799988     1990-01-31        415.799988
1990-03-26  438.799988     1990-02-28        425.799988
1990-03-27  439.500000     1990-02-28        425.799988
1990-03-28  436.700012     1990-02-28        425.799988
1990-03-29  435.399994     1990-02-28        425.799988
1990-03-30  435.500000     1990-02-28        425.799988
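
The merge stops at prev_month_close, so here is a minimal self-contained sketch that also takes the last step to the percent change the question asks for. It uses a subset of the sample rows; the names `lookup`, `alt`, and `mom_pct_change` are illustrative choices, not part of the answer above. It also checks that `Series.map` with the close series gives the same lookup as the merge.

```python
import pandas as pd

# A subset of the question's sample rows, indexed by date.
dates = pd.to_datetime([
    "1990-01-30", "1990-01-31", "1990-02-27",
    "1990-02-28", "1990-03-29", "1990-03-30",
])
df = pd.DataFrame(
    {"close": [410.7000122, 415.7999878, 422.6000061,
               425.7999878, 435.3999939, 435.5]},
    index=pd.Index(dates, name="date"),
)

df["prev_month_end"] = df.index + pd.offsets.BMonthEnd(-1)

# Merge variant: match prev_month_end against the date index.
lookup = df[["close"]].rename(columns={"close": "prev_month_close"})
df = df.merge(lookup, how="left", left_on="prev_month_end", right_index=True)

# Equivalent lookup with map: each prev_month_end is looked up in the index of df["close"].
alt = df["prev_month_end"].map(df["close"])

# Month-over-month change relative to the previous month-end close, in percent.
df["mom_pct_change"] = (df["close"] / df["prev_month_close"] - 1) * 100
print(df)
```

Both variants only find a value when the previous business month end is itself a row in the data; if that day is missing (for example a market holiday), prev_month_close comes back as NaN.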
             

CodePudding user response:

We can convert the index to a monthly period index, group the dataframe by period, and take the last close of each month. Then we shift the period index back by one month, map it to those month-end closing values, and finally calculate the percent change.

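# Monthly period for each row, e.g. 1990-02
i = pd.to_datetime(df.index).to_period('M')
# Last close of each month, looked up for the previous month of each row
s = i.shift(-1).map(df.groupby(i)['close'].last())
# Percent change of each close relative to the previous month's last close
df['mom_pct_change'] = df['close'].sub(s).div(s).mul(100)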

                close  mom_pct_change
date                                 
1/26/1990  421.299988             NaN
1/29/1990  418.100006             NaN
1/30/1990  410.700012             NaN
1/31/1990  415.799988             NaN
2/23/1990  419.500000        0.889854
2/26/1990  421.000000        1.250604
2/27/1990  422.600006        1.635406
2/28/1990  425.799988        2.405002
3/26/1990  438.799988        3.053077
3/27/1990  439.500000        3.217476
3/28/1990  436.700012        2.559893
3/29/1990  435.399994        2.254581
3/30/1990  435.500000        2.278068
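
For anyone who wants to run the period-based approach end to end, here is a hedged sketch on a subset of the sample rows with each intermediate step named. The names `month`, `month_end_close`, and `prev_close` are illustrative, and the conversion to a NumPy array before the arithmetic is a defensive choice of this sketch rather than something the answer requires.

```python
import pandas as pd

# A subset of the question's sample rows, indexed by date.
dates = pd.to_datetime([
    "1990-01-30", "1990-01-31", "1990-02-23", "1990-02-28", "1990-03-30",
])
df = pd.DataFrame(
    {"close": [410.7000122, 415.7999878, 419.5, 425.7999878, 435.5]},
    index=pd.Index(dates, name="date"),
)

# Step 1: monthly period of each row.
month = df.index.to_period("M")

# Step 2: last close observed in each month (rows are in date order).
month_end_close = df.groupby(month)["close"].last()

# Step 3: for each row, the last close of the previous month (NaN if that month is absent).
prev_close = month.shift(-1).map(month_end_close).to_numpy()

# Step 4: percent change relative to the previous month's last close.
df["mom_pct_change"] = (df["close"] - prev_close) / prev_close * 100
print(df)
```

Unlike the BMonthEnd lookup, this uses the last available row of the previous month, so it still produces a value when the calendar business month end is missing from the data.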