Home > Mobile >  Compute pct_change upto NaN
Compute pct_change upto NaN

Time:01-02

I have a df with index extending past the last data point:

df

2022-01-31  96.210  21649.6
2022-02-28  96.390  21708.4
2022-03-31  97.410  21739.7
2022-04-30  98.630  21644.3
2022-05-31  103.744 21649.2
2022-06-30  102.498 21607.4
2022-07-31  105.138 21636.1
2022-08-31  105.450 21631.8
2022-09-30  109.691 21503.1
2022-10-31  111.745 21414.8
2022-11-30  111.481 21351.6
2022-12-31  104.728 NaN
2023-01-31  103.522 NaN
2023-02-28  NaN
2023-03-31  NaN
2023-04-30  NaN
2023-05-31  NaN
2023-06-30  NaN 
2023-07-31  NaN 
2023-08-31  NaN

and when I compute pct_change, pandas treats NaNs as values and extends the pct_change calculation past the last actual data point. So instead stopping on 2023-01-31 for the first column, pandas continues computing values for 2023-02-28 and so on:

df.pct_change(12)

2022-01-31  0.069713    0.117543
2022-02-28  0.059464    0.106713
2022-03-31  0.069969    0.094989
2022-04-30  0.061336    0.076258
2022-05-31  0.140671    0.060263
2022-06-30  0.141022    0.056075
2022-07-31  0.135400    0.049517
2022-08-31  0.145573    0.038228
2022-09-30  0.186490    0.025632
2022-10-31  0.188271    0.012812
2022-11-30  0.187484    0.000112
2022-12-31  0.090576    -0.006440
2023-01-31  0.076000    -0.013765
2023-02-28  0.073991    -0.016436
2023-03-31  0.062745    -0.017852
2023-04-30  0.049600    -0.013523
2023-05-31  -0.002140   -0.013746
2023-06-30  0.009990    -0.011839
2023-07-31  -0.015370   -0.013149
2023-08-31  -0.018284   -0.012953

How can I tell pandas to compute pct_change only upto NaN values? So the output is:

2022-01-31  0.069713    0.117543
2022-02-28  0.059464    0.106713
2022-03-31  0.069969    0.094989
2022-04-30  0.061336    0.076258
2022-05-31  0.140671    0.060263
2022-06-30  0.141022    0.056075
2022-07-31  0.135400    0.049517
2022-08-31  0.145573    0.038228
2022-09-30  0.186490    0.025632
2022-10-31  0.188271    0.012812
2022-11-30  0.187484    0.000112
2022-12-31  0.090576    NaN
2023-01-31  0.076000    NaN
2023-02-28  NaN         NaN
2023-03-31  NaN         NaN
2023-04-30  NaN         NaN
2023-05-31  NaN         NaN
2023-06-30  NaN         NaN
2023-07-31  NaN         NaN
2023-08-31  NaN         NaN 

The following options won't work for me:

  1. dropna
  2. specifying a range a-la df=df[:'2023-01-31'].pct_change(12)

I need to keep the index past the last data point and I am doing a lot of different pct_changes so specifying a range everytime is too time consuming, I am looking for a nicer solution.

Also, specifying a range won't work because when when the df gets new data points, I will have to manually adjust all of the ranges, which is no bueno.

CodePudding user response:

If you look at the documentation for df.pct_change, you'll find that it has a parameter fill_method that uses pad as the default for handling NaN values before computing the changes. pad (or ffill) means that the function is propagating the last valid observation forward. E.g. in your first column, with periods=12, when you reach the index value 2023-02-28, instead of using the NaN value, the function picks the last valid value (103.522 at 2023-01-31) to do the calculation:

103.522/96.390-1

# 0.0739910779126467 (96.390 being the value at `2022-02-28`, one year earlier)

To avoid this, you want to set fill_method to bfill (cf. the documentation for df.fillna for these methods). Result examplified below with periods=3 (I trust you are only showing the last rows of a longer df, because otherwise all the values before 2023-01-31 should be NaN values):

df.pct_change(periods=3, fill_method='bfill')

                col1      col2
2022-01-31       NaN       NaN
2022-02-28       NaN       NaN
2022-03-31       NaN       NaN
2022-04-30  0.025153 -0.000245
2022-05-31  0.076294 -0.002727
2022-06-30  0.052233 -0.006086
2022-07-31  0.065984 -0.000379
2022-08-31  0.016444 -0.000804
2022-09-30  0.070177 -0.004827
2022-10-31  0.062841 -0.010228
2022-11-30  0.057193 -0.012953
2022-12-31 -0.045245       NaN
2023-01-31 -0.073587       NaN
2023-02-28       NaN       NaN
2023-03-31       NaN       NaN
2023-04-30       NaN       NaN
2023-05-31       NaN       NaN
2023-06-30       NaN       NaN
2023-07-31       NaN       NaN
2023-08-31       NaN       NaN

CodePudding user response:

You can put ptc_change(limit=1), which will stop comparison further after 1 null comparison.

  • Related