Home > Software design >  Pandas - Take value n month before
Pandas - Take value n month before

Time:05-06

I am working with datetime. Is there anyway to get a value of n months before.

For example, the data look like:

dft = pd.DataFrame(
    np.random.randn(100, 1),
    columns=["A"],
    index=pd.date_range("20130101", periods=100, freq="M"),
)

dft

Then:

  • For every Jul of each year, we take value of December in previous year.
  • For other month left (from Aug this year to June next year), we take value of previous month

What I've been trying to do is:

dft['B'] = np.where(dft.index.month == 7, 
                    dft['A'].shift(7, freq='M') ,
                    dft['A'].shift(1, freq='M'))

However, the result is simply a copy of column A. I don't know why. But when I tried for single line of code :

dft['C'] = dft['A'].shift(7, freq='M')

then everything is shifted as expected. I don't know what is the issue here

CodePudding user response:

The issue is index alignment. This shift that you performed acts on the index, but using numpy.where you convert to arrays and lose the index.

Use pandas' where or mask instead, everything will remain as Series and the index will be preserved:

dft['B'] = (dft['A'].shift(1, freq='M')
            .mask(dft.index.month == 7,  dft['A'].shift(7, freq='M'))
            )

output:

                   A         B
2013-01-31 -2.202668       NaN
2013-02-28  0.878792 -2.202668
2013-03-31 -0.982540  0.878792
2013-04-30  0.119029 -0.982540
2013-05-31 -0.119644  0.119029
2013-06-30 -1.038124 -0.119644
2013-07-31  0.177794 -1.038124
2013-08-31  0.206593 -2.202668  <- correct
2013-09-30  0.188426  0.206593
2013-10-31  0.764086  0.188426
...              ...       ...
2020-12-31  1.382249 -1.413214
2021-01-31 -0.303696  1.382249
2021-02-28 -1.622287 -0.303696
2021-03-31 -0.763898 -1.622287
2021-04-30  0.420844 -0.763898

[100 rows x 2 columns]
  • Related