Set a new column based on values from previous rows from different column-CodePudding

I am searching for an efficient way to set a new column based on values from previous rows from different columns. Imagine you have this DataFrame:

pd.DataFrame([[0, 22], [1, 15], [2, 18], [3, 9], [4, 10], [6, 11], [8, 12]],
             columns=['days', 'quantity'])

   days  quantity
0     0        22
1     1        15
2     2        18
3     3         9
4     4        10
5     6        11
6     8        12

Now, I want to have a third column 'quantity_3days_ago', like this:

   days  quantity  quantity_3days_ago
0     0        22                 NaN
1     1        15                 NaN
2     2        18                 NaN
3     3         9                  22
4     4        10                  15
5     6        11                   9
6     8        12                  10

So I need to use the 'days' column to check what the 'quantity' column says for 3 days ago. In case there is no exact value in the 'days' column I want 'quantity_3days_ago' to be the value of the row before. See the last row as an example: 8 - 3 would be 5 in which case I would take the 'quantity' value of the row with days equals 4 for the 'quantity_3days_ago'. I hope this is understandable. I tried using rolling windows and shifting, but I wasn't able to get the desired result. I guess it would probably be possible with a loop over the whole DataFrame. However, this would be rather inefficient. I wonder if this can be done in one line. Thanks for your help!

CodePudding user response：

We can do reindex before shift

rng = range(df.days.iloc[0],df.days.iloc[-1] 1)
df['new'] = df.days.map(df.set_index('days').reindex(rng ,method='ffill')['quantity'].shift(3))
df
Out[125]: 
   days  quantity   new
0     0        22   NaN
1     1        15   NaN
2     2        18   NaN
3     3         9  22.0
4     4        10  15.0
5     6        11   9.0
6     8        12  10.0