I am searching for an efficient way to set a new column based on values from previous rows from different columns. Imagine you have this DataFrame:
pd.DataFrame([[0, 22], [1, 15], [2, 18], [3, 9], [4, 10], [6, 11], [8, 12]],
columns=['days', 'quantity'])
days quantity
0 0 22
1 1 15
2 2 18
3 3 9
4 4 10
5 6 11
6 8 12
Now, I want to have a third column 'quantity_3days_ago', like this:
days quantity quantity_3days_ago
0 0 22 NaN
1 1 15 NaN
2 2 18 NaN
3 3 9 22
4 4 10 15
5 6 11 9
6 8 12 10
So I need to use the 'days' column to check what the 'quantity' column says for 3 days ago. In case there is no exact value in the 'days' column I want 'quantity_3days_ago' to be the value of the row before. See the last row as an example: 8 - 3 would be 5 in which case I would take the 'quantity' value of the row with days equals 4 for the 'quantity_3days_ago'. I hope this is understandable. I tried using rolling windows and shifting, but I wasn't able to get the desired result. I guess it would probably be possible with a loop over the whole DataFrame. However, this would be rather inefficient. I wonder if this can be done in one line. Thanks for your help!
CodePudding user response:
We can do reindex
before shift
rng = range(df.days.iloc[0],df.days.iloc[-1] 1)
df['new'] = df.days.map(df.set_index('days').reindex(rng ,method='ffill')['quantity'].shift(3))
df
Out[125]:
days quantity new
0 0 22 NaN
1 1 15 NaN
2 2 18 NaN
3 3 9 22.0
4 4 10 15.0
5 6 11 9.0
6 8 12 10.0