Suppose that I have a dataframe as follows:
+---------+-------+------------+
| Product | Price | Calculated |
+---------+-------+------------+
| A       | 10    | 10         |
| B       | 20    | NaN        |
| C       | 25    | NaN        |
| D       | 30    | NaN        |
+---------+-------+------------+
It can be created with the code below:
import numpy as np
import pandas as pd

data = {'Product': ['A', 'B', 'C', 'D'],
        'Price': [10, 20, 25, 30],
        'Calculated': [10, np.nan, np.nan, np.nan]}
df = pd.DataFrame(data)
I want to fill in the Calculated column on the fly. From the 2nd row onward, Calculated = previous row's Calculated / previous row's Price. So Calculated at row 2 is 10/10 = 1. Now that row 2 has a value, Calculated at row 3 is 1/20 = 0.05, Calculated at row 4 is 0.05/25 = 0.002, and so on.
Expected output:
+---------+-------+------------+
| Product | Price | Calculated |
+---------+-------+------------+
| A       | 10    | 10         |
| B       | 20    | 1          |
| C       | 25    | 0.05       |
| D       | 30    | 0.002      |
+---------+-------+------------+
This can be done with a loop, but I don't want to use loops. I need a vectorized approach to fill in the Calculated column. How can I achieve that?
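For reference, the loop-based version of the rule might look like the minimal sketch below. The function name `fill_calculated_loop` is hypothetical and only there for illustration; it assumes the first row of Calculated is already populated and that no Price is zero.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Product': ['A', 'B', 'C', 'D'],
                   'Price': [10, 20, 25, 30],
                   'Calculated': [10, np.nan, np.nan, np.nan]})

def fill_calculated_loop(frame):
    """Hypothetical baseline: each row is previous Calculated / previous Price."""
    out = frame.copy()
    # Work on plain float arrays so the loop does not touch the DataFrame row by row.
    calc = out['Calculated'].to_numpy(dtype=float).copy()
    price = out['Price'].to_numpy(dtype=float)
    for i in range(1, len(out)):
        calc[i] = calc[i - 1] / price[i - 1]
    out['Calculated'] = calc
    return out

result_loop = fill_calculated_loop(df)
print(result_loop)
```

This is the behaviour a vectorized solution has to reproduce.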
CodePudding user response:
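You are looking for cumprod combined with a shift. Each value is the first Calculated value divided by the product of all previous prices:
# `df['Calculated'].iloc[0]` also works in place of `.ffill()`
df['Calculated'] = df['Calculated'].ffill() / df['Price'].cumprod().shift(fill_value=1)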
Output:
  Product  Price  Calculated
0       A     10      10.000
1       B     20       1.000
2       C     25       0.050
3       D     30       0.002
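Why this works: unrolling the rule gives Calculated[n] = Calculated[0] / (Price[0] * Price[1] * ... * Price[n-1]), and the shifted cumulative product is exactly that denominator, with 1 for the first row. The sketch below spells out the intermediate steps and checks them against the expected values. The variable names (`first`, `divisor`, `vectorized`) are only illustrative, and the cast to float is an added assumption to avoid integer overflow on long price columns; it also assumes no Price is zero.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Product': ['A', 'B', 'C', 'D'],
                   'Price': [10, 20, 25, 30],
                   'Calculated': [10, np.nan, np.nan, np.nan]})

# Seed value: the only Calculated entry that is known up front.
first = df['Calculated'].iloc[0]

# Running product of prices, shifted down one row so that row n sees
# Price[0] * ... * Price[n-1]; the first row gets the neutral divisor 1.
divisor = df['Price'].astype(float).cumprod().shift(fill_value=1)

vectorized = first / divisor
df['Calculated'] = vectorized
print(df)
```

Note that a very long column can push the cumulative product beyond floating-point range, in which case working with sums of logarithms may be a safer variant.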