Python Data Frame - How can I evaluate/use a column being created on the fly

Time:11-02

Suppose that I have a dataframe as follows:

 --------- ------- ------------ 
| Product | Price | Calculated |
 --------- ------- ------------ 
| A       |    10 | 10         |
| B       |    20 | NaN        |
| C       |    25 | NaN        |
| D       |    30 | NaN        |
 --------- ------- ------------ 

The above can be created using the code below:

import numpy as np
import pandas as pd

data = {'Product':['A', 'B', 'C', 'D'],
        'Price':[10, 20, 25, 30],
        'Calculated':[10, np.nan, np.nan, np.nan]}

df = pd.DataFrame(data)

I want to fill in the Calculated column on the fly. For every row after the first, Calculated = previous row's Calculated / previous row's Price, so Calculated at row 2 is 10/10 = 1.

Now that row 2 has a value, Calculated at row 3 is 1/20 = 0.05, and so on.
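
For reference, the recurrence described above can be written as a plain loop. This is only a minimal baseline sketch to pin down the intended result; the names `calculated` and `loop_result` are illustrative and not part of the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Product': ['A', 'B', 'C', 'D'],
    'Price': [10, 20, 25, 30],
    'Calculated': [10, np.nan, np.nan, np.nan],
})

# Start from the only known value, the first row's Calculated.
calculated = [df['Calculated'].iloc[0]]

# Each later row divides the previous row's Calculated
# by the previous row's Price (the last Price is never used).
for prev_price in df['Price'].iloc[:-1]:
    calculated.append(calculated[-1] / prev_price)

# Hypothetical helper name; holds the loop-based result for comparison.
loop_result = pd.Series(calculated, index=df.index, name='Calculated')
print(loop_result)
```

Any vectorized solution should reproduce these values up to floating-point rounding.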

Expected Output

 --------- ------- ------------ 
| Product | Price | Calculated |
 --------- ------- ------------ 
| A       |    10 |         10 |
| B       |    20 |          1 |
| C       |    25 |       0.05 |
| D       |    30 |      0.002 |
 --------- ------- ------------ 

This can be done with a loop, but I don't want to use loops; I need a vectorized approach to update the Calculated column. How can I achieve that?

Answer:

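You are looking for cumprod with a shift. Unrolling the recurrence, each row's Calculated is the first Calculated value divided by the product of all previous prices:

# `.ffill()` copies the first Calculated value down the column;
# `df['Calculated'].iloc[0]` would work in its place
df['Calculated'] = df['Calculated'].ffill()/df.Price.cumprod().shift(fill_value=1)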

Output:

  Product  Price  Calculated
0       A     10      10.000
1       B     20       1.000
2       C     25       0.050
3       D     30       0.002
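
To see why this works, the one-liner can be split into its intermediate steps. The sketch below assumes the same four-row frame as in the question; the variable names `running`, `divisor`, and `seed` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Product': ['A', 'B', 'C', 'D'],
    'Price': [10, 20, 25, 30],
    'Calculated': [10, np.nan, np.nan, np.nan],
})

# Running product of prices: 10, 200, 5000, 150000.
running = df['Price'].cumprod()

# Shift down one row so each row sees the product of the prices
# *before* it; the first row has no previous price, so it gets 1.
divisor = running.shift(fill_value=1)   # 1, 10, 200, 5000

# The single known starting value (10 in this example).
seed = df['Calculated'].iloc[0]

# Calculated[n] = seed / (Price[0] * ... * Price[n-1])
df['Calculated'] = seed / divisor
print(df)
```

One caveat worth keeping in mind: on a long column of integer prices the running product can overflow int64, so casting Price to float before calling cumprod may be safer.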