Home > other >  Pandas DataFrame - Access Values That are created on the fly
Pandas DataFrame - Access Values That are created on the fly

Time:11-02

I am trying figure out something which I can easily preform on excel but I am having a hard time to understand how to do it on a Pandas Data Frame without using loops.

Suppose that I have a data frame as follows:

 ------------ ------- ------- ----- ------ 
|    Date    | Price | Proxy | Div | Days |
 ------------ ------- ------- ----- ------ 
| 13/01/2021 | 10    |    20 | 0.5 | NaN  |
| 08/01/2021 | NaN   |    30 | 0.6 | 5    |
| 04/01/2021 | NaN   |    40 | 0.7 | 4    |
| 03/01/2021 | NaN   |    50 | 0.8 | 1    |
| 01/01/2021 | NaN   |    60 | 0.9 | 2    |
 ------------ ------- ------- ----- ------ 

The task is to fill all the Price where price is null. In excel I would suppose that Date is column A and first row of Date id row 2 then to fill NaN in row 2 of Price I would use the formula =(B2)/(((C3/C2)*D3)*E3)=2.22.

Now I want to use the value 2.22 on the fly to fill NaN in row 3 of Price reason being to fill nan of row 3 I need to make use of filled row 2 value. Hence the formula in excel would to fill row 3 price would be =(B3)/(((C4/C3)*D4)*E4).

1 way would be to loop over all the rows of Data Frame that I don't want to do. What would be the vectorised approach to solve this problem?

Expected Output

 ------------ ------- ------- ----- ------ 
|    Date    | Price | Proxy | Div | Days |
 ------------ ------- ------- ----- ------ 
| 13/01/2021 |    10 |    20 | 0.5 | NA   |
| 08/01/2021 |  2.22 |    30 | 0.6 | 5    |
| 04/01/2021 |  0.60 |    40 | 0.7 | 4    |
| 03/01/2021 |  0.60 |    50 | 0.8 | 1    |
| 01/01/2021 |  0.28 |    60 | 0.9 | 2    |
 ------------ ------- ------- ----- ------ 

Current_Price = Prev Price (non-nan) / (((Current_Proxy/Prev_Proxy) * Div) * Days)

Edit

Create initial data frame using code below

data = {'Date': ['2021-01-13', '2021-01-08', '2021-01-04', '2021-01-03', '2021-01-01'],
        'Price':[10, np.nan, np.nan, np.nan,np.nan],
        'Proxy':[20, 30, 40, 50, 60],
        'Div':[0.5, 0.6, 0.7, 0.8, 0.9],
        'Days':[np.nan, 5, 4, 1, 2]}

df = pd.DataFrame(data)

CodePudding user response:

What you want to achieve is actually a cumulated product:

df['Price'] = (df['Price'].combine_first(df['Proxy'].shift()/df.eval('Proxy*Div*Days'))
               .cumprod().round(2))

Output:

         Date  Price  Proxy  Div  Days
0  2021-01-13  10.00     20  0.5   NaN
1  2021-01-08   2.22     30  0.6   5.0
2  2021-01-04   0.60     40  0.7   4.0
3  2021-01-03   0.60     50  0.8   1.0
4  2021-01-01   0.28     60  0.9   2.0
  • Related