I am trying figure out something which I can easily preform on excel but I am having a hard time to understand how to do it on a Pandas Data Frame without using loops.
Suppose that I have a data frame as follows:
------------ ------- ------- ----- ------
| Date | Price | Proxy | Div | Days |
------------ ------- ------- ----- ------
| 13/01/2021 | 10 | 20 | 0.5 | NaN |
| 08/01/2021 | NaN | 30 | 0.6 | 5 |
| 04/01/2021 | NaN | 40 | 0.7 | 4 |
| 03/01/2021 | NaN | 50 | 0.8 | 1 |
| 01/01/2021 | NaN | 60 | 0.9 | 2 |
------------ ------- ------- ----- ------
The task is to fill all the Price where price is null. In excel I would suppose that Date
is column A
and first row of Date
id row 2 then to fill NaN in row 2 of Price
I would use the formula =(B2)/(((C3/C2)*D3)*E3)=2.22
.
Now I want to use the value 2.22
on the fly to fill NaN in row 3 of Price
reason being to fill nan of row 3 I need to make use of filled row 2 value. Hence the formula in excel would to fill row 3 price would be =(B3)/(((C4/C3)*D4)*E4)
.
1 way would be to loop over all the rows of Data Frame that I don't want to do. What would be the vectorised approach to solve this problem?
Expected Output
------------ ------- ------- ----- ------
| Date | Price | Proxy | Div | Days |
------------ ------- ------- ----- ------
| 13/01/2021 | 10 | 20 | 0.5 | NA |
| 08/01/2021 | 2.22 | 30 | 0.6 | 5 |
| 04/01/2021 | 0.60 | 40 | 0.7 | 4 |
| 03/01/2021 | 0.60 | 50 | 0.8 | 1 |
| 01/01/2021 | 0.28 | 60 | 0.9 | 2 |
------------ ------- ------- ----- ------
Current_Price = Prev Price (non-nan) / (((Current_Proxy/Prev_Proxy) * Div) * Days)
Edit
Create initial data frame using code below
data = {'Date': ['2021-01-13', '2021-01-08', '2021-01-04', '2021-01-03', '2021-01-01'],
'Price':[10, np.nan, np.nan, np.nan,np.nan],
'Proxy':[20, 30, 40, 50, 60],
'Div':[0.5, 0.6, 0.7, 0.8, 0.9],
'Days':[np.nan, 5, 4, 1, 2]}
df = pd.DataFrame(data)
CodePudding user response:
What you want to achieve is actually a cumulated product:
df['Price'] = (df['Price'].combine_first(df['Proxy'].shift()/df.eval('Proxy*Div*Days'))
.cumprod().round(2))
Output:
Date Price Proxy Div Days
0 2021-01-13 10.00 20 0.5 NaN
1 2021-01-08 2.22 30 0.6 5.0
2 2021-01-04 0.60 40 0.7 4.0
3 2021-01-03 0.60 50 0.8 1.0
4 2021-01-01 0.28 60 0.9 2.0