I'm trying to fill a empty dataframe (OutputData) in Python with values from another dataframe (InputData).
InputData has four columns ("Strike", "DTE", "IV", "Pred_IV") OutputData has as an index all unique Strikes from InputData and as Column names all unique DTE from Input Data.
My goal is to fill the OutputData with the corresponding "Pred_IV" values from InputData. As it needs to match both the index and the column name I'm not getting my head around on how to do it with any known function.
If there is no value in InputData which matches both the index and column name the value can remain NaN
Find below the dataframes I use with the df.to_dict() extract for additional detail.
Many thanks for your help.
Best, Flo
InputData.head()
Strike DTE IV Pred_IV
8 0.5131 2.784 0.3366 0.733360
9 0.5131 3.781 0.3291 0.735295
20 0.5864 2.784 0.3178 0.733476
21 0.5864 3.781 0.3129 0.735357
22 0.5864 4.778 0.3008 0.736143
InputData.head().to_dict()
{'Strike': {8: 0.5131, 9: 0.5131, 20: 0.5864, 21: 0.5864, 22: 0.5864},
'DTE': {8: 2.784, 9: 3.781, 20: 2.784, 21: 3.781, 22: 4.778},
'IV': {8: 0.33659999999999995,
9: 0.32909999999999995,
20: 0.3178,
21: 0.3129,
22: 0.30079999999999996},
'Pred_IV': {8: 0.7333602770095773,
9: 0.7352946387206533,
20: 0.7334762408944806,
21: 0.7353567361456718,
22: 0.7361431377881676}})
OutputData.head()
0.025 0.101 0.197 0.274 0.523 0.772 1.769 2.267 2.784 3.781 4.778 5.774
0.5131 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
0.5864 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
0.6597 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
0.7330 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
0.7697 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
OutputData.head(2).to_dict()
{0.025: {0.5131: nan,
0.5864: nan,
0.6597: nan,
0.733: nan,
0.7696999999999999: nan},
0.101: {0.5131: nan,
0.5864: nan,
0.6597: nan,
0.733: nan,
0.7696999999999999: nan},
0.197: {0.5131: nan,
0.5864: nan,
0.6597: nan,
0.733: nan,
0.7696999999999999: nan},
0.274: {0.5131: nan,
0.5864: nan,
0.6597: nan,
0.733: nan,
0.7696999999999999: nan},
0.523: {0.5131: nan,
0.5864: nan,
0.6597: nan,
0.733: nan,
0.7696999999999999: nan},
0.772: {0.5131: nan,
0.5864: nan,
0.6597: nan,
0.733: nan,
0.7696999999999999: nan},
1.769: {0.5131: nan,
0.5864: nan,
0.6597: nan,
0.733: nan,
0.7696999999999999: nan},
2.267: {0.5131: nan,
0.5864: nan,
0.6597: nan,
0.733: nan,
0.7696999999999999: nan},
2.784: {0.5131: nan,
0.5864: nan,
0.6597: nan,
0.733: nan,
0.7696999999999999: nan},
3.781: {0.5131: nan,
0.5864: nan,
0.6597: nan,
0.733: nan,
0.7696999999999999: nan},
4.778: {0.5131: nan,
0.5864: nan,
0.6597: nan,
0.733: nan,
0.7696999999999999: nan},
5.774: {0.5131: nan,
0.5864: nan,
0.6597: nan,
0.733: nan,
0.7696999999999999: nan}}
CodePudding user response:
Here is a way to do what I believe your question is asking:
import pandas as pd
import numpy as np
InputData = pd.DataFrame(
columns='Strike,DTE,IV,Pred_IV'.split(','),
index=[8,9,20,21,22],
data=[[0.5131, 2.784, 0.3366, 0.733360],
[0.5131, 3.781, 0.3291, 0.735295],
[0.5864, 2.784, 0.3178, 0.733476],
[0.5864, 3.781, 0.3129, 0.735357],
[0.5864, 4.778, 0.3008, 0.736143]])
OutputData = pd.DataFrame(data=np.NaN,
columns=pd.Index(name='DTE', data=list(set(InputData.DTE.to_list()))),
index=pd.Index(name='Strike', data=list(set(InputData.Strike.to_list()))))
def foo(x):
OutputData.loc[x.Strike, x.DTE] = x.Pred_IV
InputData.apply(foo, axis=1)
print(OutputData)
Output:
DTE 2.784 3.781 4.778
Strike
0.5131 0.733360 0.735295 NaN
0.5864 0.733476 0.735357 0.736143
If you prefer unnamed indexes, you can do this instead:
OutputData = pd.DataFrame(data=np.NaN,
columns=list(set(InputData.DTE.to_list())),
index=list(set(InputData.Strike.to_list())))
Output:
2.784 3.781 4.778
0.5131 0.733360 0.735295 NaN
0.5864 0.733476 0.735357 0.736143