Home > Software engineering >  Fill Dataframe with values from another Dataframe (not the same column names)
Fill Dataframe with values from another Dataframe (not the same column names)

Time:04-18

I'm trying to fill a empty dataframe (OutputData) in Python with values from another dataframe (InputData).

InputData has four columns ("Strike", "DTE", "IV", "Pred_IV") OutputData has as an index all unique Strikes from InputData and as Column names all unique DTE from Input Data.

My goal is to fill the OutputData with the corresponding "Pred_IV" values from InputData. As it needs to match both the index and the column name I'm not getting my head around on how to do it with any known function.

If there is no value in InputData which matches both the index and column name the value can remain NaN

Find below the dataframes I use with the df.to_dict() extract for additional detail.

Many thanks for your help.

Best, Flo

InputData.head()

    Strike  DTE     IV      Pred_IV
8   0.5131  2.784   0.3366  0.733360
9   0.5131  3.781   0.3291  0.735295
20  0.5864  2.784   0.3178  0.733476
21  0.5864  3.781   0.3129  0.735357
22  0.5864  4.778   0.3008  0.736143

InputData.head().to_dict()

{'Strike': {8: 0.5131, 9: 0.5131, 20: 0.5864, 21: 0.5864, 22: 0.5864},
 'DTE': {8: 2.784, 9: 3.781, 20: 2.784, 21: 3.781, 22: 4.778},
 'IV': {8: 0.33659999999999995,
  9: 0.32909999999999995,
  20: 0.3178,
  21: 0.3129,
  22: 0.30079999999999996},
 'Pred_IV': {8: 0.7333602770095773,
  9: 0.7352946387206533,
  20: 0.7334762408944806,
  21: 0.7353567361456718,
  22: 0.7361431377881676}})

OutputData.head()

        0.025   0.101   0.197   0.274   0.523   0.772   1.769   2.267   2.784   3.781   4.778   5.774
0.5131  NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
0.5864  NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
0.6597  NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
0.7330  NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
0.7697  NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN

OutputData.head(2).to_dict()

{0.025: {0.5131: nan,
  0.5864: nan,
  0.6597: nan,
  0.733: nan,
  0.7696999999999999: nan},
 0.101: {0.5131: nan,
  0.5864: nan,
  0.6597: nan,
  0.733: nan,
  0.7696999999999999: nan},
 0.197: {0.5131: nan,
  0.5864: nan,
  0.6597: nan,
  0.733: nan,
  0.7696999999999999: nan},
 0.274: {0.5131: nan,
  0.5864: nan,
  0.6597: nan,
  0.733: nan,
  0.7696999999999999: nan},
 0.523: {0.5131: nan,
  0.5864: nan,
  0.6597: nan,
  0.733: nan,
  0.7696999999999999: nan},
 0.772: {0.5131: nan,
  0.5864: nan,
  0.6597: nan,
  0.733: nan,
  0.7696999999999999: nan},
 1.769: {0.5131: nan,
  0.5864: nan,
  0.6597: nan,
  0.733: nan,
  0.7696999999999999: nan},
 2.267: {0.5131: nan,
  0.5864: nan,
  0.6597: nan,
  0.733: nan,
  0.7696999999999999: nan},
 2.784: {0.5131: nan,
  0.5864: nan,
  0.6597: nan,
  0.733: nan,
  0.7696999999999999: nan},
 3.781: {0.5131: nan,
  0.5864: nan,
  0.6597: nan,
  0.733: nan,
  0.7696999999999999: nan},
 4.778: {0.5131: nan,
  0.5864: nan,
  0.6597: nan,
  0.733: nan,
  0.7696999999999999: nan},
 5.774: {0.5131: nan,
  0.5864: nan,
  0.6597: nan,
  0.733: nan,
  0.7696999999999999: nan}}

CodePudding user response:

Here is a way to do what I believe your question is asking:

import pandas as pd
import numpy as np
InputData = pd.DataFrame(
    columns='Strike,DTE,IV,Pred_IV'.split(','),
    index=[8,9,20,21,22],
    data=[[0.5131,  2.784,   0.3366,  0.733360],
    [0.5131,  3.781,   0.3291,  0.735295],
    [0.5864,  2.784,   0.3178,  0.733476],
    [0.5864,  3.781,   0.3129,  0.735357],
    [0.5864,  4.778,   0.3008,  0.736143]])

OutputData = pd.DataFrame(data=np.NaN,
    columns=pd.Index(name='DTE', data=list(set(InputData.DTE.to_list()))), 
    index=pd.Index(name='Strike', data=list(set(InputData.Strike.to_list()))))
def foo(x):
    OutputData.loc[x.Strike, x.DTE] = x.Pred_IV
InputData.apply(foo, axis=1)
print(OutputData)

Output:

DTE        2.784     3.781     4.778
Strike
0.5131  0.733360  0.735295       NaN
0.5864  0.733476  0.735357  0.736143

If you prefer unnamed indexes, you can do this instead:

OutputData = pd.DataFrame(data=np.NaN,
    columns=list(set(InputData.DTE.to_list())),
    index=list(set(InputData.Strike.to_list())))

Output:

           2.784     3.781     4.778
0.5131  0.733360  0.735295       NaN
0.5864  0.733476  0.735357  0.736143
  • Related