Home > Software engineering >  Dictionary to two columns DataFrame
Dictionary to two columns DataFrame

Time:06-02

I have a large dictionary with the following simplistic structure:

dict_1 = {'A': {'x1': 0, 'x2': 1}, 'B': {'x1': 0, 'x2': 1, 'x3': 0}, 'C': {'x1': 1, 'x3': 1}}

I would like to transform it to a pandas dataframe with the following structure

        "Var_1"  "Var_2"
  A       x_1      0  
  A       x_2      1
  B       x_1      0
  B       x_2      1
  B       x_3      0
  C       x_1      1
  C       x_3      1

My first attempt was to do something like

dict_1 = pd.DataFrame(dict_1)
dict_1 = dict_1.unstack().reset_index(level=1).set_axis(["Var_1","Var_2"], axis=1)
dict_1.dropna(inplace=True) 

However, I realized that during the whole process, many np.NaN are involved in all transformations over-saturating the memory and therefore, making the whole transformation a lot of time- and resource-consuming.

Is there a simpler and faster way to achieve this?

CodePudding user response:

You could stack and rename:

df = pd.DataFrame(dict_1)
output = df.stack().reset_index(level=0).rename(columns={"level_0":"Var_1",0:"Var_2"})

>>> output
  Var_1  Var_2
A    x1    0.0
B    x1    0.0
C    x1    1.0
A    x2    1.0
B    x2    1.0
B    x3    0.0
C    x3    1.0

Alternatively, with melt:

output = df.rename_axis("Var_1").reset_index().melt("Var_1",["A","B","C"],"index","Var_2").set_index("index")

CodePudding user response:

I would suggest using list comprehension to flatten the nested dictionaries:

pd.DataFrame((k1, k2, v2) for k1, v1 in dict_1.items() for k2, v2 in v1.items())

   0   1  2
0  A  x1  0
1  A  x2  1
2  B  x1  0
3  B  x2  1
4  B  x3  0
5  C  x1  1
6  C  x3  1
  • Related