I have a large dictionary with the following simplistic structure:
dict_1 = {'A': {'x1': 0, 'x2': 1}, 'B': {'x1': 0, 'x2': 1, 'x3': 0}, 'C': {'x1': 1, 'x3': 1}}
I would like to transform it to a pandas dataframe with the following structure
"Var_1" "Var_2"
A x_1 0
A x_2 1
B x_1 0
B x_2 1
B x_3 0
C x_1 1
C x_3 1
My first attempt was to do something like
dict_1 = pd.DataFrame(dict_1)
dict_1 = dict_1.unstack().reset_index(level=1).set_axis(["Var_1","Var_2"], axis=1)
dict_1.dropna(inplace=True)
However, I realized that during the whole process, many np.NaN are involved in all transformations over-saturating the memory and therefore, making the whole transformation a lot of time- and resource-consuming.
Is there a simpler and faster way to achieve this?
CodePudding user response:
You could stack
and rename
:
df = pd.DataFrame(dict_1)
output = df.stack().reset_index(level=0).rename(columns={"level_0":"Var_1",0:"Var_2"})
>>> output
Var_1 Var_2
A x1 0.0
B x1 0.0
C x1 1.0
A x2 1.0
B x2 1.0
B x3 0.0
C x3 1.0
Alternatively, with melt
:
output = df.rename_axis("Var_1").reset_index().melt("Var_1",["A","B","C"],"index","Var_2").set_index("index")
CodePudding user response:
I would suggest using list comprehension to flatten the nested dictionaries:
pd.DataFrame((k1, k2, v2) for k1, v1 in dict_1.items() for k2, v2 in v1.items())
0 1 2
0 A x1 0
1 A x2 1
2 B x1 0
3 B x2 1
4 B x3 0
5 C x1 1
6 C x3 1