I have a nested dictionary with three layers, the bottom layer being a mixture of dictionaries and values and want to convert it into a dataframe with keys from the last layer as column names and keys from the first layer as ids.
dict = {"id1": {"att_1": 1,
"att_2": {"att2_1": "value1",
"att2_2": "value2"}},
"id2": {"att_1": 2,
"att_2": {"att2_1": "value3",
"att2_2": "value4"}}
}
I tried around a little bit with the 'pandas.DataFrame.from_dict()' function:
pd.DataFrame.from_dict({(i): x_dict[i][j] for i in x_dict.keys() for j in x_dict[i].keys()}, orient='index')
However, the output I am getting lost all the values from the second layer(att1):
att2_1 att2_2
id1 value1 value2
id2 value3 value4
Is there a better way to approach this or how could I fix my current attempt?
CodePudding user response:
you can define a function to traverse and find all attributes by dfs, as folow:
from collections import defaultdict
def convert(node):
t = defaultdict(list)
def dfs(node):
for k, v in node.items():
if isinstance(v, dict):
dfs(v)
else:
t[k].append(v)
dfs(node)
return t
now you can simply create your dataframe (di as your main dictionary):
df = pd.DataFrame(convert(di))
CodePudding user response:
from: here
i think you need multiIndex columns and get lists instead inner dicts
new_dict = {(outerKey, innerKey): values for outerKey, innerDict in dict_.items() for innerKey, values in innerDict.items()}
pd.DataFrame(new_dict)