Constructing Pandas DataFrame from Nested Dictionaries

I have a nested dictionary with three layers, where the bottom layer is a mixture of dictionaries and plain values. I want to convert it into a DataFrame with the keys from the last layer as column names and the keys from the first layer as ids.

x_dict = {"id1": {"att_1": 1,
                  "att_2": {"att2_1": "value1",
                            "att2_2": "value2"}},
          "id2": {"att_1": 2,
                  "att_2": {"att2_1": "value3",
                            "att2_2": "value4"}}
}

I experimented a bit with the 'pandas.DataFrame.from_dict()' function:

pd.DataFrame.from_dict({(i): x_dict[i][j] for i in x_dict.keys() for j in x_dict[i].keys()}, orient='index')

However, the output I get has lost all the values from the second layer (att_1):

     att2_1  att2_2
id1  value1  value2
id2  value3  value4

Is there a better way to approach this or how could I fix my current attempt?

CodePudding user response：

You can define a function that traverses the dictionary depth-first (DFS) and collects all attributes, as follows:

from collections import defaultdict

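def convert(node):
    # Map each leaf key to the list of values found for it, in traversal order.
    t = defaultdict(list)

    def dfs(node):
        for k, v in node.items():
            if isinstance(v, dict):
                dfs(v)  # descend into nested dictionaries
            else:
                t[k].append(v)  # leaf value: collect it under its key

    dfs(node)
    return t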

Now you can create your DataFrame (with di as your main dictionary):

import pandas as pd

df = pd.DataFrame(convert(di), index=list(di))  # outer keys (ids) become the row labels
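
If each id should end up as exactly one row, a variation on the same traversal is to flatten every id's record on its own and let from_dict(orient='index') use the outer keys as row labels. This is only a sketch: flatten is a hypothetical helper name, x_dict is the dictionary from the question, and it assumes leaf keys are unique within each record (a repeated key would be overwritten).

```python
import pandas as pd

x_dict = {
    "id1": {"att_1": 1, "att_2": {"att2_1": "value1", "att2_2": "value2"}},
    "id2": {"att_1": 2, "att_2": {"att2_1": "value3", "att2_2": "value4"}},
}

def flatten(record):
    """Return a flat dict of leaf keys to values (hypothetical helper)."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict):
            # Merge the leaves of the nested dict into the same flat dict.
            flat.update(flatten(value))
        else:
            flat[key] = value
    return flat

# One flattened record per id; orient="index" turns the outer keys into row labels.
df = pd.DataFrame.from_dict(
    {row_id: flatten(record) for row_id, record in x_dict.items()},
    orient="index",
)
print(df)
```

The attempt in the question keyed its comprehension on i alone, so for each id the att_2 entry overwrote the att_1 entry. Flattening each record first avoids that.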

CodePudding user response：

I think you need MultiIndex columns keyed by (outer key, inner key) instead of the inner dicts:

new_dict = {(outerKey, innerKey): values
            for outerKey, innerDict in dict_.items()
            for innerKey, values in innerDict.items()}
pd.DataFrame(new_dict)
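
If flat columns with one row per id are what is needed, as in the question, another option that may be worth trying is pandas' own pd.json_normalize. The sketch below assumes a pandas version where json_normalize is available at the top level (1.0 or later), uses x_dict as the dictionary from the question, and assumes leaf names are unique so the dotted prefix can be dropped safely.

```python
import pandas as pd

x_dict = {
    "id1": {"att_1": 1, "att_2": {"att2_1": "value1", "att2_2": "value2"}},
    "id2": {"att_1": 2, "att_2": {"att2_1": "value3", "att2_2": "value4"}},
}

# json_normalize flattens each record; nested keys are joined with "." by default,
# e.g. "att_2.att2_1".
flat = pd.json_normalize(list(x_dict.values()))

# The records were passed in dict order, so the outer keys line up with the rows.
flat.index = list(x_dict.keys())

# Keep only the last path segment so "att_2.att2_1" becomes "att2_1".
# Assumption: leaf names do not collide once the prefix is removed.
flat.columns = [name.split(".")[-1] for name in flat.columns]
print(flat)
```

If leaf names can collide, keeping the dotted column names is the safer choice.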
