Convert list of dictionaries that has a list of dictionaries within it to a pandas DataFrame

Time:10-08

I have a list of dictionaries, each of which contains lists of dictionaries, like this:

myDict = [{'Name': 'Jack', 'Children': [{'Child_1': 'Sarah'}, {'Child_2': 'Mary'}], 'Favorite_Colors': [{'Color_1': 'Blue'}, {'Color_2': 'Red'}]}, 
{'Name': 'Jill', 'Children': [{'Child_1': 'Carl'}, {'Child_2': 'Sam'}], 'Favorite_Colors': [{'Color_1': 'Green'}, {'Color_2': 'Yellow'}]}]

I want to convert this to a pandas DataFrame in a way that "pulls out" the inner lists of dictionaries, so that the final DataFrame looks like this:

      Name   Child_1   Child_2   Color_1   Color_2
0     Jack    Sarah     Mary      Blue      Red
1     Jill    Carl      Sam       Green     Yellow

CodePudding user response:

You can use collections.ChainMap to merge each inner list of single-key dictionaries into one flat dictionary per record.

from collections import ChainMap
import pandas as pd

myDict = [
    {'Name': 'Jack', 'Children': [{'Child_1': 'Sarah'}, {'Child_2': 'Mary'}], 'Favorite_Colors': [{'Color_1': 'Blue'}, {'Color_2': 'Red'}]}, 
    {'Name': 'Jill', 'Children': [{'Child_1': 'Carl'}, {'Child_2': 'Sam'}], 'Favorite_Colors': [{'Color_1': 'Green'}, {'Color_2': 'Yellow'}]}
]

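def pre_process(lst):
    res = []
    for dct in lst:
        tmp = {}
        for k1, v1 in dct.items():
            if isinstance(v1, list):
                # Merge the inner single-key dicts and add their keys to the row
                tmp.update(dict(ChainMap(*v1)))
            else:
                # Plain values (e.g. 'Name') are copied as-is
                tmp[k1] = v1
        res.append(tmp)
    return res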

df = pd.DataFrame(pre_process(myDict))
# ----------------^^^^^^^^^^^^^^^^^^^ -> [{'Name': 'Jack','Child_2': 'Mary','Child_1': 'Sarah','Color_2': 'Red','Color_1': 'Blue'},{'Name': 'Jill','Child_2': 'Sam','Child_1': 'Carl','Color_2': 'Yellow','Color_1': 'Green'}]
print(df)

Output:

   Name Child_2 Child_1 Color_2 Color_1
0  Jack    Mary   Sarah     Red    Blue
1  Jill     Sam    Carl  Yellow   Green
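The columns in the output above come out as Child_2, Child_1 because ChainMap iterates its mappings from last to first. If the original order matters, one option is to reverse each inner list before passing it to ChainMap. The following is a minimal sketch of that idea, not part of the original answer; the helper name `flatten_record` is made up for illustration.

```python
from collections import ChainMap

import pandas as pd

myDict = [
    {'Name': 'Jack', 'Children': [{'Child_1': 'Sarah'}, {'Child_2': 'Mary'}], 'Favorite_Colors': [{'Color_1': 'Blue'}, {'Color_2': 'Red'}]},
    {'Name': 'Jill', 'Children': [{'Child_1': 'Carl'}, {'Child_2': 'Sam'}], 'Favorite_Colors': [{'Color_1': 'Green'}, {'Color_2': 'Yellow'}]},
]

def flatten_record(record):
    """Flatten one record (hypothetical helper, named here for illustration)."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, list):
            # ChainMap iterates from the last mapping to the first, so
            # reversing the list restores the original key order.
            flat.update(dict(ChainMap(*reversed(value))))
        else:
            flat[key] = value
    return flat

df = pd.DataFrame([flatten_record(record) for record in myDict])
print(df.columns.tolist())
# → ['Name', 'Child_1', 'Child_2', 'Color_1', 'Color_2']
```

Reversing the list also changes which value wins if two inner dictionaries share a key: ChainMap prefers the first mapping it is given, so after reversing, the last dictionary in the original list takes precedence.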

CodePudding user response:

Using a simple loop to reformat the dictionary:

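import pandas as pd

# myDict is the list of dictionaries from the question
out = []
for d in myDict:
    out.append({})  # start a new flat row
    for k, v in d.items():
        if isinstance(v, list):
            # Copy every key/value pair from the inner dicts into the row
            for d2 in v:
                for k2, v2 in d2.items():
                    out[-1][k2] = v2
        else:
            out[-1][k] = v

df = pd.DataFrame(out)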

Output:

   Name Child_1 Child_2 Color_1 Color_2
0  Jack   Sarah    Mary    Blue     Red
1  Jill    Carl     Sam   Green  Yellow
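The same loop also copes with records whose inner lists have different lengths: when pandas builds a DataFrame from a list of dictionaries, any key missing from a row is filled with NaN. Here is a small sketch of that behaviour. The data is invented for illustration (Jill has only one child), and the `flatten` helper is a hypothetical wrapper around the loop above.

```python
import pandas as pd

# Hypothetical data for illustration: the second record has one child only.
records = [
    {'Name': 'Jack', 'Children': [{'Child_1': 'Sarah'}, {'Child_2': 'Mary'}]},
    {'Name': 'Jill', 'Children': [{'Child_1': 'Carl'}]},
]

def flatten(record):
    """Flatten one record by copying the inner dicts' items into a single dict."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, list):
            for inner in value:
                flat.update(inner)
        else:
            flat[key] = value
    return flat

df = pd.DataFrame([flatten(record) for record in records])
print(df)  # Jill's Child_2 cell is NaN because her row has no such key
```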