Home > OS >  Format python dictionary inside dictionary to export to excel
Format python dictionary inside dictionary to export to excel

Time:06-04

I want to convert several dictionaries into rows for an excel file. I'm super close but I can't figure out how to display it well. Currently, I have this:

import pandas as pd
lst_of_dict = []
test_dict = {'Name':'Tom', 'Age':20, 'favorite_movie_2020':{'Harry Potter':20, 'Fake Movie':20, 'Cinderella':10, 'Hollywood':20}, 'favorite_movie_2021':{'Harry Potter': 21, 'Fake Movie': 20, 'Cinderella': 10, 'Holywood': 20}}
test_dict2 = {'Name':'Oliver', 'Age':40, 'favorite_movie_2020':{'Harry Potter': 30, 'Fake Movie': 50, 'Cinderella': 90, 'Hollywood': 20}}
lst_of_dict.append(test_dict)
lst_of_dict.append(test_dict2)
print(lst_of_dict)

pd.DataFrame(lst_of_dict).to_excel('output2.xlsx')

excel file

You can see how for favorite_movie I have a dictionary and it's not the easiest to read especially since I will have entries with significantly more years pairs for favorite movies. I was thinking of doing what I have for the second table in that excel sheet where I would pivot the right side only of that table but I have not been able to do that with the dictionary at all. I would appreciate any help! (If you think I should display that data some other way I'm all ears!

CodePudding user response:

You can define a helper function to flatten the dictionaries:

def flatten(user):
    favorite_movies = []
    years = []
    for key, value in user.items():
        if isinstance(value, dict):
            favorite_movies.append(pd.Series(value))
            years.append(key[-4:])
    return pd.DataFrame(pd.concat(favorite_movies, axis=1).T).assign(
        Name=user["Name"], Age=user["Age"], Movie_year=years
    )

And then concatenate the flattened dictionaries in a new dataframe:

dfs = [flatten(user) for user in [test_dict, test_dict2]]

df = (
    pd.concat(dfs, axis=0)
    .reindex(
        columns=["Name", "Age", "Movie_year"]
          [col for col in dfs[0].columns if col not in ["Name", "Age", "Movie_year"]]
    )
    .reset_index(drop=True)
)

df.to_excel("output2.xlsx")
print(df)
# Output
     Name  Age Movie_year  Harry Potter  Fake Movie  Cinderella  Hollywood
0     Tom   20       2020            20          20          10         20
1     Tom   20       2021            21          20          10         20
2  Oliver   40       2020            30          50          90         20

CodePudding user response:

Your problem is a nested dictionary and I think all you need is another process to convert this nested dictionary to a non-nested dictionary. With respect to this link, any new dictionary can be made from an old one.

  • Related