Home > Software design >  Convert nested dictionary to pandas dataframe
Convert nested dictionary to pandas dataframe

Time:10-06

I have a nested dictionary as below:

stud_data_dict = { 's1' : {'Course 1' : {'Course Name':'Maths', 
                                         'Marks':95,
                                         'Grade': A },
                           'Course 2' : {'Course Name':'Science', 
                                         'Marks': 75,
                                         'Grade': B-}},
                   's2' : {'Course 1' : {'Course Name':'English', 
                                         'Marks': 82,
                                         'Grade': B},
                           'Course 2' : {'Course Name':'Maths', 
                                         'Marks': 90,
                                         'Grade': A}}}

I need to convert it into a dataframe like below

  Student   Course 1                   Course 2
            Course Name  Marks  Grade  Course Name  Marks  Grade
    s1      Maths        95     A      Science      75     B-
    s2      English      82     B      Maths        90     A

I have tired the following code from this answer

stud_df = pandas.DataFrame.from_dict(stud_data_dict, orient="index").stack().to_frame()
final_df = pandas.DataFrame(stud_df[0].values.tolist(), index=stud_df.index)

I am getting the dataframe like below

              Course Name  Marks  Grade             
s1  Course 1  Maths        95     A 
    Course 2  Science      75     B-
s2  Course 1  English      82     B
    Course 2  Maths        90     A

This is the closest I got to the desired output. What changes do I need to make to the code to get the desired dataframe?

CodePudding user response:

Change dictionary first and then pass to Series with reshape by Series.unstack:

#reformat nested dict
#https://stackoverflow.com/a/39807565/2901002
d  = {(level1_key, level2_key, level3_key): values
       for level1_key, level2_dict in stud_data_dict.items()
       for level2_key, level3_dict in level2_dict.items()
       for level3_key, values      in level3_dict.items()}

stud_df = pd.Series(d).unstack([1,2])
print (stud_df)
    Course 1                Course 2            
   Course Name Marks Grade Course Name Marks Grade
s1       Maths    95    A      Science    75    B-
s2     English    82     B       Maths    90     A

Another idea is created dictionary of tuples in keys with defaultdict:

from collections import defaultdict

d = defaultdict(dict)

for k, v in stud_data_dict.items():
    for k1, v1 in v.items():
        for k2, v2 in v1.items():
            d[(k1, k2)].update({k: v2})

df = pd.DataFrame(d)
print(df)
      Course 1                Course 2            
   Course Name Marks Grade Course Name Marks Grade
s1       Maths    95    A      Science    75    B-
s2     English    82     B       Maths    90     A

CodePudding user response:

One option is to create data frames from the inner dictionaries, concatenate into a single frame, reshape and cleanup:

out = {key: pd.DataFrame.from_dict(value, orient='index') 
       for key, value in stud_data_dict.items()}

(pd
.concat(out)
.unstack()
.swaplevel(axis = 1)
.sort_index(axis = 1)
.rename_axis('Student')
.reset_index()
)
  Student    Course 1                Course 2
          Course Name Grade Marks Course Name Grade Marks
0      s1       Maths    A     95     Science    B-    75
1      s2     English     B    82       Maths     A    90
  • Related