I have a nested dictionary as below:
stud_data_dict = { 's1' : {'Course 1' : {'Course Name':'Maths',
'Marks':95,
'Grade': A },
'Course 2' : {'Course Name':'Science',
'Marks': 75,
'Grade': B-}},
's2' : {'Course 1' : {'Course Name':'English',
'Marks': 82,
'Grade': B},
'Course 2' : {'Course Name':'Maths',
'Marks': 90,
'Grade': A}}}
I need to convert it into a dataframe like below
Student Course 1 Course 2
Course Name Marks Grade Course Name Marks Grade
s1 Maths 95 A Science 75 B-
s2 English 82 B Maths 90 A
I have tired the following code from this answer
stud_df = pandas.DataFrame.from_dict(stud_data_dict, orient="index").stack().to_frame()
final_df = pandas.DataFrame(stud_df[0].values.tolist(), index=stud_df.index)
I am getting the dataframe like below
Course Name Marks Grade
s1 Course 1 Maths 95 A
Course 2 Science 75 B-
s2 Course 1 English 82 B
Course 2 Maths 90 A
This is the closest I got to the desired output. What changes do I need to make to the code to get the desired dataframe?
CodePudding user response:
Change dictionary
first and then pass to Series
with reshape by Series.unstack
:
#reformat nested dict
#https://stackoverflow.com/a/39807565/2901002
d = {(level1_key, level2_key, level3_key): values
for level1_key, level2_dict in stud_data_dict.items()
for level2_key, level3_dict in level2_dict.items()
for level3_key, values in level3_dict.items()}
stud_df = pd.Series(d).unstack([1,2])
print (stud_df)
Course 1 Course 2
Course Name Marks Grade Course Name Marks Grade
s1 Maths 95 A Science 75 B-
s2 English 82 B Maths 90 A
Another idea is created dictionary of tuples in keys with defaultdict
:
from collections import defaultdict
d = defaultdict(dict)
for k, v in stud_data_dict.items():
for k1, v1 in v.items():
for k2, v2 in v1.items():
d[(k1, k2)].update({k: v2})
df = pd.DataFrame(d)
print(df)
Course 1 Course 2
Course Name Marks Grade Course Name Marks Grade
s1 Maths 95 A Science 75 B-
s2 English 82 B Maths 90 A
CodePudding user response:
One option is to create data frames from the inner dictionaries, concatenate into a single frame, reshape and cleanup:
out = {key: pd.DataFrame.from_dict(value, orient='index')
for key, value in stud_data_dict.items()}
(pd
.concat(out)
.unstack()
.swaplevel(axis = 1)
.sort_index(axis = 1)
.rename_axis('Student')
.reset_index()
)
Student Course 1 Course 2
Course Name Grade Marks Course Name Grade Marks
0 s1 Maths A 95 Science B- 75
1 s2 English B 82 Maths A 90