I have a dictionary like this:
dict = {Student_ID_1:{Course_1:[12,45,378], Course_2: [33,78,345]},
Student_ID_2:{Course_6:[15,25,48], Course_24: [31,38,342]},
....<truncated>.....}
I have thousands of Student_IDs & 50 course_IDs. Now I would like to create a dataframe from this dictionary in this format:
Student_ID  Course_1_a  Course_1_b  Course_1_c  ...  Course_50_a  Course_50_b  Course_50_c
12855       12          35          234         ...  21           55           342
How can I convert my dictionary into a dataframe in this format? I tried several approaches, but each time only the first value of each course list ended up in the dataframe columns. Can anyone help me with this?
CodePudding user response:
I hope I've understood your question right. You can preprocess the dictionary before creating the dataframe:
import pandas as pd

dct = {
    "Student_ID_1": {"Course_1": [12, 45, 378], "Course_2": [33, 78, 345]},
    "Student_ID_2": {"Course_6": [15, 25, 48], "Course_24": [31, 38, 342]},
}

# Flatten each student's courses into {"Course_1_a": 12, "Course_1_b": 45, ...}
dct = {
    k: {f"{kk}_{b}": a for kk, vv in v.items() for a, b in zip(vv, "abc")}
    for k, v in dct.items()
}

# One row per student; move the student IDs from the index into a column
df = (
    pd.DataFrame.from_dict(dct, orient="index")
    .reset_index()
    .rename(columns={"index": "Student ID"})
)
print(df)
Prints:
     Student ID  Course_1_a  Course_1_b  Course_1_c  Course_2_a  Course_2_b  Course_2_c  Course_6_a  Course_6_b  Course_6_c  Course_24_a  Course_24_b  Course_24_c
0  Student_ID_1        12.0        45.0       378.0        33.0        78.0       345.0         NaN         NaN         NaN          NaN          NaN          NaN
1  Student_ID_2         NaN         NaN         NaN         NaN         NaN         NaN        15.0        25.0        48.0         31.0         38.0        342.0
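If the result should always contain all 50 courses in a fixed order, as in the layout the question asks for, one option is to reindex the columns against the full list of expected names. The following is only a sketch under some assumptions: courses are named Course_1 to Course_50, every list holds exactly three integers, and the names n_courses, letters, flat and all_columns are illustrative rather than taken from the question.

```python
import pandas as pd

dct = {
    "Student_ID_1": {"Course_1": [12, 45, 378], "Course_2": [33, 78, 345]},
    "Student_ID_2": {"Course_6": [15, 25, 48], "Course_24": [31, 38, 342]},
}

n_courses = 50   # assumption: courses run from Course_1 to Course_50
letters = "abc"  # assumption: three values per course

# Same flattening step as in the answer above
flat = {
    sid: {
        f"{course}_{letter}": value
        for course, values in courses.items()
        for value, letter in zip(values, letters)
    }
    for sid, courses in dct.items()
}

# Every expected column, in the order the question shows
all_columns = [
    f"Course_{i}_{letter}"
    for i in range(1, n_courses + 1)
    for letter in letters
]

df = (
    pd.DataFrame.from_dict(flat, orient="index")
    .reindex(columns=all_columns)  # adds missing courses as all-NaN columns
    .astype("Int64")               # nullable integers, so 12 is not shown as 12.0
    .rename_axis("Student_ID")
    .reset_index()
)
```

The cast to Int64 only makes sense when all values really are whole numbers; drop that line otherwise.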
CodePudding user response:
You could try df.explode() to split the lists in the dataframe:
import pandas as pd

# Named dct rather than dict to avoid shadowing the built-in
dct = {'Student_ID_1': {'Course_1': [12, 45, 378], 'Course_2': [33, 78, 345]},
       'Student_ID_2': {'Course_6': [15, 25, 48], 'Course_24': [31, 38, 342]}}
df = pd.DataFrame(dct)
df1 = df.explode('Student_ID_1').explode('Student_ID_2')
print(df1)
           Student_ID_1  Student_ID_2
Course_1             12           NaN
Course_1             45           NaN
Course_1            378           NaN
Course_2             33           NaN
Course_2             78           NaN
Course_2            345           NaN
Course_6            NaN            15
Course_6            NaN            25
Course_6            NaN            48
Course_24           NaN            31
Course_24           NaN            38
Course_24           NaN           342
After that, transpose the dataframe and rename the columns:
df1 = df1.T
# df.index holds the course names; append _a, _b, _c to each
df1.columns = [col + '_' + s for col in df.index for s in ['a', 'b', 'c']]
print(df1)
Output:
              Course_1_a  Course_1_b  Course_1_c  Course_2_a  Course_2_b  Course_2_c  Course_6_a  Course_6_b  Course_6_c  Course_24_a  Course_24_b  Course_24_c
Student_ID_1          12          45         378          33          78         345         NaN         NaN         NaN          NaN          NaN          NaN
Student_ID_2         NaN         NaN         NaN         NaN         NaN         NaN          15          25          48           31           38          342
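One thing to check before using the chained explode() calls on the full data: as far as I can tell, they only line up when no course is shared between students. If two students both have Course_1, the second explode() repeats the rows already produced by the first one, and the renaming step no longer matches the column count. A possible alternative that does not depend on this is to build a long table first and pivot it. This is a sketch, not a drop-in replacement; the sample data below is altered so that both students share Course_1 (hypothetical), and the names records, long and wide are illustrative.

```python
import pandas as pd

# Hypothetical data: both students take Course_1
dct = {
    "Student_ID_1": {"Course_1": [12, 45, 378], "Course_2": [33, 78, 345]},
    "Student_ID_2": {"Course_1": [15, 25, 48], "Course_24": [31, 38, 342]},
}

# One record per (student, course, position in the list)
records = [
    {"Student_ID": sid, "column": f"{course}_{letter}", "value": value}
    for sid, courses in dct.items()
    for course, values in courses.items()
    for letter, value in zip("abc", values)
]
long = pd.DataFrame(records)

# Pivot to one row per student and one column per course/letter pair
wide = (
    long.pivot(index="Student_ID", columns="column", values="value")
    .rename_axis(columns=None)
    .reset_index()
)
```

Note that pivot() orders the new columns by label, so Course_24_a sorts before Course_2_a; reorder the columns afterwards if a numeric course order matters.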
CodePudding user response:
Here's a way to do what your question asks:
import pandas as pd

dct = {12855: {'Course_1': [12, 45, 378], 'Course_2': [33, 78, 345]},
       12856: {'Course_6': [15, 25, 48], 'Course_24': [31, 38, 342]}}
# Replace missing cells (NaN) with [None, None, None] so every cell is a 3-item list
df = pd.DataFrame(dct).apply(lambda x: [y if isinstance(y, list) else [None] * 3 for y in x])
# Exploding several columns at once requires pandas 1.3 or later
df = ( df
    .assign(course=[[f'{c}_{letter}' for letter in 'abc'] for c in df.index])
    .explode(['course'] + list(df.columns))
    .rename_axis('Student_ID', axis=1)
    .set_index('course').rename_axis(None).T.reset_index() )
Output:
   Student_ID  Course_1_a  Course_1_b  Course_1_c  Course_2_a  Course_2_b  Course_2_c  Course_6_a  Course_6_b  Course_6_c  Course_24_a  Course_24_b  Course_24_c
0       12855          12          45         378          33          78         345        None        None        None         None         None         None
1       12856        None        None        None        None        None        None          15          25          48           31           38          342
Explanation:
- Use pd.DataFrame(dct) to create a dataframe like this:
                   12855          12856
Course_1   [12, 45, 378]            NaN
Course_2   [33, 78, 345]            NaN
Course_6             NaN   [15, 25, 48]
Course_24            NaN  [31, 38, 342]
- Use apply() to convert NaN values to a list with 3 None values like this:
                        12855               12856
Course_1        [12, 45, 378]  [None, None, None]
Course_2        [33, 78, 345]  [None, None, None]
Course_6   [None, None, None]        [15, 25, 48]
Course_24  [None, None, None]       [31, 38, 342]
- Use assign() to add a column course with a list whose items are the original course name with _a, _b, and _c appended like this:
                        12855               12856                                   course
Course_1        [12, 45, 378]  [None, None, None]     [Course_1_a, Course_1_b, Course_1_c]
Course_2        [33, 78, 345]  [None, None, None]     [Course_2_a, Course_2_b, Course_2_c]
Course_6   [None, None, None]        [15, 25, 48]     [Course_6_a, Course_6_b, Course_6_c]
Course_24  [None, None, None]       [31, 38, 342]  [Course_24_a, Course_24_b, Course_24_c]
- Use explode() to turn each row into 3 rows, one for each successive list item in the row's respective columns like this:
           12855  12856       course
Course_1      12   None   Course_1_a
Course_1      45   None   Course_1_b
Course_1     378   None   Course_1_c
Course_2      33   None   Course_2_a
Course_2      78   None   Course_2_b
Course_2     345   None   Course_2_c
Course_6    None     15   Course_6_a
Course_6    None     25   Course_6_b
Course_6    None     48   Course_6_c
Course_24   None     31  Course_24_a
Course_24   None     38  Course_24_b
Course_24   None    342  Course_24_c
- Use rename_axis() to name the column index Student_ID.
- Use set_index() to replace the index with the column course, and use rename_axis() to change the index name to None.
- Use .T to transpose and .reset_index() to change the Student_ID index to a column, giving the output shown above.
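To inspect the intermediate tables from the steps above yourself, the chain can be split into named stages. This is just a sketch of the same approach, not a different method: the variable names raw, filled, labelled, long and result are illustrative, the NaN replacement is written with Series.map() instead of a list comprehension, and the multi-column explode() assumes pandas 1.3 or later.

```python
import pandas as pd

dct = {12855: {'Course_1': [12, 45, 378], 'Course_2': [33, 78, 345]},
       12856: {'Course_6': [15, 25, 48], 'Course_24': [31, 38, 342]}}

# Step 1: rows are courses, columns are students, cells are lists or NaN
raw = pd.DataFrame(dct)

# Step 2: every non-list cell (the NaN gaps) becomes [None, None, None]
filled = raw.apply(
    lambda col: col.map(lambda y: y if isinstance(y, list) else [None] * 3)
)

# Step 3: add the list of target column names for each course
labelled = filled.assign(
    course=[[f'{c}_{letter}' for letter in 'abc'] for c in filled.index]
)

# Step 4: explode all list columns together, giving 3 rows per course
long = labelled.explode(['course'] + list(filled.columns))

# Steps 5 to 7: name the column axis, index by course label, transpose
result = (
    long.rename_axis('Student_ID', axis=1)
    .set_index('course')
    .rename_axis(None)
    .T
    .reset_index()
)
```

Printing raw, filled, labelled and long in turn should reproduce the tables shown in the explanation.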