Home > Software engineering >  Converted list of multi nested dictionaries to dataframe
Converted list of multi nested dictionaries to dataframe

Time:02-15

Tryng to unpack (what I believe to be called) a list of multi-nested dictionaries to achieve the following output:

   subject_code               module                                              notes        topics       start         end faculty_id faculty_leader
0            90  ABC Board Spring 19  refer to guidance: https://nothing.com/general...  [1A, 2G, 3S]  1643847382  1644279382    nothing     Steve Finn
1            80  ABC Board Spring 18  refer to guidance: https://nothing.com/general...  [1D, 5F, 2B]  1643847382  1644279382    nothing     Steve Finn

Below is the raw data and my unsuccessful attempts to achieve the aforementioned output:

import pandas as pd

data_ = [
         {
          "subject_code": 90, "module": "ABC Board Spring 19", 
          "notes" : "refer to guidance: https://nothing.com/general-information/overleaf", 
          "topics" : ['1A', '2G', '3S'], "start":1643847382,"end":1644279382, 
          "faculty":{"id":"nothing","leader":"Steve Finn"}
           },
         {
          "subject_code": 80, "module": "ABC Board Spring 18", 
          "notes" : "refer to guidance: https://nothing.com/general-information/overleaf", 
          "topics" : ['1D', '5F', '2B'], "start":1643847382,"end":1644279382, 
          "faculty":{"id":"nothing","leader":"Steve Finn"}}]

df = pd.DataFrame(data_ )
print(df)

However, the final column faculty is not displaying as desired....

   subject_code               module                                              notes        topics       start         end                                    faculty
0            90  ABC Board Spring 19  refer to guidance: https://nothing.com/general...  [1A, 2G, 3S]  1643847382  1644279382  {'id': 'nothing', 'leader': 'Steve Finn'}
1            80  ABC Board Spring 18  refer to guidance: https://nothing.com/general...  [1D, 5F, 2B]  1643847382  1644279382  {'id': 'nothing', 'leader': 'Steve Finn'}

Rather than faculty displaying as a single column containing a dictionary of data I would like split into individual columns, named as dictionary name and key...

Below is the desired output as shown at the start of this thread......

   subject_code               module                                              notes        topics       start         end faculty_id faculty_leader
0            90  ABC Board Spring 19  refer to guidance: https://nothing.com/general...  [1A, 2G, 3S]  1643847382  1644279382    nothing     Steve Finn
1            80  ABC Board Spring 18  refer to guidance: https://nothing.com/general...  [1D, 5F, 2B]  1643847382  1644279382    nothing     Steve Finn

CodePudding user response:

You could use the DataFrame constructor on "faculty" column and join the resulting DataFrame back to df (with add_prefix, you could add "faculty"):

out = df.join(pd.DataFrame(df['faculty'].tolist()).add_prefix('faculty_')).drop(columns='faculty')

Output:

   subject_code               module  \
0            90  ABC Board Spring 19   
1            80  ABC Board Spring 18   

                                               notes        topics  \
0  refer to guidance: https://nothing.com/general...  [1A, 2G, 3S]   
1  refer to guidance: https://nothing.com/general...  [1D, 5F, 2B]   

        start         end faculty_id faculty_leader  
0  1643847382  1644279382    nothing     Steve Finn  
1  1643847382  1644279382    nothing     Steve Finn  

CodePudding user response:

You can just use json_normalize. Renaming the columns to the desired names is an exercise I leave for the reader.

pd.json_normalize(data_)

   subject_code               module                                              notes        topics       start         end faculty.id faculty.leader
0            90  ABC Board Spring 19  refer to guidance: https://nothing.com/general...  [1A, 2G, 3S]  1643847382  1644279382    nothing     Steve Finn
1            80  ABC Board Spring 18  refer to guidance: https://nothing.com/general...  [1D, 5F, 2B]  1643847382  1644279382    nothing     Steve Finn

CodePudding user response:

You can flatten "faculty" before passing to the DataFrame constructor

For example:

for d in data_:
    faculty = d.pop("faculty")
    d["faculty_id"] = faculty["id"]
    d["faculty_leader"] = faculty["leader"]

then pass data_ to the DataFrame constructor

  • Related