Home > Enterprise >  Write list of dictionaries to multiple rows in a Pandas Dataframe
Write list of dictionaries to multiple rows in a Pandas Dataframe

Time:10-13

So I have a list of dictionaries, that itself has lists of dictionaries within it like this:

myDict = [{'First_Name': 'Jack', 'Last_Name': 'Smith', 'Job_Data': [{'Company': 'Amazon'}, {'Hire_Date': '2011-04-01', 'Company': 'Target'}]}, 
 {'First_Name': 'Jill', 'Last_Name': 'Smith', 'Job_Data': [{'Hire_Date': '2009-11-16', 'Company': 'Sears'}, {'Hire_Date': '2011-04-01'}]}]

However, as you can see, some of the key values are the same, and sometimes data elements will be missing like Jack missing a Hire Date and Jill missing a company. So what I want to do is preserve the data and write it to multiple rows so that my final output looks like this:

      First_Name    Last_Name    Hire_Date     Company
0     Jack          Smith        NaN           Amazon  
1     Jack          Smith        2011-04-01    Target
2     Jill          Smith        2009-11-16    Sears
3     Jill          Smith        2011-04-01    NaN

Edit: Follow-up question. Say now that I have a dictionary that looks like this that adds in an extract key and I want to produce a similar output but with the new data included:

 myDict = [{'First_Name': 'Jack', 'Last_Name': 'Smith', 'Job_Data': [{'Company': 'Amazon'}, {'Hire_Date': '2011-04-01', 'Company': 'Target'}, 'Dependent_data': [{'Dependent': 'Susan Smith'}, {'Dependent': 'Will Smith'}]}, 
     {'First_Name': 'Jill', 'Last_Name': 'Smith', 'Job_Data': [{'Hire_Date': '2009-11-16', 'Company': 'Sears'}, {'Hire_Date': '2011-04-01'}]}]

Output:

      First_Name    Last_Name    Hire_Date     Company    Dependent
0     Jack          Smith        NaN           Amazon     Susan Smith  
1     Jack          Smith        2011-04-01    Target     Will Smith
2     Jill          Smith        2009-11-16    Sears      NaN
3     Jill          Smith        2011-04-01    NaN        NaN

CodePudding user response:

Using json_normalize

df = pd.json_normalize(data=myDict, meta=["First_Name", "Last_Name"], record_path="Job_Data")
print(df)

  Company   Hire_Date First_Name Last_Name
0  Amazon         NaN       Jack     Smith
1  Target  2011-04-01       Jack     Smith
2   Sears  2009-11-16       Jill     Smith
3     NaN  2011-04-01       Jill     Smith
  • Related