So I have a list of dictionaries, that itself has lists of dictionaries within it like this:
myDict = [{'First_Name': 'Jack', 'Last_Name': 'Smith', 'Job_Data': [{'Company': 'Amazon'}, {'Hire_Date': '2011-04-01', 'Company': 'Target'}]},
{'First_Name': 'Jill', 'Last_Name': 'Smith', 'Job_Data': [{'Hire_Date': '2009-11-16', 'Company': 'Sears'}, {'Hire_Date': '2011-04-01'}]}]
However, as you can see, some of the key values are the same, and sometimes data elements will be missing like Jack missing a Hire Date and Jill missing a company. So what I want to do is preserve the data and write it to multiple rows so that my final output looks like this:
First_Name Last_Name Hire_Date Company
0 Jack Smith NaN Amazon
1 Jack Smith 2011-04-01 Target
2 Jill Smith 2009-11-16 Sears
3 Jill Smith 2011-04-01 NaN
Edit: Follow-up question. Say now that I have a dictionary that looks like this that adds in an extract key and I want to produce a similar output but with the new data included:
myDict = [{'First_Name': 'Jack', 'Last_Name': 'Smith', 'Job_Data': [{'Company': 'Amazon'}, {'Hire_Date': '2011-04-01', 'Company': 'Target'}, 'Dependent_data': [{'Dependent': 'Susan Smith'}, {'Dependent': 'Will Smith'}]},
{'First_Name': 'Jill', 'Last_Name': 'Smith', 'Job_Data': [{'Hire_Date': '2009-11-16', 'Company': 'Sears'}, {'Hire_Date': '2011-04-01'}]}]
Output:
First_Name Last_Name Hire_Date Company Dependent
0 Jack Smith NaN Amazon Susan Smith
1 Jack Smith 2011-04-01 Target Will Smith
2 Jill Smith 2009-11-16 Sears NaN
3 Jill Smith 2011-04-01 NaN NaN
CodePudding user response:
Using json_normalize
df = pd.json_normalize(data=myDict, meta=["First_Name", "Last_Name"], record_path="Job_Data")
print(df)
Company Hire_Date First_Name Last_Name
0 Amazon NaN Jack Smith
1 Target 2011-04-01 Jack Smith
2 Sears 2009-11-16 Jill Smith
3 NaN 2011-04-01 Jill Smith