We have following list -
users_info = [
{
"Id": "21",
"Name": "ABC",
"Country": {
"Name": "Country 1"
},
"Addresses": {
"records": [
{
"addressId": "12",
"line1": "xyz, 102",
"city": "PQR"
},
{
"addressId": "13",
"line1": "YTR, 102",
"city": "NMS"
}
]
},
"Education": {
"records": [
{
"Id": "45",
"Degree": "Bachelors"
},
{
"Id": "49",
"Degree": "Masters"
}
]
}
},
{
"Id": "26",
"Name": "PEW",
"Country": {
"Name": "Country 2"
},
"Addresses": {
"records": [
{
"addressId": "10",
"line1": "BTR, 12",
"city": "UYT"
},
{
"addressId": "123",
"line1": "MEQW, 6",
"city": "KJH"
}
]
},
"Education": {
"records": [
{
"Id": "45",
"Degree": "Bachelors"
},
{
"Id": "49",
"Degree": "Masters"
}
]
}
},
{
"Id": "214",
"Name": "TUF",
"Country": None,
"Addresses": {
"records": None
},
"Education": {
"records": [
{
"Id": "45",
"Degree": "Bachelors"
},
{
"Id": "49",
"Degree": "Masters"
}
]
}
},
{
"Id": "2609",
"Name": "JJU",
"Country": {
"Name": "Country 2"
},
"Addresses": {
"records": [
{
"addressId": "10",
"line1": "BTR, 12",
"city": "UYT"
},
{
"addressId": "123",
"line1": "MEQW, 6",
"city": "KJH"
}
]
},
"Education": None
}
]
I want to flatten this list of dictionaries and create pandas dataframe -
Id | Name | Country.Name | Addresses.addressId | Addresses.line1 | Addresses.city | Education.Id | Education.Degree
There are two list of dictionaries - Addresses and Education. There is possibilty that any of these (Country, Addresses and Education) can be None
How do I create dataframe using above data to get the required format?
Here's what I have tried -
dataset_1 = pandas.json_normalize([user for user in users_info],
record_path = ["Addresses"],
meta = ["Id", "Name", ["Country", "Name"]],
errors="ignore"
)
dataset_2 = pandas.json_normalize([user for user in users_info],
record_path = ["Education"],
meta = ["Id", "Name", ["Country", "Name"]],
errors="ignore"
)
dataset = pandas.concat([dataset_1, dataset_2], axis=1)
When dataset_1 is executing, I am getting -
NoneType object is not iterable error
CodePudding user response:
You can use:
df = pd.json_normalize(users_info)
df_addresses = df['Addresses.records'].explode().apply(pd.Series)
df_addresses.rename(columns={col:f'Addresses.{col}' for col in df_addresses.columns}, inplace=True)
df_education = df['Education.records'].explode().apply(pd.Series)
df_education.rename(columns={col:f'Education.{col}' for col in df_education.columns}, inplace=True)
cols = [col for col in df.columns if col not in ['Addresses.records', 'Education.records']]
df = df[cols].join(df_addresses).join(df_education)
df.dropna(axis=1, how='all', inplace=True)
print(df)
OUTPUT
Id Name Country.Name Addresses.addressId Addresses.line1 Addresses.city Education.Degree Education.Id
0 21 ABC Country 1 12 xyz, 102 PQR Bachelors 45
0 21 ABC Country 1 12 xyz, 102 PQR Masters 49
0 21 ABC Country 1 13 YTR, 102 NMS Bachelors 45
0 21 ABC Country 1 13 YTR, 102 NMS Masters 49
1 26 PEW Country 2 10 BTR, 12 UYT Bachelors 45
1 26 PEW Country 2 10 BTR, 12 UYT Masters 49
1 26 PEW Country 2 123 MEQW, 6 KJH Bachelors 45
1 26 PEW Country 2 123 MEQW, 6 KJH Masters 49
2 214 TUF NaN NaN NaN NaN Bachelors 45
2 214 TUF NaN NaN NaN NaN Masters 49
3 2609 JJU Country 2 10 BTR, 12 UYT NaN NaN
3 2609 JJU Country 2 123 MEQW, 6 KJH NaN NaN