import pandas as pd
from pandas import json_normalize
Suppose I have the following data:
data = [
{
'Name':'Rocco',
'Year':2020,
'Location':'Itay',
'Source':[
{'Movies':
{'MovieNumber':1, 'Money':1000, 'Percent':10}
}
]
},
{
'Name':'Anja',
'Year':2021,
'Location':'Germany',
'Source':[
{'Movies': [
{'MovieNumber':1, 'Money':2000, 'Percent':10},
{'MovieNumber':2, 'Money':3000, 'Percent':10}
]
}
]
},
{
'Name':'Kasia',
'Year':2021,
'Location':'Poland',
'Source':[
{'Movies': [
{'MovieNumber':1, 'Money':1000, 'Percent':10},
{'MovieNumber':2, 'Money':1000, 'Percent':10},
{'MovieNumber':3, 'Money':1000, 'Percent':10}
]
}
]
}
]
This is what it looks like as a pandas DF:
df = pd.DataFrame(data)
df
This was my first attempt at unpacking:
results = pd.json_normalize(
data,
record_path = ['Source'],
meta = ['Year']
)
This only did the first level so Rocco's information came out correct but Anja's and Kasia's did not because the data structure inside Source
changed to a list.
My next attempt involved the following but it too did not work:
results2 = pd.json_normalize(
data,
record_path = ['Source'],
meta = [
'Year',
['Source'],
]
)
From looking at the documentation and YouTube videos, I think I need another element in both the record_path
field and meta
field but I don't know what would go there. I've tried different combinations but none of them worked.
Does anyone know how I can unpack this? I would like the end result to be a DF with the following columns: Year, Name, Location, Movies.MovieNumber, Movies.Money, Movies.Percent. So, there should be a total of 6 rows.
CodePudding user response:
Fix your dictionary first, it's not consistent, this makes it consistent:
for i, x in enumerate(data):
x = x['Source'][0]['Movies']
if not isinstance(x, list):
data[i]['Source'][0]['Movies'] = [x]
Then json_normalize
works just fine:
df = pd.json_normalize(data, ['Source','Movies'], ['Name', 'Year', 'Location'])
print(df)
Output:
MovieNumber Money Percent Name Year Location
0 1 1000 10 Rocco 2020 Itay
1 1 2000 10 Anja 2021 Germany
2 2 3000 10 Anja 2021 Germany
3 1 1000 10 Kasia 2021 Poland
4 2 1000 10 Kasia 2021 Poland
5 3 1000 10 Kasia 2021 Poland