So from the pandas documentation: https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html
import pandas as pd

data = [
    {
        "state": "Florida",
        "shortname": "FL",
        "info": {"governor": "Rick Scott"},
        "counties": [
            {"name": "Dade", "population": 12345},
            {"name": "Broward", "population": 40000},
            {"name": "Palm Beach", "population": 60000},
        ],
    },
    {
        "state": "Ohio",
        "shortname": "OH",
        "info": {"governor": "John Kasich"},
        "counties": [
            {"name": "Summit", "population": 1234},
            {"name": "Cuyahoga", "population": 1337},
        ],
    },
]
result = pd.json_normalize(
    data, "counties", ["state", "shortname", ["info", "governor"]]
)
result
         name  population    state  shortname  info.governor
0        Dade       12345  Florida         FL     Rick Scott
1     Broward       40000  Florida         FL     Rick Scott
2  Palm Beach       60000  Florida         FL     Rick Scott
3      Summit        1234     Ohio         OH    John Kasich
4    Cuyahoga        1337     Ohio         OH    John Kasich
My question is: what exactly is the record_path argument doing? From what I understand, record_path points to a list of records, but we are also getting values from state, shortname, etc. as columns in the DataFrame. So what is the difference between the record_path and meta arguments?
P.S. I looked into other Stack Overflow posts but didn't quite understand them.
Thanks for the help.
Answer:
record_path specifies the list of items that the actual rows are based on: each element of that list becomes one row.
As you can see below, when you omit meta (the 3rd argument), the rows are just the properties of the objects in the list you specified, counties:
>>> pd.json_normalize(data, record_path="counties")
         name  population
0        Dade       12345
1     Broward       40000
2  Palm Beach       60000
3      Summit        1234
4    Cuyahoga        1337
Those rows correspond directly to the objects in counties:
...
"counties": [
    {"name": "Dade", "population": 12345},
    {"name": "Broward", "population": 40000},
    {"name": "Palm Beach", "population": 60000},
],
...
"counties": [
    {"name": "Summit", "population": 1234},
    {"name": "Cuyahoga", "population": 1337},
],
...
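To make the "one row per list item" behaviour concrete, here is a small sketch contrasting json_normalize with and without record_path on the same data. The variable names (flat, by_pandas, rows, by_hand) are just illustrative, and the hand-built version only shows the equivalence for this data; it is not how pandas implements it internally.

```python
import pandas as pd

# Same structure as the question's data.
data = [
    {"state": "Florida", "shortname": "FL", "info": {"governor": "Rick Scott"},
     "counties": [{"name": "Dade", "population": 12345},
                  {"name": "Broward", "population": 40000},
                  {"name": "Palm Beach", "population": 60000}]},
    {"state": "Ohio", "shortname": "OH", "info": {"governor": "John Kasich"},
     "counties": [{"name": "Summit", "population": 1234},
                  {"name": "Cuyahoga", "population": 1337}]},
]

# No record_path: one row per top-level object. Nested dicts are flattened
# (info -> info.governor), but the "counties" list stays in a single cell.
flat = pd.json_normalize(data)
print(len(flat), sorted(flat.columns))
print(type(flat.loc[0, "counties"]))

# record_path="counties": one row per element of each "counties" list.
by_pandas = pd.json_normalize(data, record_path="counties")

# Hand-built equivalent for this data: every county dict becomes one row.
rows = [county for parent in data for county in parent["counties"]]
by_hand = pd.DataFrame(rows)
print(by_pandas.equals(by_hand))
```

So record_path is what turns the nested list into rows; without it, the list is just another (unexpanded) column value.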
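meta specifies properties from the JSON you pass (data) to add to each of the records specified by record_path.
Each meta value is read from the parent object that contains the record list, so every county row gets its own state's state, shortname and info.governor values. A nested field is given as a list of keys (["info", "governor"]), and its column name joins those keys with "." (info.governor).
So record_path decides what a single row is, while meta decides which outer fields are repeated onto those rows.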
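Here is a rough sketch of what meta adds, again building the same table by hand for comparison. The loop is only an illustration for this specific data (rows, row and by_hand are hypothetical names), not pandas' internal implementation. It compares the row values rather than using DataFrame.equals, since the column dtypes may differ between the two construction paths.

```python
import pandas as pd

# Same structure as the question's data.
data = [
    {"state": "Florida", "shortname": "FL", "info": {"governor": "Rick Scott"},
     "counties": [{"name": "Dade", "population": 12345},
                  {"name": "Broward", "population": 40000},
                  {"name": "Palm Beach", "population": 60000}]},
    {"state": "Ohio", "shortname": "OH", "info": {"governor": "John Kasich"},
     "counties": [{"name": "Summit", "population": 1234},
                  {"name": "Cuyahoga", "population": 1337}]},
]

by_pandas = pd.json_normalize(
    data,
    record_path="counties",
    meta=["state", "shortname", ["info", "governor"]],
)

# Hand-built equivalent for this data: copy the chosen parent fields
# onto every record found under "counties".
rows = []
for parent in data:
    for county in parent["counties"]:
        row = dict(county)                                 # record fields: name, population
        row["state"] = parent["state"]                     # meta "state"
        row["shortname"] = parent["shortname"]             # meta "shortname"
        row["info.governor"] = parent["info"]["governor"]  # meta ["info", "governor"]
        rows.append(row)
by_hand = pd.DataFrame(rows)

print(list(by_pandas.columns))
# Compare values row by row (dtypes may differ between the two frames).
print(by_pandas.to_dict("records") == by_hand.to_dict("records"))
```

The record fields come first and the meta columns follow in the order they were listed, which matches the column order in the question's output.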