Having trouble understanding the json_normalize record_path argument

Time:03-13

So from the pandas documentation: https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html

import pandas as pd

data = [
    {
        "state": "Florida",
        "shortname": "FL",
        "info": {"governor": "Rick Scott"},
        "counties": [
            {"name": "Dade", "population": 12345},
            {"name": "Broward", "population": 40000},
            {"name": "Palm Beach", "population": 60000},
        ],
    },
    {
        "state": "Ohio",
        "shortname": "OH",
        "info": {"governor": "John Kasich"},
        "counties": [
            {"name": "Summit", "population": 1234},
            {"name": "Cuyahoga", "population": 1337},
        ],
    },
]
result = pd.json_normalize(
    data, "counties", ["state", "shortname", ["info", "governor"]]
)
result
         name  population    state shortname info.governor
0        Dade       12345  Florida        FL    Rick Scott
1     Broward       40000  Florida        FL    Rick Scott
2  Palm Beach       60000  Florida        FL    Rick Scott
3      Summit        1234     Ohio        OH   John Kasich
4    Cuyahoga        1337     Ohio        OH   John Kasich

My question is: what exactly is the record_path argument doing? From what I understand, record_path points to a list of records, but we are also getting values from state, shortname, etc. as columns in the dataframe. So what is the difference between the record_path and meta arguments?

P.S. I looked at other Stack Overflow posts but didn't quite understand them.

Thanks for the help.

CodePudding user response:

record_path specifies the list of items that the actual rows are based on.

As you can see below, when you omit the meta (3rd argument), the rows are just the properties from the objects in the list you specified, counties:

>>> pd.json_normalize(data, record_path="counties")
         name  population
0        Dade       12345
1     Broward       40000
2  Palm Beach       60000
3      Summit        1234
4    Cuyahoga        1337

Those correspond directly to the objects in counties:

...
"counties": [
    {"name": "Dade", "population": 12345},
    {"name": "Broward", "population": 40000},
    {"name": "Palm Beach", "population": 60000},
]
...
"counties": [
    {"name": "Summit", "population": 1234},
    {"name": "Cuyahoga", "population": 1337},
],
...

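meta specifies properties from the enclosing objects in the JSON you pass (data) to add to each of the records selected by record_path. That is why every county row from the first object also carries state, shortname and info.governor for Florida: those values are copied from the parent onto each row it produced.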
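
To make the split between the two arguments concrete, here is a rough hand-written equivalent of the call from the question. It is only a sketch of the behaviour for this particular input (with the data shortened), not how pandas implements it; the names rows, manual and normalized are made up for illustration.

```python
import pandas as pd

data = [
    {
        "state": "Florida",
        "shortname": "FL",
        "info": {"governor": "Rick Scott"},
        "counties": [
            {"name": "Dade", "population": 12345},
            {"name": "Broward", "population": 40000},
        ],
    },
    {
        "state": "Ohio",
        "shortname": "OH",
        "info": {"governor": "John Kasich"},
        "counties": [{"name": "Summit", "population": 1234}],
    },
]

rows = []
for parent in data:                    # one top-level object per state
    for county in parent["counties"]:  # record_path: each list item becomes a row
        row = dict(county)             # the row's own fields (name, population)
        # meta: values copied from the parent onto every row it produced
        row["state"] = parent["state"]
        row["shortname"] = parent["shortname"]
        row["info.governor"] = parent["info"]["governor"]
        rows.append(row)

manual = pd.DataFrame(rows)

normalized = pd.json_normalize(
    data,
    record_path="counties",
    meta=["state", "shortname", ["info", "governor"]],
)

print(manual)
print(normalized)
```

Both frames should contain the same rows and columns: record_path decides how many rows there are and what their own fields are, while meta only adds parent-level columns to those rows.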
