Home > Software engineering >  python pandas json_normalize - How to flatten nested dict - key error in record path
python pandas json_normalize - How to flatten nested dict - key error in record path

Time:06-17

I am working with a .json file from a public API (https://agsi.gie.eu/, for the sake of completeness) but having troubles flattening it using pandas.

The .json I am working with looks like this (excerpt):

json = {'Austria': [{'image': 'iVBORw0KxuOmB00wGjmw4Y3XTA=',
   'short_name': 'astora',
   'name': 'astora GmbH',
   'url': 'http://www.astora.de',
   'eic': '21X000000001160J',
   'facilities': [{'eic': '21W000000000078N',
     'name': 'UGS Haidach (astora)',
     'country': {'code': 'AT', 'name': 'Austria'},
     'type': 'Storage Facility'}],
   'data': {'type': 'SSO',
    'country': {'code': 'AT', 'name': 'Austria'},
    'code': 'EU',
    'name': 'Europe'}},
  {'image': 'iVBORw0KGgoAAAANSUhEUgAAAIwA',
   'short_name': 'GSA',
   'name': 'GSA LLC',
   'url': 'http://www.gsa-services.ru/',
   'eic': '25X-GSALLC-----E',
   'facilities': [{'eic': '25W-SPHAID-GAZ-M',
     'name': 'UGS Haidach (GSA)',
     'country': {'code': 'AT', 'name': 'Austria'},
     'type': 'Storage Facility'}],
   'data': {'type': 'SSO',
    'country': {'code': 'AT', 'name': 'Austria'},
    'code': 'EU',
    'name': 'Europe'}},
  {'image': 'iVBO7OBOjcRx8LfUS fgE/eevQiV /D85wMd0sggG8GgAAAABJRU5ErkJggg==',
   'short_name': 'OMV Gas Storage',
   'name': 'OMV Gas Storage GmbH',
   'url': 'http://www.omv-gas-storage.com',
   'eic': '25X-OMVGASSTORA5',
   'facilities': [{'eic': '21W000000000081Y',
     'name': 'VGS OMV Gas Storage Pool (Tallesbrunn & Schönkirchen / Reyersdorf)',
     'country': {'code': 'AT', 'name': 'Austria'},
     'type': 'Storage Group'}],
   'data': {'type': 'SSO',
    'country': {'code': 'AT', 'name': 'Austria'},
    'code': 'EU',
    'name': 'Europe'}},
  {'image': 'iVBORw0WjUMEoVDAKFYxCoYJRqGAU/y/ EWAAX72IOPqJ0L0AAAAASUVORK5CYII=',
   'short_name': 'RAG Energy Storage',
   'name': 'RAG Energy Storage',
   'url': 'http://www.rag-energy-storage.at/',
   'eic': '23X----100225-1C',
   'facilities': [{'eic': '21W000000000079L',
     'name': 'RAG Storage Pool (Puchkirchen / Haag, Aigelsbrunn, Haidach 5, 7Fields-RAG)',
     'country': {'code': 'AT', 'name': 'Austria'},
     'type': 'Storage Group'}],
   'data': {'type': 'SSO',
    'country': {'code': 'AT', 'name': 'Austria'},
    'code': 'EU',
    'name': 'Europe'}},
  {'image': 'iVBORw0KGgoAAAANXqi7W2jLByR/ToOH22',
   'short_name': 'Uniper Energy Storage (AT)',
   'name': 'Uniper Energy Storage GmbH',
   'url': 'https://www.uniper-energy-storage.com',
   'eic': '21X000000001127H',
   'facilities': [{'eic': '21W000000000057V',
     'name': 'UGS 7 Fields (Uniper Energy Storage) - AT',
     'country': {'code': 'AT', 'name': 'Austria'},
     'type': 'Storage Facility'}],
   'data': {'type': 'SSO',
    'country': {'code': 'AT', 'name': 'Austria'},
    'code': 'EU',
    'name': 'Europe'}}]} 

and I am trying to flatten it, such that it looks like this: This is my desired outcome, flattened such that each facility is a row (other country data may have multiple facilities per company name), with the company data added as meta data.

I achieved this up to now by utilizing:

Austria = pd.json_normalize(json, record_path="facilities", meta=["name", "type", "eic"], meta_prefix='company_').drop(["url"], axis=1)

Since some days, this line returns me a KeyError: 'facilities'

KeyError: "Key 'facilities' not found. If specifying a record_path, all elements of data should have the path."

Although the key 'facilities' is in all elements of the data specified. What am I missing?

Any help and hints highly appreciated!

CodePudding user response:

The code below should work for you. Notice the commented changes.

Austria = pd.json_normalize(
    json['Austria'],              # <- Changed here from just `json`
    record_path="facilities",
    meta=["name", "type", "eic"],
    meta_prefix='company_',
    errors='ignore',              # <- Add this to ignore errors
).drop(
    ["url"],
    axis=1,
    errors='ignore',              # <- Add this to ignore errors
)

And this is the result:

eic name type country.code country.name company_name company_type company_eic
0 21W000000000078N UGS Haidach (astora) Storage Facility AT Austria astora GmbH nan 21X000000001160J
1 25W-SPHAID-GAZ-M UGS Haidach (GSA) Storage Facility AT Austria GSA LLC nan 25X-GSALLC-----E
2 21W000000000081Y VGS OMV Gas Storage Pool (Tallesbrunn & Schönkirchen / Reyersdorf) Storage Group AT Austria OMV Gas Storage GmbH nan 25X-OMVGASSTORA5
3 21W000000000079L RAG Storage Pool (Puchkirchen / Haag, Aigelsbrunn, Haidach 5, 7Fields-RAG) Storage Group AT Austria RAG Energy Storage nan 23X----100225-1C
4 21W000000000057V UGS 7 Fields (Uniper Energy Storage) - AT Storage Facility AT Austria Uniper Energy Storage GmbH nan 21X000000001127H
  • Related