I am working with a .json file from a public API (https://agsi.gie.eu/, for the sake of completeness) but having troubles flattening it using pandas.
The .json I am working with looks like this (excerpt):
json = {'Austria': [{'image': 'iVBORw0KxuOmB00wGjmw4Y3XTA=',
'short_name': 'astora',
'name': 'astora GmbH',
'url': 'http://www.astora.de',
'eic': '21X000000001160J',
'facilities': [{'eic': '21W000000000078N',
'name': 'UGS Haidach (astora)',
'country': {'code': 'AT', 'name': 'Austria'},
'type': 'Storage Facility'}],
'data': {'type': 'SSO',
'country': {'code': 'AT', 'name': 'Austria'},
'code': 'EU',
'name': 'Europe'}},
{'image': 'iVBORw0KGgoAAAANSUhEUgAAAIwA',
'short_name': 'GSA',
'name': 'GSA LLC',
'url': 'http://www.gsa-services.ru/',
'eic': '25X-GSALLC-----E',
'facilities': [{'eic': '25W-SPHAID-GAZ-M',
'name': 'UGS Haidach (GSA)',
'country': {'code': 'AT', 'name': 'Austria'},
'type': 'Storage Facility'}],
'data': {'type': 'SSO',
'country': {'code': 'AT', 'name': 'Austria'},
'code': 'EU',
'name': 'Europe'}},
{'image': 'iVBO7OBOjcRx8LfUS fgE/eevQiV /D85wMd0sggG8GgAAAABJRU5ErkJggg==',
'short_name': 'OMV Gas Storage',
'name': 'OMV Gas Storage GmbH',
'url': 'http://www.omv-gas-storage.com',
'eic': '25X-OMVGASSTORA5',
'facilities': [{'eic': '21W000000000081Y',
'name': 'VGS OMV Gas Storage Pool (Tallesbrunn & Schönkirchen / Reyersdorf)',
'country': {'code': 'AT', 'name': 'Austria'},
'type': 'Storage Group'}],
'data': {'type': 'SSO',
'country': {'code': 'AT', 'name': 'Austria'},
'code': 'EU',
'name': 'Europe'}},
{'image': 'iVBORw0WjUMEoVDAKFYxCoYJRqGAU/y/ EWAAX72IOPqJ0L0AAAAASUVORK5CYII=',
'short_name': 'RAG Energy Storage',
'name': 'RAG Energy Storage',
'url': 'http://www.rag-energy-storage.at/',
'eic': '23X----100225-1C',
'facilities': [{'eic': '21W000000000079L',
'name': 'RAG Storage Pool (Puchkirchen / Haag, Aigelsbrunn, Haidach 5, 7Fields-RAG)',
'country': {'code': 'AT', 'name': 'Austria'},
'type': 'Storage Group'}],
'data': {'type': 'SSO',
'country': {'code': 'AT', 'name': 'Austria'},
'code': 'EU',
'name': 'Europe'}},
{'image': 'iVBORw0KGgoAAAANXqi7W2jLByR/ToOH22',
'short_name': 'Uniper Energy Storage (AT)',
'name': 'Uniper Energy Storage GmbH',
'url': 'https://www.uniper-energy-storage.com',
'eic': '21X000000001127H',
'facilities': [{'eic': '21W000000000057V',
'name': 'UGS 7 Fields (Uniper Energy Storage) - AT',
'country': {'code': 'AT', 'name': 'Austria'},
'type': 'Storage Facility'}],
'data': {'type': 'SSO',
'country': {'code': 'AT', 'name': 'Austria'},
'code': 'EU',
'name': 'Europe'}}]}
and I am trying to flatten it, such that it looks like this: This is my desired outcome, flattened such that each facility is a row (other country data may have multiple facilities per company name), with the company data added as meta data.
I achieved this up to now by utilizing:
Austria = pd.json_normalize(json, record_path="facilities", meta=["name", "type", "eic"], meta_prefix='company_').drop(["url"], axis=1)
Since some days, this line returns me a KeyError: 'facilities'
KeyError: "Key 'facilities' not found. If specifying a record_path, all elements of data should have the path."
Although the key 'facilities' is in all elements of the data specified. What am I missing?
Any help and hints highly appreciated!
CodePudding user response:
The code below should work for you. Notice the commented changes.
Austria = pd.json_normalize(
json['Austria'], # <- Changed here from just `json`
record_path="facilities",
meta=["name", "type", "eic"],
meta_prefix='company_',
errors='ignore', # <- Add this to ignore errors
).drop(
["url"],
axis=1,
errors='ignore', # <- Add this to ignore errors
)
And this is the result:
eic | name | type | country.code | country.name | company_name | company_type | company_eic | |
---|---|---|---|---|---|---|---|---|
0 | 21W000000000078N | UGS Haidach (astora) | Storage Facility | AT | Austria | astora GmbH | nan | 21X000000001160J |
1 | 25W-SPHAID-GAZ-M | UGS Haidach (GSA) | Storage Facility | AT | Austria | GSA LLC | nan | 25X-GSALLC-----E |
2 | 21W000000000081Y | VGS OMV Gas Storage Pool (Tallesbrunn & Schönkirchen / Reyersdorf) | Storage Group | AT | Austria | OMV Gas Storage GmbH | nan | 25X-OMVGASSTORA5 |
3 | 21W000000000079L | RAG Storage Pool (Puchkirchen / Haag, Aigelsbrunn, Haidach 5, 7Fields-RAG) | Storage Group | AT | Austria | RAG Energy Storage | nan | 23X----100225-1C |
4 | 21W000000000057V | UGS 7 Fields (Uniper Energy Storage) - AT | Storage Facility | AT | Austria | Uniper Energy Storage GmbH | nan | 21X000000001127H |