So I wanted to flatten nested json data in this pandas data frame as additional columns. I am using requests to get json data from an api and create a pandas data frame.
import requests import pandas as pd import json
url = 'https://api.openaq.org/v2/locations?coordinates=38.7437396,-9.2302436&radius=5000'
api_request = requests.get(url)
api_content = api_request.content
api_json = json.loads(api_content)
df = pd.json_normalize(api_json)
df
This outputs the following:
results meta.name meta.license meta.website meta.page meta.limit meta.found
0 [{'id': 8209, 'city': 'Lisboa', 'name': 'PT031... openaq-api CC BY 4.0d api.openaq.org 1 100 4
so, I did
df = pd.json_normalize(api_json, record_path=["results"])
Which gives a nicer result:
id city name entity country sources isMobile isAnalysis parameters sensorType lastUpdated firstUpdated measurements bounds manufacturers coordinates.latitude coordinates.longitude
0 8209 Lisboa PT03100 government PT [{'url': 'http://www.eea.europa.eu/themes/air/... False False [{'id': 25414, 'unit': 'µg/m³', 'count': 10005... reference grade 2022-08-27T16:00:00 00:00 2017-09-22T11:00:00 00:00 246216 None None 38.748056 -9.202500
1 8211 Lisboa PT03084 government PT [{'url': 'http://www.eea.europa.eu/themes/air/... False False [{'id': 25397, 'unit': 'µg/m³', 'count': 93281... reference grade 2022-08-27T16:00:00 00:00 2017-09-22T10:00:00 00:00 239733 None None 38.754167 -9.230833
2 7199 Lisboa PT03087 government PT [{'url': 'http://www.eea.europa.eu/themes/air/... False False [{'id': 24800, 'unit': 'µg/m³', 'count': 97702... reference grade 2022-08-27T16:00:00 00:00 2017-09-22T13:00:00 00:00 235512 None None 38.705000 -9.210278
3 7767 Lisboa PT03082 government PT [{'url': 'http://www.eea.europa.eu/themes/air/... False False [{'id': 25393, 'unit': 'µg/m³', 'count': 83437... reference grade 2022-08-27T16:00:00 00:00 2018-02-27T01:00:00 00:00 304693 None None 38.738889 -9.207500
However, the 'sources' and 'parameter' columns are still in json format. How can I unpack these into new columns?
Also, can I for instance just pull the 'url' key out of the 'sources' json data and make it its own column?
CodePudding user response:
You can do like this. Although there are so many columns you might want to prune things up a bit before merging into main df.
df_sources = pd.json_normalize(df.sources.explode())
df_params = pd.json_normalize(df.parameters.explode())
df = pd.concat([df, df_sources, df_params], axis=1)
del df['sources'], df['parameters']
CodePudding user response:
This json is quite complicated, and is more so multiple dataframes in one.
We can differentiate between the inner frames and the metadata like so:
result1 = api_json['results'][0]
# It's a frame if it's a list
frames = [x for x, y in result1.items() if isinstance(y, list)]
# Otherwise it's just metadata
metadata = [x for x in result1 if x not in frames]
Now, let's make a list of dataframes, I'll remove null columns as well:
dfs = [pd.json_normalize(api_json['results'], frame, metadata, record_prefix=frame '_').dropna(how='all', axis=1) for frame in frames]
Fix up that coordinates column:
dfs = [pd.concat([df, df.coordinates.apply(pd.Series)], axis=1).drop('coordinates', axis=1) for df in dfs]
metadata = ['latitude', 'longitude']
And concat the dfs together, using the metadata as the index:
df = pd.concat([df.set_index([x for x in df.columns if x in metadata]) for df in dfs], axis=1)
print(df)
# Output:
sources_url sources_name sources_id parameters_id parameters_unit parameters_count parameters_average parameters_lastValue parameters_parameter parameters_displayName parameters_lastUpdated parameters_parameterId parameters_firstUpdated
id city name entity country isMobile isAnalysis sensorType lastUpdated firstUpdated measurements latitude longitude
7199 Lisboa PT03087 government PT False False reference grade 2022-08-27 18:00:00 00:00 2017-09-22 13:00:00 00:00 235524 38.705000 -9.210278 http://www.eea.europa.eu/themes/air/air-quality EEA eea 20728 µg/m³ 38829 0.000000 0.0 pm10 PM10 2019-12-24 16:00:00 00:00 1 2018-03-22 16:00:00 00:00
-9.210278 http://www.eea.europa.eu/themes/air/air-quality EEA eea 24802 µg/m³ 98987 53.124865 58.0 o3 O₃ mass 2022-08-27 18:00:00 00:00 3 2017-09-22 13:00:00 00:00
-9.210278 http://www.eea.europa.eu/themes/air/air-quality EEA eea 24800 µg/m³ 97708 18.870253 5.0 no2 NO₂ mass 2022-08-27 18:00:00 00:00 5 2017-09-22 13:00:00 00:00
7767 Lisboa PT03082 government PT False False reference grade 2022-08-27 18:00:00 00:00 2018-02-27 01:00:00 00:00 304711 38.738889 -9.207500 http://www.eea.europa.eu/themes/air/air-quality EEA eea 25392 µg/m³ 85382 24.400488 12.7 no2 NO₂ mass 2022-08-27 18:00:00 00:00 5 2018-03-22 12:00:00 00:00
-9.207500 http://www.eea.europa.eu/themes/air/air-quality EEA eea 22595 µg/m³ 68071 3.050738 5.0 so2 SO₂ mass 2022-08-27 18:00:00 00:00 6 2018-03-22 12:00:00 00:00
-9.207500 http://www.eea.europa.eu/themes/air/air-quality EEA eea 25391 µg/m³ 67821 3528.784949 1221.0 co CO mass 2022-08-27 18:00:00 00:00 4 2018-02-27 01:00:00 00:00
-9.207500 http://www.eea.europa.eu/themes/air/air-quality EEA eea 25393 µg/m³ 83437 44.678977 67.4 o3 O₃ mass 2022-07-25 14:00:00 00:00 3 2018-03-06 18:00:00 00:00
8209 Lisboa PT03100 government PT False False reference grade 2022-08-27 18:00:00 00:00 2017-09-22 11:00:00 00:00 246228 38.748056 -9.202500 http://www.eea.europa.eu/themes/air/air-quality EEA eea 23940 µg/m³ 47279 6.259864 8.8 pm10 PM10 2022-08-27 12:00:00 00:00 1 2017-09-22 11:00:00 00:00
-9.202500 http://www.eea.europa.eu/themes/air/air-quality EEA eea 24099 µg/m³ 98890 298.816190 197.0 co CO mass 2022-08-27 18:00:00 00:00 4 2017-09-22 11:00:00 00:00
-9.202500 http://www.eea.europa.eu/themes/air/air-quality EEA eea 25414 µg/m³ 100059 32.176792 29.6 no2 NO₂ mass 2022-08-27 18:00:00 00:00 5 2017-09-22 11:00:00 00:00
8211 Lisboa PT03084 government PT False False reference grade 2022-08-27 18:00:00 00:00 2017-09-22 10:00:00 00:00 239745 38.754167 -9.230833 http://www.eea.europa.eu/themes/air/air-quality EEA eea 25398 µg/m³ 99881 57.904319 94.0 o3 O₃ mass 2022-08-27 18:00:00 00:00 3 2017-09-22 11:00:00 00:00
-9.230833 http://www.eea.europa.eu/themes/air/air-quality EEA eea 23942 µg/m³ 46577 3.805825 10.0 pm10 PM10 2022-08-27 14:00:00 00:00 1 2017-09-22 10:00:00 00:00
-9.230833 http://www.eea.europa.eu/themes/air/air-quality EEA eea 25397 µg/m³ 93287 16.333777 10.9 no2 NO₂ mass 2022-08-27 18:00:00 00:00 5 2017-09-22 11:00:00 00:00