Home > Software design >  How to flatten nested json data using pandas
How to flatten nested json data using pandas

Time:08-28

So I wanted to flatten nested json data in this pandas data frame as additional columns. I am using requests to get json data from an api and create a pandas data frame.

import requests import pandas as pd import json

url = 'https://api.openaq.org/v2/locations?coordinates=38.7437396,-9.2302436&radius=5000'
api_request = requests.get(url)
api_content = api_request.content
api_json = json.loads(api_content)
df = pd.json_normalize(api_json)
df

This outputs the following:

results meta.name   meta.license    meta.website    meta.page   meta.limit  meta.found
0   [{'id': 8209, 'city': 'Lisboa', 'name': 'PT031...   openaq-api  CC BY 4.0d  api.openaq.org  1   100 4

so, I did

df = pd.json_normalize(api_json, record_path=["results"])

Which gives a nicer result:

    id  city    name    entity  country sources isMobile    isAnalysis  parameters  sensorType  lastUpdated firstUpdated    measurements    bounds  manufacturers   coordinates.latitude    coordinates.longitude
0   8209    Lisboa  PT03100 government  PT  [{'url': 'http://www.eea.europa.eu/themes/air/...   False   False   [{'id': 25414, 'unit': 'µg/m³', 'count': 10005...   reference grade 2022-08-27T16:00:00 00:00   2017-09-22T11:00:00 00:00   246216  None    None    38.748056   -9.202500
1   8211    Lisboa  PT03084 government  PT  [{'url': 'http://www.eea.europa.eu/themes/air/...   False   False   [{'id': 25397, 'unit': 'µg/m³', 'count': 93281...   reference grade 2022-08-27T16:00:00 00:00   2017-09-22T10:00:00 00:00   239733  None    None    38.754167   -9.230833
2   7199    Lisboa  PT03087 government  PT  [{'url': 'http://www.eea.europa.eu/themes/air/...   False   False   [{'id': 24800, 'unit': 'µg/m³', 'count': 97702...   reference grade 2022-08-27T16:00:00 00:00   2017-09-22T13:00:00 00:00   235512  None    None    38.705000   -9.210278
3   7767    Lisboa  PT03082 government  PT  [{'url': 'http://www.eea.europa.eu/themes/air/...   False   False   [{'id': 25393, 'unit': 'µg/m³', 'count': 83437...   reference grade 2022-08-27T16:00:00 00:00   2018-02-27T01:00:00 00:00   304693  None    None    38.738889   -9.207500

However, the 'sources' and 'parameter' columns are still in json format. How can I unpack these into new columns?

Also, can I for instance just pull the 'url' key out of the 'sources' json data and make it its own column?

CodePudding user response:

You can do like this. Although there are so many columns you might want to prune things up a bit before merging into main df.

df_sources = pd.json_normalize(df.sources.explode())
df_params = pd.json_normalize(df.parameters.explode())

df = pd.concat([df, df_sources, df_params], axis=1)
del df['sources'], df['parameters']

CodePudding user response:

This json is quite complicated, and is more so multiple dataframes in one.

We can differentiate between the inner frames and the metadata like so:

result1 = api_json['results'][0]
# It's a frame if it's a list
frames = [x for x, y in result1.items() if isinstance(y, list)]
# Otherwise it's just metadata
metadata = [x for x in result1 if x not in frames]

Now, let's make a list of dataframes, I'll remove null columns as well:

dfs = [pd.json_normalize(api_json['results'], frame, metadata, record_prefix=frame   '_').dropna(how='all', axis=1) for frame in frames]

Fix up that coordinates column:

dfs = [pd.concat([df, df.coordinates.apply(pd.Series)], axis=1).drop('coordinates', axis=1) for df in dfs]
metadata  = ['latitude', 'longitude']

And concat the dfs together, using the metadata as the index:

df = pd.concat([df.set_index([x for x in df.columns if x in metadata]) for df in dfs], axis=1)
print(df)

# Output:
                                                                                                                                                                                                     sources_url sources_name sources_id  parameters_id parameters_unit  parameters_count  parameters_average  parameters_lastValue parameters_parameter parameters_displayName    parameters_lastUpdated  parameters_parameterId   parameters_firstUpdated
id   city   name    entity     country isMobile isAnalysis sensorType      lastUpdated               firstUpdated              measurements latitude  longitude
7199 Lisboa PT03087 government PT      False    False      reference grade 2022-08-27 18:00:00 00:00 2017-09-22 13:00:00 00:00 235524       38.705000 -9.210278  http://www.eea.europa.eu/themes/air/air-quality          EEA        eea          20728           µg/m³             38829            0.000000                   0.0                 pm10                   PM10 2019-12-24 16:00:00 00:00                       1 2018-03-22 16:00:00 00:00
                                                                                                                                                      -9.210278  http://www.eea.europa.eu/themes/air/air-quality          EEA        eea          24802           µg/m³             98987           53.124865                  58.0                   o3                O₃ mass 2022-08-27 18:00:00 00:00                       3 2017-09-22 13:00:00 00:00
                                                                                                                                                      -9.210278  http://www.eea.europa.eu/themes/air/air-quality          EEA        eea          24800           µg/m³             97708           18.870253                   5.0                  no2               NO₂ mass 2022-08-27 18:00:00 00:00                       5 2017-09-22 13:00:00 00:00
7767 Lisboa PT03082 government PT      False    False      reference grade 2022-08-27 18:00:00 00:00 2018-02-27 01:00:00 00:00 304711       38.738889 -9.207500  http://www.eea.europa.eu/themes/air/air-quality          EEA        eea          25392           µg/m³             85382           24.400488                  12.7                  no2               NO₂ mass 2022-08-27 18:00:00 00:00                       5 2018-03-22 12:00:00 00:00
                                                                                                                                                      -9.207500  http://www.eea.europa.eu/themes/air/air-quality          EEA        eea          22595           µg/m³             68071            3.050738                   5.0                  so2               SO₂ mass 2022-08-27 18:00:00 00:00                       6 2018-03-22 12:00:00 00:00
                                                                                                                                                      -9.207500  http://www.eea.europa.eu/themes/air/air-quality          EEA        eea          25391           µg/m³             67821         3528.784949                1221.0                   co                CO mass 2022-08-27 18:00:00 00:00                       4 2018-02-27 01:00:00 00:00
                                                                                                                                                      -9.207500  http://www.eea.europa.eu/themes/air/air-quality          EEA        eea          25393           µg/m³             83437           44.678977                  67.4                   o3                O₃ mass 2022-07-25 14:00:00 00:00                       3 2018-03-06 18:00:00 00:00
8209 Lisboa PT03100 government PT      False    False      reference grade 2022-08-27 18:00:00 00:00 2017-09-22 11:00:00 00:00 246228       38.748056 -9.202500  http://www.eea.europa.eu/themes/air/air-quality          EEA        eea          23940           µg/m³             47279            6.259864                   8.8                 pm10                   PM10 2022-08-27 12:00:00 00:00                       1 2017-09-22 11:00:00 00:00
                                                                                                                                                      -9.202500  http://www.eea.europa.eu/themes/air/air-quality          EEA        eea          24099           µg/m³             98890          298.816190                 197.0                   co                CO mass 2022-08-27 18:00:00 00:00                       4 2017-09-22 11:00:00 00:00
                                                                                                                                                      -9.202500  http://www.eea.europa.eu/themes/air/air-quality          EEA        eea          25414           µg/m³            100059           32.176792                  29.6                  no2               NO₂ mass 2022-08-27 18:00:00 00:00                       5 2017-09-22 11:00:00 00:00
8211 Lisboa PT03084 government PT      False    False      reference grade 2022-08-27 18:00:00 00:00 2017-09-22 10:00:00 00:00 239745       38.754167 -9.230833  http://www.eea.europa.eu/themes/air/air-quality          EEA        eea          25398           µg/m³             99881           57.904319                  94.0                   o3                O₃ mass 2022-08-27 18:00:00 00:00                       3 2017-09-22 11:00:00 00:00
                                                                                                                                                      -9.230833  http://www.eea.europa.eu/themes/air/air-quality          EEA        eea          23942           µg/m³             46577            3.805825                  10.0                 pm10                   PM10 2022-08-27 14:00:00 00:00                       1 2017-09-22 10:00:00 00:00
                                                                                                                                                      -9.230833  http://www.eea.europa.eu/themes/air/air-quality          EEA        eea          25397           µg/m³             93287           16.333777                  10.9                  no2               NO₂ mass 2022-08-27 18:00:00 00:00                       5 2017-09-22 11:00:00 00:00
  • Related