Home > Software design >  Dynamically Flatten JSON response from API gives one Huge row
Dynamically Flatten JSON response from API gives one Huge row

Time:11-17

I am trying to dynamically flatten a json response for an API request but getting only one row with all the record back. kindly assist or point me in the right direction.

My json response looks like this

import requests, json URL='https://data.calgary.ca/resource/848s-4m4z.json' data = json.loads(requests.get(URL).text) data

[{'sector': 'NORTH',
  'community_name': 'THORNCLIFFE',
  'group_category': 'Crime',
  'category': 'Theft FROM Vehicle',
  'count': '9',
  'resident_count': '8474',
  'date': '2018-03-01T12:00:00.000',
  'year': '2018',
  'month': 'MAR',
  'id': '2018-MAR-THORNCLIFFE-Theft FROM Vehicle-9',
  'geocoded_column': {'latitude': '51.103099554741',
   'longitude': '-114.068779421169',
   'human_address': '{"address": "", "city": "", "state": "", "zip": ""}'},
  ':@computed_region_4a3i_ccfj': '2',
  ':@computed_region_p8tp_5dkv': '4',
  ':@computed_region_4b54_tmc4': '2',
  ':@computed_region_kxmf_bzkv': '192'},
 {'sector': 'SOUTH',
  'community_name': 'WOODBINE',
  'group_category': 'Crime',
  'category': 'Theft FROM Vehicle',
  'count': '3',
  'resident_count': '8866',
  'date': '2019-11-01T00:00:00.000',
  'year': '2019',
  'month': 'NOV',
  'id': '2019-NOV-WOODBINE-Theft FROM Vehicle-3',
  'geocoded_column': {'latitude': '50.939610852207664',
   'longitude': '-114.12962865374453',
   'human_address': '{"address": "", "city": "", "state": "", "zip": ""}'},
  ':@computed_region_4a3i_ccfj': '1',
  ':@computed_region_p8tp_5dkv': '6',
  ':@computed_region_4b54_tmc4': '5',
  ':@computed_region_kxmf_bzkv': '43'}
]


Here is my code


``
`# Function for flattening
# json
 
 
def flatten_json(y):
    out = {}
 
    def flatten(x, name=''):
 
        # If the Nested key-value
        # pair is of dict type
        if type(x) is dict:
 
            for a in x:
                flatten(x[a], name   a   '_')
 
        # If the Nested key-value
        # pair is of list type
        elif type(x) is list:
 
            i = 0
 
            for a in x:
                flatten(a, name   str(i)   '_')
                i  = 1
        else:
            out[name[:-1]] = x
 
    flatten(y)
    return out
 
 
# Driver code
# print(flatten_json(data))
newf=flatten_json(data)

pd.json_normalize(newf)`
``


It returns 

[enter image description here](https://i.stack.imgur.com/i6mUe.png)

While am expecting the data in the following format

[enter image description here](https://i.stack.imgur.com/mXNtU.png). 

 

json_normalize gives me the data in expected format but I need a way to dynamically parse different json request format (programmatically).

CodePudding user response:

To get your dataframe in correct form you can use this example (data is your list from the question):

import requests
import pandas as pd
from ast import literal_eval

url = "https://data.calgary.ca/resource/848s-4m4z.json"

df = pd.DataFrame(requests.get(url).json())
df = pd.concat(
    [
        df,
        df.pop("geocoded_column")
        .apply(pd.Series)
        .add_prefix("geocoded_column_"),
    ],
    axis=1,
)

df["geocoded_column_human_address"] = df["geocoded_column_human_address"].apply(
    literal_eval
)

df = pd.concat(
    [
        df,
        df.pop("geocoded_column_human_address")
        .apply(pd.Series)
        .add_prefix("addr_"),
    ],
    axis=1,
)

print(df.head().to_markdown(index=False))

Prints:

sector community_name group_category category count resident_count date year month id :@computed_region_4a3i_ccfj :@computed_region_p8tp_5dkv :@computed_region_4b54_tmc4 :@computed_region_kxmf_bzkv geocoded_column_latitude geocoded_column_longitude addr_address addr_city addr_state addr_zip
NORTH THORNCLIFFE Crime Theft FROM Vehicle 9 8474 2018-03-01T12:00:00.000 2018 MAR 2018-MAR-THORNCLIFFE-Theft FROM Vehicle-9 2 4 2 192 51.1031 -114.069
SOUTH WOODBINE Crime Theft FROM Vehicle 3 8866 2019-11-01T00:00:00.000 2019 NOV 2019-NOV-WOODBINE-Theft FROM Vehicle-3 1 6 5 43 50.9396 -114.13
SOUTH WILLOW PARK Crime Theft FROM Vehicle 4 5328 2019-11-01T00:00:00.000 2019 NOV 2019-NOV-WILLOW PARK-Theft FROM Vehicle-4 3 5 6 89 50.9566 -114.056
SOUTH WILLOW PARK Crime Commercial Robbery 1 5328 2019-11-01T00:00:00.000 2019 NOV 2019-NOV-WILLOW PARK-Commercial Robbery-1 3 5 6 89 50.9566 -114.056
WEST LINCOLN PARK Crime Commercial Break & Enter 5 2617 2019-11-01T00:00:00.000 2019 NOV 2019-NOV-LINCOLN PARK-Commercial Break & Enter-5 1 2 8 42 51.0101 -114.13
  • Related