I am trying to dynamically flatten a json response for an API request but getting only one row with all the record back. kindly assist or point me in the right direction.
My json response looks like this
import requests, json URL='https://data.calgary.ca/resource/848s-4m4z.json' data = json.loads(requests.get(URL).text) data
[{'sector': 'NORTH',
'community_name': 'THORNCLIFFE',
'group_category': 'Crime',
'category': 'Theft FROM Vehicle',
'count': '9',
'resident_count': '8474',
'date': '2018-03-01T12:00:00.000',
'year': '2018',
'month': 'MAR',
'id': '2018-MAR-THORNCLIFFE-Theft FROM Vehicle-9',
'geocoded_column': {'latitude': '51.103099554741',
'longitude': '-114.068779421169',
'human_address': '{"address": "", "city": "", "state": "", "zip": ""}'},
':@computed_region_4a3i_ccfj': '2',
':@computed_region_p8tp_5dkv': '4',
':@computed_region_4b54_tmc4': '2',
':@computed_region_kxmf_bzkv': '192'},
{'sector': 'SOUTH',
'community_name': 'WOODBINE',
'group_category': 'Crime',
'category': 'Theft FROM Vehicle',
'count': '3',
'resident_count': '8866',
'date': '2019-11-01T00:00:00.000',
'year': '2019',
'month': 'NOV',
'id': '2019-NOV-WOODBINE-Theft FROM Vehicle-3',
'geocoded_column': {'latitude': '50.939610852207664',
'longitude': '-114.12962865374453',
'human_address': '{"address": "", "city": "", "state": "", "zip": ""}'},
':@computed_region_4a3i_ccfj': '1',
':@computed_region_p8tp_5dkv': '6',
':@computed_region_4b54_tmc4': '5',
':@computed_region_kxmf_bzkv': '43'}
]
Here is my code
``
`# Function for flattening
# json
def flatten_json(y):
out = {}
def flatten(x, name=''):
# If the Nested key-value
# pair is of dict type
if type(x) is dict:
for a in x:
flatten(x[a], name a '_')
# If the Nested key-value
# pair is of list type
elif type(x) is list:
i = 0
for a in x:
flatten(a, name str(i) '_')
i = 1
else:
out[name[:-1]] = x
flatten(y)
return out
# Driver code
# print(flatten_json(data))
newf=flatten_json(data)
pd.json_normalize(newf)`
``
It returns
[enter image description here](https://i.stack.imgur.com/i6mUe.png)
While am expecting the data in the following format
[enter image description here](https://i.stack.imgur.com/mXNtU.png).
json_normalize gives me the data in expected format but I need a way to dynamically parse different json request format (programmatically).
CodePudding user response:
To get your dataframe in correct form you can use this example (data
is your list from the question):
import requests
import pandas as pd
from ast import literal_eval
url = "https://data.calgary.ca/resource/848s-4m4z.json"
df = pd.DataFrame(requests.get(url).json())
df = pd.concat(
[
df,
df.pop("geocoded_column")
.apply(pd.Series)
.add_prefix("geocoded_column_"),
],
axis=1,
)
df["geocoded_column_human_address"] = df["geocoded_column_human_address"].apply(
literal_eval
)
df = pd.concat(
[
df,
df.pop("geocoded_column_human_address")
.apply(pd.Series)
.add_prefix("addr_"),
],
axis=1,
)
print(df.head().to_markdown(index=False))
Prints:
sector | community_name | group_category | category | count | resident_count | date | year | month | id | :@computed_region_4a3i_ccfj | :@computed_region_p8tp_5dkv | :@computed_region_4b54_tmc4 | :@computed_region_kxmf_bzkv | geocoded_column_latitude | geocoded_column_longitude | addr_address | addr_city | addr_state | addr_zip |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NORTH | THORNCLIFFE | Crime | Theft FROM Vehicle | 9 | 8474 | 2018-03-01T12:00:00.000 | 2018 | MAR | 2018-MAR-THORNCLIFFE-Theft FROM Vehicle-9 | 2 | 4 | 2 | 192 | 51.1031 | -114.069 | ||||
SOUTH | WOODBINE | Crime | Theft FROM Vehicle | 3 | 8866 | 2019-11-01T00:00:00.000 | 2019 | NOV | 2019-NOV-WOODBINE-Theft FROM Vehicle-3 | 1 | 6 | 5 | 43 | 50.9396 | -114.13 | ||||
SOUTH | WILLOW PARK | Crime | Theft FROM Vehicle | 4 | 5328 | 2019-11-01T00:00:00.000 | 2019 | NOV | 2019-NOV-WILLOW PARK-Theft FROM Vehicle-4 | 3 | 5 | 6 | 89 | 50.9566 | -114.056 | ||||
SOUTH | WILLOW PARK | Crime | Commercial Robbery | 1 | 5328 | 2019-11-01T00:00:00.000 | 2019 | NOV | 2019-NOV-WILLOW PARK-Commercial Robbery-1 | 3 | 5 | 6 | 89 | 50.9566 | -114.056 | ||||
WEST | LINCOLN PARK | Crime | Commercial Break & Enter | 5 | 2617 | 2019-11-01T00:00:00.000 | 2019 | NOV | 2019-NOV-LINCOLN PARK-Commercial Break & Enter-5 | 1 | 2 | 8 | 42 | 51.0101 | -114.13 |