When I try to expose nested attributes in a json using json_normalise, I can't bring back the attributes of one of the nested objects. I'm trying to return the 'type' as a column but the following error is returned: list indices must be integers or slices, not str
The Json structure looks like this:
[{
"$type": "Tfl.Api.Presentation.Entities.AccidentStats.AccidentDetail, Tfl.Api.Presentation.Entities",
"id": 269639,
"lat": 51.5169,
"lon": -0.1428,
"location": "Regent Street junction with Little Portland Street",
"date": "2016-06-04T20:00:00Z",
"severity": "Slight",
"borough": "City of Westminster",
"casualties": [{
"$type": "Tfl.Api.Presentation.Entities.AccidentStats.Casualty, Tfl.Api.Presentation.Entities",
"age": 23,
"class": "Driver",
"severity": "Slight",
"mode": "PedalCycle",
"ageBand": "Adult"
}],
"vehicles": [{
"$type": "Tfl.Api.Presentation.Entities.AccidentStats.Vehicle, Tfl.Api.Presentation.Entities",
"type": "PedalCycle"
}, {
"$type": "Tfl.Api.Presentation.Entities.AccidentStats.Vehicle, Tfl.Api.Presentation.Entities",
"type": "Car"
}]
the code:
import json
import pandas as pd
import numpy as np
import datetime
# load data using Python JSON module
with open('/Users/samappleton/Documents/Python/Project/api_response_2023-01-31-13:22.json','r') as f:
data = json.loads(f.read())
# Normalizing data
try:
df = pd.json_normalize(data, record_path =['casualties'],
meta =[
'id',
'lat',
'lon',
'location',
'date',
'severity',
'borough',
['vehicles']['type']
],record_prefix ='casualties.')
#
print(df)
# df.to_csv("Documents/Python/Project/CSV_Output.csv")
except Exception as e:
print(e)
Any ideas on how I can extract the vehicles.type column?
Thank you
CodePudding user response:
it is not clear what is your precise goal. However, I try to show you a solution. If you want to flatten both casualties and vehicle maybe you should normalize the json once with casualties as record path, and another time with vehicles
Define Data
import json
import pandas as pd
string = '''{
"$type": "Tfl.Api.Presentation.Entities.AccidentStats.AccidentDetail, Tfl.Api.Presentation.Entities",
"id": 269639,
"lat": 51.5169,
"lon": -0.1428,
"location": "Regent Street junction with Little Portland Street",
"date": "2016-06-04T20:00:00Z",
"severity": "Slight",
"borough": "City of Westminster",
"casualties": [{
"$type": "Tfl.Api.Presentation.Entities.AccidentStats.Casualty, Tfl.Api.Presentation.Entities",
"age": 23,
"class": "Driver",
"severity": "Slight",
"mode": "PedalCycle",
"ageBand": "Adult"
}],
"vehicles": [{
"$type": "Tfl.Api.Presentation.Entities.AccidentStats.Vehicle, Tfl.Api.Presentation.Entities",
"type": "PedalCycle"
}, {
"$type": "Tfl.Api.Presentation.Entities.AccidentStats.Vehicle, Tfl.Api.Presentation.Entities",
"type": "Car"
}]}'''
data = json.loads(string)
Normalize Json
df = pd.json_normalize(data, record_path = ['vehicles'],
meta=['id', 'lat','lon', 'location', 'date', 'severity',
'borough', 'casualties'], record_prefix='vehicle.') # I do not include $type key
df2 = pd.json_normalize(data, record_path = ['casualties'], record_prefix = 'casualties.',
meta='id')
# merge two different nested listed of dictionaries flattened in dataframes
res = df.merge(df2)
print(res.columns)
print(res) # shape (2,15)
Result
Index(['vehicle.$type', 'vehicle.type', 'id', 'lat', 'lon', 'location',
'date','severity', 'borough', 'casualties.$type', 'casualties.age',
'casualties.class', 'casualties.severity', 'casualties.mode',
'casualties.ageBand'],
dtype='object')