List indicies must be integers or slices, not str-CodePudding

When I try to expose nested attributes in a json using json_normalise, I can't bring back the attributes of one of the nested objects. I'm trying to return the 'type' as a column but the following error is returned: list indices must be integers or slices, not str

The Json structure looks like this:

[{
    "$type": "Tfl.Api.Presentation.Entities.AccidentStats.AccidentDetail, Tfl.Api.Presentation.Entities",
    "id": 269639,
    "lat": 51.5169,
    "lon": -0.1428,
    "location": "Regent Street junction with Little Portland Street",
    "date": "2016-06-04T20:00:00Z",
    "severity": "Slight",
    "borough": "City of Westminster",
    "casualties": [{
        "$type": "Tfl.Api.Presentation.Entities.AccidentStats.Casualty, Tfl.Api.Presentation.Entities",
        "age": 23,
        "class": "Driver",
        "severity": "Slight",
        "mode": "PedalCycle",
        "ageBand": "Adult"
    }],
    "vehicles": [{
        "$type": "Tfl.Api.Presentation.Entities.AccidentStats.Vehicle, Tfl.Api.Presentation.Entities",
        "type": "PedalCycle"
    }, {
        "$type": "Tfl.Api.Presentation.Entities.AccidentStats.Vehicle, Tfl.Api.Presentation.Entities",
        "type": "Car"
    }]

the code:

import json
import pandas as pd
import numpy as np
import datetime


# load data using Python JSON module
with open('/Users/samappleton/Documents/Python/Project/api_response_2023-01-31-13:22.json','r') as f:
    data = json.loads(f.read())
 
# Normalizing data
try:

    df = pd.json_normalize(data, record_path =['casualties'],
    meta =[
        'id',
        'lat',
        'lon',
        'location',
        'date',
        'severity',
        'borough',
        ['vehicles']['type']

    ],record_prefix ='casualties.')

    #
     print(df)
    # df.to_csv("Documents/Python/Project/CSV_Output.csv") 

except Exception as e:
    print(e)

Any ideas on how I can extract the vehicles.type column?

Thank you

CodePudding user response：

it is not clear what is your precise goal. However, I try to show you a solution. If you want to flatten both casualties and vehicle maybe you should normalize the json once with casualties as record path, and another time with vehicles

Define Data

import json
import pandas as pd

string = '''{
        "$type": "Tfl.Api.Presentation.Entities.AccidentStats.AccidentDetail, Tfl.Api.Presentation.Entities",
        "id": 269639,
        "lat": 51.5169,
        "lon": -0.1428,
        "location": "Regent Street junction with Little Portland Street",
        "date": "2016-06-04T20:00:00Z",
        "severity": "Slight",
        "borough": "City of Westminster",
        "casualties": [{
            "$type": "Tfl.Api.Presentation.Entities.AccidentStats.Casualty, Tfl.Api.Presentation.Entities",
            "age": 23,
            "class": "Driver",
            "severity": "Slight",
            "mode": "PedalCycle",
            "ageBand": "Adult"
        }],
        "vehicles": [{
            "$type": "Tfl.Api.Presentation.Entities.AccidentStats.Vehicle, Tfl.Api.Presentation.Entities",
            "type": "PedalCycle"
        }, {
            "$type": "Tfl.Api.Presentation.Entities.AccidentStats.Vehicle, Tfl.Api.Presentation.Entities",
            "type": "Car"
        }]}'''


data = json.loads(string)

Normalize Json

df = pd.json_normalize(data, record_path = ['vehicles'],
   meta=['id', 'lat','lon', 'location', 'date', 'severity', 
   'borough', 'casualties'], record_prefix='vehicle.') # I do not include $type key

df2 = pd.json_normalize(data, record_path = ['casualties'], record_prefix = 'casualties.', 
   meta='id')

# merge two different nested listed of dictionaries flattened in dataframes
res = df.merge(df2)

print(res.columns) 
print(res) # shape (2,15)

Result

Index(['vehicle.$type', 'vehicle.type', 'id', 'lat', 'lon', 'location', 
       'date','severity', 'borough', 'casualties.$type', 'casualties.age',
       'casualties.class', 'casualties.severity', 'casualties.mode',
       'casualties.ageBand'],
      dtype='object')