Home > other >  Parse multiple JSON stored in a list and build a dataframa
Parse multiple JSON stored in a list and build a dataframa

Time:12-26

I have this JSON. It's an API request output. (Actually a list of 4 JSON stored in a variable named responselist). The goal is to parse it and store it into a dataframe, so my idea was to loop into this each response of responselist.

The JSON looks like this :

[{'pagination': {'limit': 100, 'offset': 0, 'count': 100, 'total': 286},
  'data': [{'flight_date': '2022-12-25',
    'flight_status': 'scheduled',
    'departure': {'airport': 'Marseille Provence Airport',
     'timezone': 'Europe/Paris',
     'iata': 'MRS',
     'icao': 'LFML',
     'terminal': '1B',
     'gate': None,
     'delay': None,
     'scheduled': '2022-12-25T16:00:00 00:00',
     'estimated': '2022-12-25T16:00:00 00:00',
     'actual': None,
     'estimated_runway': None,
     'actual_runway': None},
    'arrival': {'airport': 'Ste Catherine',
     'timezone': 'Europe/Paris',
     'iata': 'CLY',
     'icao': 'LFKC',
     'terminal': None,
     'gate': None,
     'baggage': None,
     'delay': None,
     'scheduled': '2022-12-25T17:00:00 00:00',
     'estimated': '2022-12-25T17:00:00 00:00',
     'actual': None,
     'estimated_runway': None,
     'actual_runway': None},
    'airline': {'name': 'Air Corsica', 'iata': 'XK', 'icao': 'CCM'},
    'flight': {'number': '351',
     'iata': 'XK351',
     'icao': 'CCM351',
     'codeshared': None},
    'aircraft': None,
    'live': None},
   {'flight_date': '2022-12-25',
    'flight_status': 'scheduled',
    'departure': {'airport': 'Marseille Provence Airport',
     'timezone': 'Europe/Paris',
     'iata': 'MRS',
     'icao': 'LFML',
     'terminal': '1B',
     'gate': None,
     'delay': 42,
     'scheduled': '2022-12-25T10:50:00 00:00',
     'estimated': '2022-12-25T10:50:00 00:00',
     'actual': '2022-12-25T11:31:00 00:00',
     'estimated_runway': '2022-12-25T11:31:00 00:00',
     'actual_runway': '2022-12-25T11:31:00 00:00'},
    'arrival': {'airport': 'Campo Dell Oro',
     'timezone': 'Europe/Paris',
     'iata': 'AJA',
     'icao': 'LFKJ',
     'terminal': '2',
     'gate': None,
     'baggage': None,
     'delay': 23,
     'scheduled': '2022-12-25T11:55:00 00:00',
     'estimated': '2022-12-25T11:55:00 00:00',
     'actual': None,
     'estimated_runway': None,
     'actual_runway': None},
    'airline': {'name': 'Air Corsica', 'iata': 'XK', 'icao': 'CCM'},
    'flight': {'number': '151',
     'iata': 'XK151',
     'icao': 'CCM151',
     'codeshared': None},
    'aircraft': None,
    'live': None},
[...]
}]}

The bit of code below works to store the departure nodes into a one-column dataframe :

for response in responselist:  
  data.extend(
      flight_data.get('departure').get('airport')
      for flight_data in response.get('data')

  cols=['departures']

result = pd.DataFrame(data, columns=cols)
result

The idea was to parse other nodes of the JSON to complete the dataframe. I think this should store the timezone nodes into a second column but it returns invalid syntax:

for response in responselist:  
  data.extend(
      flight_data.get('departure').get('airport')
      for flight_data in response.get('data')
  data.extend(
     flight_data.get('departure').get('timezone')
     for flight_data in response.get('data')
  
  cols=['departures', 'timezone']

result = pd.DataFrame(data, columns=cols)
result

CodePudding user response:

It is throwing invalid syntax because your indentations are not the same size, and your for loops aren't closed with :'s. This fixes these issues but is still not exactly what you're looking for.

for response in responselist:  
  data.extend(flight_data.get('departure').get('airport')

  for flight_data in response.get('data'):
    data.extend(flight_data.get('departure').get('timezone')

  for flight_data in response.get('data'):
    cols=['departures', 'timezone']

As @tomerar mentioned, it would be more efficient to leverage json_normalize() to do the work for you, instead of using for loops:

df = pd.json_normalize(responselist[0]['data'])
  • Related