I have this JSON
. It's an API request output. (Actually a list of 4 JSON
stored in a variable named responselist
). The goal is to parse it and store it into a dataframe, so my idea was to loop into this each response
of responselist
.
The JSON
looks like this :
[{'pagination': {'limit': 100, 'offset': 0, 'count': 100, 'total': 286},
'data': [{'flight_date': '2022-12-25',
'flight_status': 'scheduled',
'departure': {'airport': 'Marseille Provence Airport',
'timezone': 'Europe/Paris',
'iata': 'MRS',
'icao': 'LFML',
'terminal': '1B',
'gate': None,
'delay': None,
'scheduled': '2022-12-25T16:00:00 00:00',
'estimated': '2022-12-25T16:00:00 00:00',
'actual': None,
'estimated_runway': None,
'actual_runway': None},
'arrival': {'airport': 'Ste Catherine',
'timezone': 'Europe/Paris',
'iata': 'CLY',
'icao': 'LFKC',
'terminal': None,
'gate': None,
'baggage': None,
'delay': None,
'scheduled': '2022-12-25T17:00:00 00:00',
'estimated': '2022-12-25T17:00:00 00:00',
'actual': None,
'estimated_runway': None,
'actual_runway': None},
'airline': {'name': 'Air Corsica', 'iata': 'XK', 'icao': 'CCM'},
'flight': {'number': '351',
'iata': 'XK351',
'icao': 'CCM351',
'codeshared': None},
'aircraft': None,
'live': None},
{'flight_date': '2022-12-25',
'flight_status': 'scheduled',
'departure': {'airport': 'Marseille Provence Airport',
'timezone': 'Europe/Paris',
'iata': 'MRS',
'icao': 'LFML',
'terminal': '1B',
'gate': None,
'delay': 42,
'scheduled': '2022-12-25T10:50:00 00:00',
'estimated': '2022-12-25T10:50:00 00:00',
'actual': '2022-12-25T11:31:00 00:00',
'estimated_runway': '2022-12-25T11:31:00 00:00',
'actual_runway': '2022-12-25T11:31:00 00:00'},
'arrival': {'airport': 'Campo Dell Oro',
'timezone': 'Europe/Paris',
'iata': 'AJA',
'icao': 'LFKJ',
'terminal': '2',
'gate': None,
'baggage': None,
'delay': 23,
'scheduled': '2022-12-25T11:55:00 00:00',
'estimated': '2022-12-25T11:55:00 00:00',
'actual': None,
'estimated_runway': None,
'actual_runway': None},
'airline': {'name': 'Air Corsica', 'iata': 'XK', 'icao': 'CCM'},
'flight': {'number': '151',
'iata': 'XK151',
'icao': 'CCM151',
'codeshared': None},
'aircraft': None,
'live': None},
[...]
}]}
The bit of code below works to store the departure
nodes into a one-column dataframe :
for response in responselist:
data.extend(
flight_data.get('departure').get('airport')
for flight_data in response.get('data')
cols=['departures']
result = pd.DataFrame(data, columns=cols)
result
The idea was to parse other nodes of the JSON
to complete the dataframe. I think this should store the timezone
nodes into a second column but it returns invalid syntax
:
for response in responselist:
data.extend(
flight_data.get('departure').get('airport')
for flight_data in response.get('data')
data.extend(
flight_data.get('departure').get('timezone')
for flight_data in response.get('data')
cols=['departures', 'timezone']
result = pd.DataFrame(data, columns=cols)
result
CodePudding user response:
It is throwing invalid syntax because your indentations are not the same size, and your for loops aren't closed with :
's. This fixes these issues but is still not exactly what you're looking for.
for response in responselist:
data.extend(flight_data.get('departure').get('airport')
for flight_data in response.get('data'):
data.extend(flight_data.get('departure').get('timezone')
for flight_data in response.get('data'):
cols=['departures', 'timezone']
As @tomerar mentioned, it would be more efficient to leverage json_normalize()
to do the work for you, instead of using for loops:
df = pd.json_normalize(responselist[0]['data'])