Home > Software engineering >  How to select certain values from JSON to build Pandas dataframe?
How to select certain values from JSON to build Pandas dataframe?

Time:10-21

I use the requests module to hit an endpoint and get back a JSON response:

response = requests.get(endpoint, auth=(user_name, api_key)).json()

The response looks like this:

{'SalesInfoStatus': {'next_offset': 15,
  'sales': [{'saSalesID': '761S-1666031549-adhoc-0:0',
    'identifier': '761',
    'origin': 'New York',
    'destination': 'London',
    'filed_time': {'epoch': 1666033349,
     'time': '15:02',
     'date': '10/19/2022',
     'localtime': 1666018949},
     'actual_time': {'epoch': 1666033349,
     'time': '15:02',
     'date': '10/19/2022',
     'localtime': 1666018949},
    'inbound_saSalesID': '761S-1666029229-adhoc-0:0'},
   {'saSalesID': '762S-1666031549-adhoc-0:0',
    'identifier': '762',
    'origin': 'New York',
    'destination': 'London',
    'filed_time': {'epoch': 1666033349,
     'time': '15:02',
     'date': '10/17/2022',
     'localtime': 1666018949},
     'actual_time': {'epoch': 1666033349,
     'time': '15:02',
     'date': '10/17/2022',
     'localtime': 1666018949},
    'inbound_saSalesID': '762S-1666029229-adhoc-0:0'},

   {'saSalesID': '765S-1666031549-adhoc-0:0',
    'identifier': '765',
    'origin': 'Paris',
    'destination': 'Tokyo',
    'filed_time': {'epoch': 1666033349,
     'time': '15:02',
     'date': '10/15/2022',
     'localtime': 1666018949},
     'actual_time': {'epoch': 1666033349,
     'time': '15:02',
     'date': '10/15/2022',
     'localtime': 1666018949},
    'inbound_saSalesID': '765S-1666029229-adhoc-0:0'},

   {'saSalesID': '767S-1666031549-adhoc-0:0',
    'identifier': '767',
    'origin': 'Los Angeles',
    'destination': 'Sydney',
    'filed_time': {'epoch': 1666033349,
     'time': '15:02',
     'date': '10/13/2022',
     'localtime': 1666018949},
     'actual_time': {'epoch': 1666033349,
     'time': '15:02',
     'date': '10/13/2022',
     'localtime': 1666018949},
    'inbound_saSalesID': '767S-1666029229-adhoc-0:0'}

]}}

When I call type(response), I get dict.

I would like to build a Pandas dataframe that contains the date, origin, destination, and identifier fields from this JSON.

The resulting dataframe would look like this:

date         origin          destination     identifier
10/19/22     New York        London          761
10/17/22     New York        London          762
10/15/22     Paris           Tokyo           765
10/13/22     Los Angeles     Sydney          767

I'm using the following:

df = pd.json_normalize(response)

But, this returns a dataframe with a column containing what appears to be nested JSON:

                   SalesInfoStatus
    sales          [{'saSalesID': '761S-1666031549-adhoc-0:0',... 
    next_offset    15                                  

How would I split the date, origin, destination and identifier fields out into their own columns in this dataframe?

Thanks!

CodePudding user response:

You can use pd.json_normalize on the list value of sales key

df = pd.json_normalize(response['SalesInfoStatus']['sales'])[['filed_time.date', 'origin', 'destination', 'identifier']]
print(df)

  filed_time.date       origin destination identifier
0      10/19/2022     New York      London        761
1      10/17/2022     New York      London        762
2      10/15/2022        Paris       Tokyo        765
3      10/13/2022  Los Angeles      Sydney        767
  • Related