Converting deeply nested JSON into a pandas DataFrame

Time:10-19

I am pulling data from an API and getting this JSON back:

{'business_discovery': {'media': {'data': [{'media_url': 'a link',
     'timestamp': '2022-01-01T01:00:00+0000',
     'caption': 'Caption',
     'media_type': 'type',
     'media_product_type': 'product_type',
     'comments_count': 1,
     'like_count': 1,
     'id': 'ID'},
  {'media_url': 'link',

# ... and so on

# NOTE: I scrubbed the numbers with dummy data

I know I can run the script below to pull every field out of the list under data:

# "a" is the json without business discovery or media, which would be this:

a = {'data': [{'media_url': 'a link',
     'timestamp': '2022-01-01T01:00:00+0000',
     'caption': 'Caption',
     'media_type': 'type',
     'media_product_type': 'product_type',
     'comments_count': 1,
     'like_count': 1,
     'id': 'ID'},
  {'media_url': 'link',

# ... and so on

import pandas as pd

# One list per field; every record in a['data'] appends one value to each list
media_url, timestamp, caption, media_type, media_product_type, comment_count, like_count, id_code = [], [], [], [], [], [], [], []
for result in a['data']:
    media_url.append(result['media_url'])
    timestamp.append(result['timestamp'])
    caption.append(result['caption'])
    media_type.append(result['media_type'])
    media_product_type.append(result['media_product_type'])
    comment_count.append(result['comments_count'])
    like_count.append(result['like_count'])
    id_code.append(result['id'])  # All info exists, even when a value is 0

# Each list becomes a row, so transpose to get one column per field
df = pd.DataFrame([media_url, timestamp, caption, media_type, media_product_type, comment_count, like_count, id_code]).T

When I run the above code on the full response from the API, I get an error saying that 'data' is not found.

This works fine for now, but I am trying to find a way to "hop" over both business_discovery and media and go straight to data, so I can run this on the response directly instead of copying and pasting it with those two levels stripped out.

CodePudding user response:

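Use pd.json_normalize: index into the two outer keys first, then pass "data" as the record_path. Here data is the full API response: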

df = pd.json_normalize(data=data["business_discovery"]["media"], record_path="data")
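To make that one-liner concrete, here is a minimal sketch using a two-record stand-in for the API response. The `response` dict and its dummy values are assumptions modeled on the sample in the question, not real API output, and `df_direct` is just an illustrative name for an alternative that should also work when the records are flat.

```python
import pandas as pd

# Hypothetical stand-in for the API response, shaped like the sample in the question.
response = {
    "business_discovery": {
        "media": {
            "data": [
                {
                    "media_url": "a link",
                    "timestamp": "2022-01-01T01:00:00+0000",
                    "caption": "Caption",
                    "media_type": "type",
                    "media_product_type": "product_type",
                    "comments_count": 1,
                    "like_count": 1,
                    "id": "ID",
                },
                {
                    "media_url": "link",
                    "timestamp": "2022-01-02T01:00:00+0000",
                    "caption": "Another caption",
                    "media_type": "type",
                    "media_product_type": "product_type",
                    "comments_count": 2,
                    "like_count": 3,
                    "id": "ID2",
                },
            ]
        }
    }
}

# Step past the two wrapper keys, then expand the list stored under "data".
df = pd.json_normalize(
    data=response["business_discovery"]["media"],
    record_path="data",
)

# Alternative for flat records: index all the way down and build the frame directly.
df_direct = pd.DataFrame(response["business_discovery"]["media"]["data"])

print(df.columns.tolist())
print(df.shape)
```

Either way the column names come from the record keys, so the eight parallel lists and the transpose are no longer needed. json_normalize is the safer pick if the records ever contain nested dicts, since it flattens those into dotted column names.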