I am getting data from an API, and the resulting JSON looks like this:
{'business_discovery': {'media': {'data': [{'media_url': 'a link',
'timestamp': '2022-01-01T01:00:00+0000',
'caption': 'Caption',
'media_type': 'type',
'media_product_type': 'product_type',
'comments_count': 1,
'like_count': 1,
'id': 'ID'},
{'media_url': 'link',
# ... and so on
# NOTE: I scrubbed the numbers with dummy data
I know how to get the data out:
I can run this script to collect everything inside the "data" key
# "a" is the json without business discovery or media, which would be this:
a = {'data': [{'media_url': 'a link',
'timestamp': '2022-01-01T01:00:00+0000',
'caption': 'Caption',
'media_type': 'type',
'media_product_type': 'product_type',
'comments_count': 1,
'like_count': 1,
'id': 'ID'},
{'media_url': 'link',
# ... and so on
import pandas as pd

# One list per field; each post contributes one entry to every list
media_url, timestamp, caption, media_type, media_product_type, comment_count, like_count, id_code = [], [], [], [], [], [], [], []
for result in a['data']:
    media_url.append(result['media_url'])  # Append each field from the JSON to its list
    timestamp.append(result['timestamp'])
    caption.append(result['caption'])
    media_type.append(result['media_type'])
    media_product_type.append(result['media_product_type'])
    comment_count.append(result['comments_count'])
    like_count.append(result['like_count'])
    id_code.append(result['id'])  # All fields exist, even when a value is 0
# Each list becomes a row, so transpose to get one row per post
df = pd.DataFrame([media_url, timestamp, caption, media_type, media_product_type, comment_count, like_count, id_code]).T
When I run the above code on the raw response from the API, I get an error saying that 'data'
is not found, because 'data' is nested inside 'business_discovery' and 'media'.
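To illustrate why the lookup fails, here is a minimal sketch. It assumes the API response is already a plain Python dict shaped like the sample above; the name `response` and the shortened record are hypothetical stand-ins, not the real payload.

```python
# Hypothetical stand-in for the API response, using dummy values.
response = {
    "business_discovery": {
        "media": {
            "data": [
                {"media_url": "a link", "like_count": 1, "id": "ID"},
            ]
        }
    }
}

# The only top-level key is "business_discovery", so response["data"] fails.
top_level_keys = list(response.keys())

try:
    response["data"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Walking down the nesting one key at a time reaches the list of posts.
posts = response["business_discovery"]["media"]["data"]
```

With `posts` in hand, the loop above would work unchanged if it iterated over `posts` instead of `a['data']`.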
Trimming the JSON by hand works fine for now, but I am trying to figure out a way to "hop" over both business_discovery and media to get straight to data,
so I can run this more efficiently, rather than copying and pasting the part of the JSON that comes after business_discovery and media.
CodePudding user response:
Use pd.json_normalize and point record_path at the "data" key:
df = pd.json_normalize(data=data["business_discovery"]["media"], record_path="data")
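For a self-contained version of the line above, here is a sketch that assumes `data` holds the full API response as a dict. The two records below are dummy values made up for illustration, following the field names in the question.

```python
import pandas as pd

# Hypothetical response with two dummy posts, shaped like the question's JSON.
data = {
    "business_discovery": {
        "media": {
            "data": [
                {"media_url": "a link", "timestamp": "2022-01-01T01:00:00+0000",
                 "caption": "Caption", "media_type": "type",
                 "media_product_type": "product_type",
                 "comments_count": 1, "like_count": 1, "id": "ID"},
                {"media_url": "link", "timestamp": "2022-01-02T01:00:00+0000",
                 "caption": "Caption 2", "media_type": "type",
                 "media_product_type": "product_type",
                 "comments_count": 0, "like_count": 2, "id": "ID2"},
            ]
        }
    }
}

# Step into the nested dict, then let record_path pick out the list of posts.
df = pd.json_normalize(data=data["business_discovery"]["media"], record_path="data")

# Since each record is flat, building the frame from the list directly
# should give the same rows and columns.
df_direct = pd.DataFrame(data["business_discovery"]["media"]["data"])
```

Either way, each post becomes one row and each key becomes a named column, so the per-field lists and the transpose are no longer needed. json_normalize is mainly worth it when the records themselves contain nested dicts that need flattening.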