I recently ran a sentiment analysis using Oracle's AI Language API in Python. I had the API process 1,300 tweets and stored its output in a list, where each element corresponds to a single tweet ID. I then created a dictionary whose keys are the tweet IDs and whose values are the API output for each tweet. I now have a large dictionary with dictionaries nested within dictionaries, and I am not sure how to convert it to a pandas DataFrame.
Here are the first few entries of the dictionary I am working with.
{1292750633104289792: {
  "aspects": []
},
1275918779831238656: {
  "aspects": []
},
1293251961031204865: {
  "aspects": [
    {
      "length": 8,
      "offset": 51,
      "scores": {
        "Negative": 0.18023298680782318,
        "Neutral": 0.0,
        "Positive": 0.8197670578956604
      },
      "sentiment": "Positive",
      "text": "building"
    }
  ]
},
1293312774563606531: {
  "aspects": []
},
1293375754751881217: {
  "aspects": [
    {
      "length": 4,
      "offset": 5,
      "scores": {
        "Negative": 0.9987309575080872,
        "Neutral": 0.0012690634466707706,
        "Positive": 0.0
      },
      "sentiment": "Negative",
      "text": "poll"
    }
  ]
}}
Thanks so much in advance.
CodePudding user response:
You can flatten your structure using a nested comprehension, and then pass the result to pd.DataFrame:
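import pandas as pd

# data is the {tweet_id: result} dictionary from the question.
# One row is built per aspect, so tweets with no aspects produce no rows.
r = [{'tweet_id': a,
      'length': i['length'],
      'offset': i['offset'],
      # expand the nested scores into score_Negative, score_Neutral, score_Positive
      **{f'score_{j}': k for j, k in i['scores'].items()},
      'sentiment': i['sentiment'],
      'text': i['text'],
      }
     for a, b in data.items()
     # b may be a plain dict or an object with an .aspects attribute
     for i in (b['aspects'] if isinstance(b, dict) else b.aspects)]
df = pd.DataFrame(r)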
Output:
              tweet_id  length  offset  score_Negative  score_Neutral  score_Positive sentiment      text
0  1293251961031204865       8      51        0.180233       0.000000        0.819767  Positive  building
1  1293375754751881217       4       5        0.998731       0.001269        0.000000  Negative      poll
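Note that the comprehension above emits one row per aspect, so tweets whose "aspects" list is empty do not appear in the output. If you also want those tweets represented, one possible approach is sketched below. It assumes every value in the dictionary is a plain dict with an "aspects" key, as in the sample from the question; the function name flatten_aspects and the keep_empty flag are hypothetical names chosen for illustration, not part of pandas or the Oracle SDK.

```python
def flatten_aspects(data, keep_empty=False):
    """Turn {tweet_id: {"aspects": [...]}} into a list of flat row dicts.

    Assumes each value is a plain dict with an "aspects" list (an assumption
    based on the sample data, not on the API's documented response type).
    """
    rows = []
    for tweet_id, result in data.items():
        aspects = result["aspects"]
        if not aspects and keep_empty:
            # Placeholder row so a tweet with no aspects is not dropped
            rows.append({"tweet_id": tweet_id})
        for aspect in aspects:
            row = {"tweet_id": tweet_id}
            for key, value in aspect.items():
                if key == "scores":
                    # Expand the nested scores dict into score_* columns
                    for label, score in value.items():
                        row[f"score_{label}"] = score
                else:
                    row[key] = value
            rows.append(row)
    return rows


# Two entries copied from the question's sample data
sample = {
    1292750633104289792: {"aspects": []},
    1293251961031204865: {
        "aspects": [
            {
                "length": 8,
                "offset": 51,
                "scores": {
                    "Negative": 0.18023298680782318,
                    "Neutral": 0.0,
                    "Positive": 0.8197670578956604,
                },
                "sentiment": "Positive",
                "text": "building",
            }
        ]
    },
}

rows = flatten_aspects(sample, keep_empty=True)
```

Passing the result to pd.DataFrame(rows) should then give one row per aspect plus one row for each tweet without aspects, with the missing columns filled with NaN.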