How do I convert a dictionary that has nested dictionaries within it into a dataframe in Python?

Time:12-10

I recently ran a sentiment analysis on 1,300 Tweets using Oracle's AI Language API in Python. I had the API iterate over the Tweets and stored its output in a list, where each element corresponded to a single Tweet ID. I then built a dictionary whose keys are the Tweet IDs and whose values are the API output for each Tweet. I now have a large dictionary with dictionaries nested inside dictionaries, and I'm not sure how to convert it to a pandas DataFrame.

Here are the first few entries of the dictionary I am working with.

 {1292750633104289792: {
   "aspects": []
 },
 1275918779831238656: {
   "aspects": []
 },
 1293251961031204865: {
   "aspects": [
     {
       "length": 8,
       "offset": 51,
       "scores": {
         "Negative": 0.18023298680782318,
         "Neutral": 0.0,
         "Positive": 0.8197670578956604
       },
       "sentiment": "Positive",
       "text": "building"
     }
   ]
 },
 1293312774563606531: {
   "aspects": []
 },
 1293375754751881217: {
   "aspects": [
     {
       "length": 4,
       "offset": 5,
       "scores": {
         "Negative": 0.9987309575080872,
         "Neutral": 0.0012690634466707706,
         "Positive": 0.0
       },
       "sentiment": "Negative",
       "text": "poll"
     }
   ]
 }}

Thanks so much in advance.

Answer:

You can flatten the structure with a nested list comprehension that emits one record per aspect, and then pass the resulting list to pd.DataFrame:

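import pandas as pd

# `data` is the {tweet_id: API output} dictionary from the question.
def field(obj, name):
    # Read `name` from a plain dict, or from an object that exposes it as an attribute
    return obj[name] if isinstance(obj, dict) else getattr(obj, name)

# One record per aspect; tweets with an empty "aspects" list produce no rows
r = [{'tweet_id': a,
      'length': field(i, 'length'),
      'offset': field(i, 'offset'),
      **{f'score_{j}': k for j, k in field(i, 'scores').items()},
      'sentiment': field(i, 'sentiment'),
      'text': field(i, 'text'),
      }
     for a, b in data.items() for i in field(b, 'aspects')]

df = pd.DataFrame(r)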

Output:

              tweet_id  length  offset  score_Negative  score_Neutral  score_Positive sentiment      text
0  1293251961031204865       8      51        0.180233       0.000000        0.819767  Positive  building
1  1293375754751881217       4       5        0.998731       0.001269        0.000000  Negative      poll
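An alternative that leans on pandas itself is pd.json_normalize, which expands a list of records (record_path) and copies parent fields onto each row (meta). This is a minimal sketch, assuming the values in data are plain dicts shaped like the sample in the question; if they are SDK response objects, they would need converting to dicts first. The sample below is a subset of the question's data, and the intermediate name records is just for illustration:

```python
import pandas as pd

# Subset of the question's data, assumed to be plain Python dicts
data = {
    1292750633104289792: {"aspects": []},
    1293251961031204865: {
        "aspects": [
            {
                "length": 8,
                "offset": 51,
                "scores": {
                    "Negative": 0.18023298680782318,
                    "Neutral": 0.0,
                    "Positive": 0.8197670578956604,
                },
                "sentiment": "Positive",
                "text": "building",
            }
        ]
    },
    1293375754751881217: {
        "aspects": [
            {
                "length": 4,
                "offset": 5,
                "scores": {
                    "Negative": 0.9987309575080872,
                    "Neutral": 0.0012690634466707706,
                    "Positive": 0.0,
                },
                "sentiment": "Negative",
                "text": "poll",
            }
        ]
    },
}

# json_normalize works on a list of records, so move each dict key into its record
records = [{"tweet_id": tweet_id, **result} for tweet_id, result in data.items()]

# One row per aspect; the nested "scores" dict becomes "scores.Negative", etc.
df = pd.json_normalize(records, record_path="aspects", meta=["tweet_id"])

# Meta columns are built with object dtype, so convert the IDs back to integers
df["tweet_id"] = df["tweet_id"].astype("int64")
print(df)
```

As with the comprehension, tweets whose aspects list is empty contribute no rows. Passing sep="_" to json_normalize names the score columns scores_Negative and so on instead of using dots.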