Converting a Nested List of Dictionaries to a DataFrame in Python

Time:06-12

I am working on an NLP model that parses questions and their answers from news articles. To stay within tight RAM constraints, I've had to break each article down into sentences, parse the Q&A for each sentence individually, and append the results. Depending on how many articles, paragraphs, and sentences there are, the processed output becomes an increasingly complex nested mess.


[[{'answer': 'Meta Platforms ',
   'question': 'What is the name of the company that became two of the most talked-about social media companies in recent months?'},
  {'answer': ' Twitter ',
   'question': 'What was the name of the two most talked-about social media companies in recent months?'},
  {'answer': 'Elon Musk ', 'question': 'Who took a 9.2% stake in Twitter?'}],
 [{'answer': '$54.20 ',
   'question': 'How much did Musk bid to acquire Twitter?'},
  {'answer': ' $44 billion ',
   'question': 'How much did Musk bid to acquire Twitter?'}],
 [{'answer': 'a "poison pill" defense ',
   'question': "What did Twitter initially adopt against Musk's offer?"},
  {'answer': 'failing to provide adequate information about its spam and bot accounts ',
   'question': 'Why did Musk try to back out of the deal?'}],
 [{'answer': 'Twitter ',
   'question': 'Which company has a stock price below Musk\'s "best and final" offer?'},
  {'answer': ' 26% ',
   'question': 'What is Twitter\'s stock price below Musk\'s "best and final" offer?'},
  {'answer': 'Getty Images ',
   'question': "What is the name of the image source that Twitter's stock price remains 26% below Musk's offer?"},
  {'answer': 'February ', 'question': "When did Meta's downfall start?"}],
 [{'answer': 'ByteDance ', 'question': 'Who was TikTok?'},
  {'answer': ' ⁇!--//--> ⁇! ',
   'question': "What was the name of the company's first-quarter report in April?"},
  {'answer': ' ⁇!-- googletag.cmd.push ',
   'question': "What was the name of the company's first-quarter report?"},
  {'answer': 'Sheryl Sandberg ',
   'question': "Who was Meta's chief operating officer?"}],
 [{'answer': 'deteriorating macro environment ',
   'question': 'What caused Snap to reduce its second-quarter guidance in late May?'},
  {'answer': ' deteriorating macro environment ',
   'question': 'What caused Snap to reduce its second-quarter guidance in late May?'},
  {'answer': 'Twitter and Meta ',
   'question': 'Which two companies have been terrible investments over the past 12 months?'}],
 [{'answer': 'Twitter ',
   'question': "What company's stock has tumbled more than 30% during that period?"},
  {'answer': ' Meta ', 'question': "Who's stock has plummeted over 40%?"},
  {'answer': ' 30% ',
   'question': "How much has Twitter's stock tumbled during that period?"}],

This is a partial example. When simply doing:

pd.DataFrame(data)

This results in a DataFrame of 13578 rows × 32 columns, so there is a lot of nesting, and its depth varies with the articles provided. I tried adapting flatten-dict and deep flatten to get a more familiar shape, but neither got me any closer.

What I need is to turn the output into two columns, question and answer. I've tried specifying the columns when flattening, but it always results in errors. Any tips on how to do this for an arbitrary, unknown depth?

CodePudding user response:

Given your example data:

import pandas as pd
df = pd.DataFrame(sub for el in your_list_of_lists for sub in el)

Which'll give you:

|    | answer                                                                  | question                                                                                                          |
|---:|:------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------|
|  0 | Meta Platforms                                                          | What is the name of the company that became two of the most talked-about social media companies in recent months? |
|  1 | Twitter                                                                 | What was the name of the two most talked-about social media companies in recent months?                           |
|  2 | Elon Musk                                                               | Who took a 9.2% stake in Twitter?                                                                                 |
|  3 | $54.20                                                                  | How much did Musk bid to acquire Twitter?                                                                         |
|  4 | $44 billion                                                             | How much did Musk bid to acquire Twitter?                                                                         |
|  5 | a "poison pill" defense                                                 | What did Twitter initially adopt against Musk's offer?                                                            |
|  6 | failing to provide adequate information about its spam and bot accounts | Why did Musk try to back out of the deal?                                                                         |
|  7 | Twitter                                                                 | Which company has a stock price below Musk's "best and final" offer?                                              |
|  8 | 26%                                                                     | What is Twitter's stock price below Musk's "best and final" offer?                                                |
|  9 | Getty Images                                                            | What is the name of the image source that Twitter's stock price remains 26% below Musk's offer?                   |
| 10 | February                                                                | When did Meta's downfall start?                                                                                   |
| 11 | ByteDance                                                               | Who was TikTok?                                                                                                   |
| 12 | ⁇!--//--> ⁇!                                                            | What was the name of the company's first-quarter report in April?                                                 |
| 13 | ⁇!-- googletag.cmd.push                                                 | What was the name of the company's first-quarter report?                                                          |
| 14 | Sheryl Sandberg                                                         | Who was Meta's chief operating officer?                                                                           |
| 15 | deteriorating macro environment                                         | What caused Snap to reduce its second-quarter guidance in late May?                                               |
| 16 | deteriorating macro environment                                         | What caused Snap to reduce its second-quarter guidance in late May?                                               |
| 17 | Twitter and Meta                                                        | Which two companies have been terrible investments over the past 12 months?                                       |
| 18 | Twitter                                                                 | What company's stock has tumbled more than 30% during that period?                                                |
| 19 | Meta                                                                    | Who's stock has plummeted over 40%?                                                                               |
| 20 | 30%                                                                     | How much has Twitter's stock tumbled during that period?                                                          |
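
The one-liner above assumes exactly two levels of lists. If the depth really does vary between articles, a recursive generator might be a safer variant of the same idea. This is only a minimal sketch: `iter_qa` is a hypothetical helper name, the sample data is a small unevenly nested subset of the question's example, and it assumes every leaf is a dict with 'question' and 'answer' keys.

```python
import pandas as pd

def iter_qa(node):
    """Yield every dict found at any depth inside nested lists/tuples."""
    if isinstance(node, dict):
        yield node
    elif isinstance(node, (list, tuple)):
        for child in node:
            yield from iter_qa(child)
    # Anything else (None, stray strings, ...) is skipped; this is an assumption.

# Hypothetical sample: same records as in the question, nested unevenly.
nested = [
    [{'answer': 'Elon Musk ', 'question': 'Who took a 9.2% stake in Twitter?'}],
    [[{'answer': '$54.20 ', 'question': 'How much did Musk bid to acquire Twitter?'}],
     [[{'answer': 'February ', 'question': "When did Meta's downfall start?"}]]],
]

# columns= fixes the column order to question, answer.
df = pd.DataFrame(list(iter_qa(nested)), columns=['question', 'answer'])
print(df)
```

Because the generator only descends into lists and tuples, it should not matter whether a given article produced one level of nesting or five.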

CodePudding user response:

You can build a dataframe from each sublist of your main list and concatenate them all (input_arr being your list of lists):

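import pandas as pd

# One DataFrame per sublist, stacked into a single frame with a fresh 0..n-1 index
df = pd.concat([pd.DataFrame(x) for x in input_arr]).reset_index(drop=True)
df = df[['question', 'answer']]
print(df)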

Output:

                                             question                                             answer
0   What is the name of the company that became tw...                                    Meta Platforms 
1   What was the name of the two most talked-about...                                           Twitter 
2                   Who took a 9.2% stake in Twitter?                                         Elon Musk 
3           How much did Musk bid to acquire Twitter?                                            $54.20 
4           How much did Musk bid to acquire Twitter?                                       $44 billion 
5   What did Twitter initially adopt against Musk'...                           a "poison pill" defense 
6           Why did Musk try to back out of the deal?  failing to provide adequate information about ...
7   Which company has a stock price below Musk's "...                                           Twitter 
8   What is Twitter's stock price below Musk's "be...                                               26% 
9   What is the name of the image source that Twit...                                      Getty Images 
10                    When did Meta's downfall start?                                          February 
11                                    Who was TikTok?                                         ByteDance 
12  What was the name of the company's first-quart...                                      ⁇!--//--> ⁇! 
13  What was the name of the company's first-quart...                           ⁇!-- googletag.cmd.push 
14            Who was Meta's chief operating officer?                                   Sheryl Sandberg 
15  What caused Snap to reduce its second-quarter ...                   deteriorating macro environment 
16  What caused Snap to reduce its second-quarter ...                   deteriorating macro environment 
17  Which two companies have been terrible investm...                                  Twitter and Meta 
18  What company's stock has tumbled more than 30%...                                           Twitter 
19                Who's stock has plummeted over 40%?                                              Meta 
20  How much has Twitter's stock tumbled during th...                                               30% 
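
The answers in the example data carry stray leading and trailing spaces, and some rows differ only by that whitespace (rows 15 and 16 above). A possible follow-up, sketched here under the assumption that such rows should count as duplicates, is to flatten with `itertools.chain.from_iterable` (an alternative to `pd.concat` for two-level lists), strip the strings, and drop repeats. The sample below is a small subset of the question's data.

```python
from itertools import chain

import pandas as pd

# Subset of the question's data; the first two rows differ only by whitespace.
input_arr = [
    [{'answer': 'deteriorating macro environment ',
      'question': 'What caused Snap to reduce its second-quarter guidance in late May?'},
     {'answer': ' deteriorating macro environment ',
      'question': 'What caused Snap to reduce its second-quarter guidance in late May?'}],
    [{'answer': ' Meta ', 'question': "Who's stock has plummeted over 40%?"}],
]

# Flatten exactly one level of nesting, then build the two-column frame.
df = pd.DataFrame(list(chain.from_iterable(input_arr)),
                  columns=['question', 'answer'])

# Strip surrounding whitespace so near-identical rows compare equal.
df['answer'] = df['answer'].str.strip()
df['question'] = df['question'].str.strip()

# Treating identical question/answer pairs as duplicates is an assumption.
df = df.drop_duplicates().reset_index(drop=True)
print(df)
```

Whether to drop duplicates at all depends on the downstream use; if repeated pairs are meaningful, the `drop_duplicates` line can simply be left out.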