Home > Mobile >  How to convert a list of nested dictionaries (includes tuples) as a dataframe
How to convert a list of nested dictionaries (includes tuples) as a dataframe

Time:08-22

I have a piece of code which generates a list of nested dictionaries like below:

[{'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
   {'Name': 'A', 'ID': 2, 'num': 68}),
  'final_value': 118},
 {'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
   {'Name': 'A', 'ID': 4, 'num': 67}),
  'final_value': 117},
 {'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
   {'Name': 'A', 'ID': 6, 'num': 67}),
  'final_value': 117}]

I want to convert the dictionary into a dataframe like below

enter image description here

How can I do it using Python?

I have tried the below piece of code

merge_values =   [{'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
       {'Name': 'A', 'ID': 2, 'num': 68}),
      'final_value': 118},
     {'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
       {'Name': 'A', 'ID': 4, 'num': 67}),
      'final_value': 117},
     {'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
       {'Name': 'A', 'ID': 6, 'num': 67}),
      'final_value': 117}]
test = pd.DataFrame()
i = 0
for match in merge_values:
    for d in match:
        final_cfr = d['final_value']
        comb = d['cb']
        i = i 1
        z = pd.DataFrame()
        for t in comb:
            dct = {k:[v] for k,v in t.items()}
            x = pd.DataFrame(dct)
            x['merge_id'] = i
            x['Final_Value'] = final_value
            test = pd.concat([test, x])  

The problem with this piece of code is it adds the rows one below another. I need the elements of the tuple next to each other.

CodePudding user response:

You will need to clean your data by creating a new dict with the structure that you want, like this:

import pandas as pd


dirty_data = [{'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
   {'Name': 'A', 'ID': 2, 'num': 68}),
  'final_value': 118},
 {'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
   {'Name': 'A', 'ID': 4, 'num': 67}),
  'final_value': 117},
 {'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
   {'Name': 'A', 'ID': 6, 'num': 67}),
  'final_value': 117}]

  def clean_data(dirty_data: dict) -> dict:
    names = []
    ids = []
    nums = []
    m_ids = []
    m_nums = []
    finals = []
    
    for cb in dirty_data:
        names.append(cb["cb"][0]["Name"])
        ids.append(cb["cb"][0]["ID"])
        nums.append(cb["cb"][0]["num"])
        m_ids.append(cb["cb"][1]["ID"])
        m_nums.append(cb["cb"][1]["num"])
        finals.append(cb["final_value"])
    
    return {"Name": names, "ID": ids, "num": nums, "M_ID": m_ids, "M_num": m_nums, "Final": finals}

df = pd.DataFrame(clean_data(dirty_data))
df

CodePudding user response:

You could try to read the data into a dataframe as is and then restructure it until you get the desired result, but in this case, it doesn't seem practical.

Instead, I'd flatten the input into a list of lists to pass to pd.DataFrame. Here is a relatively concise way to do that with your sample data:

from operator import itemgetter
import pandas as pd

data = [{'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
                {'Name': 'A', 'ID': 2, 'num': 68}),
         'final_value': 118},
        {'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
                {'Name': 'A', 'ID': 4, 'num': 67}),
         'final_value': 117},
        {'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
                {'Name': 'A', 'ID': 6, 'num': 67}),
         'final_value': 117}]

keys = ['Name', 'ID', 'num', 'M_Name', 'M_ID', 'M_num', 'final_value']

# generates ['A', 1, 50, 'A', 2, 68, 118] etc.
flattened = ([value for item in row['cb'] 
              for value in itemgetter(*keys[:3])(item)] 
               [row['final_value']] 
             for row in data)

df = pd.DataFrame(flattened)
df.columns = keys

# get rid of superfluous M_Name column
df.drop('M_Name', axis=1, inplace=True)

itemgetter(*keys[:3])(item) is the same as [item[k] for k in keys[:3]]. On flattening lists of lists with list (or generator) comprehensions, see How do I make a flat list out of a list of lists?.

Result:

  Name  ID  num  M_ID  M_num  final_value
0    A   1   50     2     68          118
1    A   1   50     4     67          117
2    A   1   50     6     67          117
  • Related