Converting a list of lists of dictionaries to a Pandas DataFrame

Time:10-08

What is the best way to work with a list of lists of dictionaries like this one:

       [[{'name': 'Export A Smooth'},
       {'filter': 'unfiltered'},
       {'number of cigarette': 25},
       {'nicotine content': 10.5},
       {'tar content': 15.0},
       {'menthol': False},
       {'king size': False},
       {'price': 18.99},
       {'units sold per week': 50},
       {'profits per week': 949.50}],

      [{'name': 'Export A Medium'},
       {'filter': 'white'},
       {'number of cigarette': 25},
       {'nicotine content': 10.0},
       {'tar content': 12.0},
       {'menthol': False},
       {'king size': False},
       {'price': 18.99},
       {'units sold per week': 39},
       {'profits per week': 740.61}],

      [{'name': 'Canadian Classics Select'},
       {'filter': 'brown'},
       {'number of cigarette': 25},
       {'nicotine content': 11.1},
       {'tar content': 11.0},
       {'menthol': True},
       {'king size': True},
       {'price': 19.09},
       {'units sold per week': 38},
       {'profits per week': 725.42}]]

and turn it into a structured table format:

name                      filter      number of cigarette
Export A Smooth           unfiltered  25
Export A Medium           white       25
Canadian Classics Select  brown       25

I've tried a few different methods. The column layout comes out right, but every row is filled with NaN values except for the single key that row's dictionary holds, so only the first row shows the name (Export A Smooth).

unit name filter profits per week
1 Export A Smooth NaN ... 900 NaN
2 NaN unfiltered ... NaN
3 NaN NaN ... NaN
4 NaN NaN ... NaN
5 NaN NaN ... NaN
.. ... ... ... ...
155 NaN NaN ... NaN
156 NaN NaN ... NaN
157 NaN NaN ... NaN
158 NaN NaN ... NaN
159 NaN NaN ... 447.72

I've tried pd.DataFrame(cig_list).stack().apply(pd.Series) and pd.concat([pd.DataFrame(ii) for ii in cigarettes]), as well as looping through the list and appending each dictionary before building the DataFrame:

    cig_list_items = []
    for items in cig_list:
        for item in items:
            cig_list_items.append(item)
    pd.DataFrame(cig_list_items)

They all return the same result, so I figure there must be some issue with the way the dictionaries are formatted. My suspicion is that the dictionaries need to be rearranged so that they read more like this:

[[{'name': 'Export A Smooth'},
  {'name': 'Export A Medium'},
  {'name': 'Pall Mall Bold'}],

 [{'filter': 'unfiltered'},
  {'filter': 'white'},
  {'filter': 'regular'}]]

CodePudding user response:

Since every entry is an individual dict, you can merge the dicts of each inner list with a dict comprehension nested inside a list comprehension (here l is your list of lists):

df = pd.DataFrame([{k: v for d in i for k, v in d.items()} for i in l])

print(df)

                       name      filter  number of cigarette  nicotine content  tar content  menthol  king size  price  units sold per week  profits per week
0           Export A Smooth  unfiltered                   25              10.5         15.0    False      False  18.99                   50            949.50
1           Export A Medium       white                   25              10.0         12.0    False      False  18.99                   39            740.61
2  Canadian Classics Select       brown                   25              11.1         11.0     True       True  19.09                   38            725.42
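
To see why the merge matters, here is a minimal sketch on a made-up two-product list (the name sample and its values are hypothetical, not the real data). pandas treats each dictionary in a flat list as one row, so single-key dictionaries each produce a row that is mostly NaN, while merged dictionaries give one complete row per product.

```python
import pandas as pd

# Hypothetical miniature of the structure in the question.
sample = [
    [{'name': 'A'}, {'filter': 'white'}],
    [{'name': 'B'}, {'filter': 'brown'}],
]

# Flattened: every single-key dict becomes its own row, mostly NaN.
flat_df = pd.DataFrame([d for sublist in sample for d in sublist])

# Merged: the dicts of each sublist are combined into one row.
merged_df = pd.DataFrame(
    [{k: v for d in sublist for k, v in d.items()} for sublist in sample]
)

print(flat_df.shape)    # (4, 2)
print(merged_df.shape)  # (2, 2)
```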

CodePudding user response:

Assuming your list of lists is stored in a variable named lst, try this:

flat_list = [item for sublist in lst for item in sublist]

df = pd.json_normalize(flat_list)

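First it flattens the list of lists into a single list in which each item is a dictionary, then it converts that list into a pandas DataFrame. Note that every dictionary in flat_list becomes its own row, so with single-key dictionaries like the ones in the question, the dictionaries of each inner list have to be merged into one dictionary first to get one row per cigarette.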
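
A minimal sketch of the json_normalize route with the merge step included, assuming the same lst name and using only two keys per product to keep it short. The records name is introduced here for illustration.

```python
import pandas as pd

# Shortened sample: two keys per product, values taken from the question.
lst = [
    [{'name': 'Export A Smooth'}, {'price': 18.99}],
    [{'name': 'Export A Medium'}, {'price': 18.99}],
]

# Merge the single-key dicts of each inner list into one record.
records = []
for sublist in lst:
    record = {}
    for d in sublist:
        record.update(d)
    records.append(record)

# One record per product gives one row per product.
df = pd.json_normalize(records)
print(df.columns.tolist())  # ['name', 'price']
print(len(df))              # 2
```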

CodePudding user response:

If you find comprehensions hard to read, here's what's going on inside:

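newlist = []
for i in data:                      # each inner list describes one product
    new_dict = {}
    for j in i:                     # each j is a single-key dictionary
        for key, item in j.items():
            new_dict[key] = item    # collect all keys into one dictionary
    newlist.append(new_dict)

df = pd.DataFrame(newlist)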
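
One thing the loop does silently is overwrite a key that appears twice within the same inner list. If that would be a data error in your case, a variant along these lines could flag it instead. This is only a sketch: merge_rows is a hypothetical helper name, not part of pandas, and its result can be passed to pd.DataFrame as before.

```python
def merge_rows(data):
    """Merge each inner list of single-key dicts into one dict per row."""
    rows = []
    for i, sublist in enumerate(data):
        row = {}
        for d in sublist:
            for key, value in d.items():
                # Refuse to overwrite a key already seen in this inner list.
                if key in row:
                    raise ValueError(f"duplicate key {key!r} in inner list {i}")
                row[key] = value
        rows.append(row)
    return rows


rows = merge_rows([[{'name': 'Export A Smooth'}, {'menthol': False}]])
print(rows)  # [{'name': 'Export A Smooth', 'menthol': False}]
```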