What is the best way to work with a list of lists of dictionaries like the one below:
[[{'name': 'Export A Smooth'},
{'filter': 'unfiltered'},
{'number of cigarette': 25},
{'nicotine content': 10.5},
{'tar content': 15.0},
{'menthol': False},
{'king size': False},
{'price': 18.99},
{'units sold per week': 50},
{'profits per week': 949.50}],
[{'name': 'Export A Medium'},
{'filter': 'white'},
{'number of cigarette': 25},
{'nicotine content': 10.0},
{'tar content': 12.0},
{'menthol': False},
{'king size': False},
{'price': 18.99},
{'units sold per week': 39},
{'profits per week': 740.61}],
[{'name': 'Canadian Classics Select'},
{'filter': 'brown'},
{'number of cigarette': 25},
{'nicotine content': 11.1},
{'tar content': 11.0},
{'menthol': True},
{'king size': True},
{'price': 19.09},
{'units sold per week': 38},
{'profits per week': 725.42}]]
and turn it into a structured table format:
name | Filter | Number of Cigarettes |
---|---|---|
Export A Smooth | unfiltered | 25 |
Export A Medium | white | 25 |
Canadian Classics Select | brown | 25 |
I've tried a few different methods. The table layout comes out right, but a lot of NaN values show up for every cigarette except the first one (Export A Smooth).
unit | name | filter | profits per week | |
---|---|---|---|---|
1 | Export A Smooth | NaN | ... 900 | NaN |
2 | NaN | unfiltered | ... | NaN |
3 | NaN | NaN | ... | NaN |
4 | NaN | NaN | ... | NaN |
5 | NaN | NaN | ... | NaN |
.. ... | ... | ... | ... | |
155 | NaN | NaN | ... | NaN |
156 | NaN | NaN | ... | NaN |
157 | NaN | NaN | ... | NaN |
158 | NaN | NaN | ... | NaN |
159 | NaN | NaN | ... | 447.72 |
I've tried pd.DataFrame(cig_list).stack().apply(pd.Series)
and pd.concat([pd.DataFrame(ii) for ii in cigarettes])
as well as looping through the cigs and trying to pass them into the DataFrame in that way.
cig_list_items = []
for items in cig_list:
    for item in items:
        cig_list_items.append(item)
pd.DataFrame(cig_list_items)
They all return the same result so I figure there must be some issue with the way the dictionaries are formatted? My suspicion is that the dictionaries need to be rearranged so that they read more like this:
[[{'name': 'Export A Smooth'},
{'name': 'Export A Medium'},
{'name': 'Pall Mall Bold'}],
[{'filter': 'unfiltered'},
{'filter': 'white'},
{'filter': 'regular'}]]
CodePudding user response:
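Since every entry is an individual dict, you can merge the dicts of each inner list with a dict comprehension nested inside a list comprehension (here l is your list of lists):
import pandas as pd
df = pd.DataFrame([{k: v for d in i for k, v in d.items()} for i in l])
print(df)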
name filter number of cigarette nicotine content tar content menthol king size price units sold per week profits per week
0 Export A Smooth unfiltered 25 10.5 15.0 False False 18.99 50 949.50
1 Export A Medium white 25 10.0 12.0 False False 18.99 39 740.61
2 Canadian Classics Select brown 25 11.1 11.0 True True 19.09 38 725.42
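To get only the three columns shown in the question's target table, the merged frame can be sliced afterwards. This is a minimal sketch on a shortened copy of the data; the names `cig_list`, `records`, `wanted` and `table` are illustrative assumptions, not part of the answer above.

```python
import pandas as pd

# Shortened sample in the same shape as the question's data (assumed for illustration).
cig_list = [
    [{'name': 'Export A Smooth'}, {'filter': 'unfiltered'},
     {'number of cigarette': 25}, {'price': 18.99}],
    [{'name': 'Export A Medium'}, {'filter': 'white'},
     {'number of cigarette': 25}, {'price': 18.99}],
]

# Merge the single-key dicts of each inner list into one record per product.
records = [{k: v for d in product for k, v in d.items()} for product in cig_list]
df = pd.DataFrame(records)

# Keep only the columns wanted in the final table.
wanted = ['name', 'filter', 'number of cigarette']
table = df[wanted]
print(table)
```

Selecting with a list of column names also fixes the column order, so the table follows `wanted` rather than the order of the dictionaries.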
CodePudding user response:
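Assume your list of lists is in the variable lst. Merge the dictionaries of each inner list into a single record, then pass the records to json_normalize:
records = [{k: v for d in sublist for k, v in d.items()} for sublist in lst]
df = pd.json_normalize(records)
Flattening alone, i.e. [item for sublist in lst for item in sublist], is not enough: it keeps every single-key dictionary as its own row, which gives the NaN-filled table from the question. Merging first yields one row per product.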
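The row counts are easy to check on a small sample. This is a minimal sketch, assuming a shortened copy of the data in `lst`, that builds one frame from the flattened list and one from merged records; the names `flat_df` and `merged_df` are illustrative.

```python
import pandas as pd

# Shortened sample in the same shape as the question's data (assumed for illustration).
lst = [
    [{'name': 'Export A Smooth'}, {'filter': 'unfiltered'}, {'price': 18.99}],
    [{'name': 'Export A Medium'}, {'filter': 'white'}, {'price': 18.99}],
]

# Flattening only: every single-key dict becomes its own row.
flat_list = [item for sublist in lst for item in sublist]
flat_df = pd.json_normalize(flat_list)

# Merging first: one dict, and so one row, per inner list.
records = [{k: v for d in sublist for k, v in d.items()} for sublist in lst]
merged_df = pd.json_normalize(records)

# Print shape and total NaN count for each frame.
print(flat_df.shape, int(flat_df.isna().sum().sum()))
print(merged_df.shape, int(merged_df.isna().sum().sum()))
```

With this sample, the flattened frame has one row per dictionary and a NaN in every column that dictionary lacks, while the merged frame has one row per inner list and no NaN values.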
CodePudding user response:
If you find comprehensions hard to read, here's what's going on inside:
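import pandas as pd

newlist = []
for i in data:                        # each inner list describes one product
    new_dict = {}
    for j in i:                       # each j is a single-key dictionary
        for key, item in j.items():
            new_dict[key] = item      # collect all keys into one record
    newlist.append(new_dict)
df = pd.DataFrame(newlist)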
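If an inner list might repeat a key, the loop above silently keeps the last value. A hedged variation is sketched below: `merge_dicts` is a hypothetical helper (not part of pandas) that raises on a repeated key instead, and `data` is a shortened, assumed sample.

```python
import pandas as pd

def merge_dicts(dicts):
    """Merge single-key dicts into one record; hypothetical helper for illustration."""
    merged = {}
    for d in dicts:
        # Refuse to overwrite a key that an earlier dict already set.
        duplicate = merged.keys() & d.keys()
        if duplicate:
            raise ValueError(f"duplicate keys: {sorted(duplicate)}")
        merged.update(d)
    return merged

# Shortened sample in the same shape as the question's data (assumed for illustration).
data = [
    [{'name': 'Export A Smooth'}, {'filter': 'unfiltered'}],
    [{'name': 'Export A Medium'}, {'filter': 'white'}],
]

df = pd.DataFrame([merge_dicts(product) for product in data])
print(df)
```

Whether raising is the right reaction depends on the data; for input like the question's, where each key appears once per product, the plain loop behaves the same.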