Create Pandas dataframe from list which contains dictionary of dictionary

Time:02-11

I have a list of dictionaries, each of which maps a single key to a nested dictionary of key:value pairs (see below):

d = [{'line': {'Area Boundary Must Be Covered By Boundary Of': '10', 'Must Be Inside': '55', 'Must Not Have Gaps': '2', 'Must Not Self-Intersect': '2', 'Must Not Self-Overlap': '2'}},
     {'point': {'Must Not Self-Intersect': '3'}}, 
     {'poly': {'Must Not Overlap': '2'}}]

The desired dataframe form would be:

(image: desired dataframe form)

I've been creating test dataframes for a bit, and can't seem to wrangle them into the form above.

Some notes: these errors will be dynamic. That is, this script will run weekly, and as the source data changes, so will the errors. The only constants will be the geometry types (i.e. 'line', 'point', and 'poly').

edit:

import pandas

d = [{'line': {'Area Boundary Must Be Covered By Boundary Of': '10', 'Must Be Inside': '55', 'Must Not Have Gaps': '2', 'Must Not Self-Intersect': '2', 'Must Not Self-Overlap': '2'}},
     {'point': {'Must Not Self-Intersect': '3'}},
     {'poly': {'Must Not Overlap': '2'}}]
df = pandas.concat([pandas.DataFrame(x) for x in d])
print(df)

Produces:

                                             line point poly
Area Boundary Must Be Covered By Boundary Of   10   NaN  NaN
Must Be Inside                                 55   NaN  NaN
Must Not Have Gaps                              2   NaN  NaN
Must Not Self-Intersect                         2   NaN  NaN
Must Not Self-Overlap                           2   NaN  NaN
Must Not Self-Intersect                       NaN     3  NaN
Must Not Overlap                              NaN   NaN    2

This will suffice.

CodePudding user response:

You can create a dataframe for each sub-dictionary, concatenate them, collapse the duplicated index labels with groupby(level=0) (level 0 being the only index level) followed by sum(), and then transpose. Here d is the list from the question:

import pandas as pd
df = pd.concat([pd.DataFrame(o) for o in d]).groupby(level=0).sum().T

Output:

>>> df
      Area Boundary Must Be Covered By Boundary Of Must Be Inside Must Not Have Gaps Must Not Overlap Must Not Self-Intersect Must Not Self-Overlap
line                                            10             55                  2                0                       2                     2
point                                            0              0                  0                0                       3                     0
poly                                             0              0                  0                2                       0                     0
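
One thing to watch: the counts in d are strings, not numbers. If you need numeric values, and want the three geometry types to always appear as rows, something along these lines may work. It is only a sketch of an alternative to the concat/groupby approach, and it assumes each geometry type appears at most once in the list (the name merged is just illustrative):

```python
import pandas as pd

d = [
    {'line': {'Area Boundary Must Be Covered By Boundary Of': '10',
              'Must Be Inside': '55',
              'Must Not Have Gaps': '2',
              'Must Not Self-Intersect': '2',
              'Must Not Self-Overlap': '2'}},
    {'point': {'Must Not Self-Intersect': '3'}},
    {'poly': {'Must Not Overlap': '2'}},
]

# Merge the single-key dictionaries into one {geometry: {rule: count}} mapping.
# Assumption: each geometry type appears at most once in the list.
merged = {geom: rules for item in d for geom, rules in item.items()}

# One row per geometry type, one column per rule; missing rules become NaN.
df = pd.DataFrame.from_dict(merged, orient='index')

# The counts are strings, so convert them to numbers, then fill the gaps with 0.
df = df.apply(pd.to_numeric).fillna(0).astype(int)

# Keep the three constant geometry types as rows, even if one of them
# has no errors in a given week.
df = df.reindex(['line', 'point', 'poly'], fill_value=0)

print(df)
```

Because the columns are built from whatever rule names show up in the data, this should cope with the errors changing from week to week.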