Convert dictionary with both nested and non-nested items into dataframe

Time:11-06

I have the following dictionary:

{'instance_1': {'race': {'asian': 0,
   'black': 99,
   'white': 9},
  'dominant_race': 'black',
  'region': {'x': 0, 'y': 0, 'w': 100, 'h': 100}},
 'instance_2': {'race': {'asian': 0,
   'black': 0,
   'white': 89},
  'dominant_race': 'white',
  'region': {'x': 6, 'y': 12, 'w': 79, 'h': 79}}}

and would like to convert it into a pandas DataFrame such that each element within each item is its own column, like this:

item        asian  black  white  dominant_race  x  y   w    h
instance_1  0      99     9      black          0  0   100  100
instance_2  0      0      89     white          6  12  79   79

Using pd.DataFrame.from_dict() with orient='index' yields the following result:

pd.DataFrame.from_dict(predictions, orient='index')

            race                                 dominant_race  \
instance_1  {'asian': 0, 'black': 99....         black   
instance_2  {'asian': 0, 'black': 0.....         white   

                                            region  
instance_1    {'x': 0, 'y': 0, 'w': 100, 'h': 100}  
instance_2     {'x': 6, 'y': 12, 'w': 79, 'h': 79}  

How do I convert the dictionary so that each element within 'race' and 'region' becomes its own column?

CodePudding user response:

.apply(pd.Series) will split the dictionaries within your cells into separate columns, one column per key.
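To see what that does on its own, here is a minimal sketch on a single column of dictionaries. The variable names region and expanded are made up for illustration, and the values are copied from the 'region' entries in the question:

```python
import pandas as pd

# Illustrative stand-in for the 'region' column: one dict per row.
region = pd.Series(
    [
        {"x": 0, "y": 0, "w": 100, "h": 100},
        {"x": 6, "y": 12, "w": 79, "h": 79},
    ],
    index=["instance_1", "instance_2"],
)

# Each dict is converted to a Series, so its keys become column labels
# and the result is a DataFrame with one column per key.
expanded = region.apply(pd.Series)
print(expanded)
```

Note that this calls pd.Series once per row, so it can be slow on large frames.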

Assuming that the sample in the question is a good representation of your real data, you can first build the frame with pd.DataFrame(d).T, so that each instance is a row, and then use pd.concat to combine the columns expanded from race and region with dominant_race:

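import pandas as pd

df = pd.DataFrame(d).T  # d being your sample dictionary; .T puts the instances on the rows

# Expand the dict-valued columns and keep 'dominant_race' as it is
res = pd.concat([df['race'].apply(pd.Series),
                 df['dominant_race'],
                 df['region'].apply(pd.Series)], axis=1)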

Displaying res then gives:

res
Out[262]: 
            asian  black  white dominant_race  x   y    w    h
instance_1      0     99      9         black  0   0  100  100
instance_2      0      0     89         white  6  12   79   79
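As an alternative to the concat approach (a different technique, not the one used in this answer), pd.json_normalize can flatten the nested dictionaries in one call. This is only a sketch and assumes pandas 1.0 or later, where json_normalize is available at the top level, and that the inner keys ('asian', 'x', and so on) are unique across 'race' and 'region' so the prefixes can be dropped safely. The name flat is made up for illustration:

```python
import pandas as pd

# Sample dictionary from the question.
predictions = {
    "instance_1": {
        "race": {"asian": 0, "black": 99, "white": 9},
        "dominant_race": "black",
        "region": {"x": 0, "y": 0, "w": 100, "h": 100},
    },
    "instance_2": {
        "race": {"asian": 0, "black": 0, "white": 89},
        "dominant_race": "white",
        "region": {"x": 6, "y": 12, "w": 79, "h": 79},
    },
}

# Flatten each record; nested keys become dotted names such as 'race.asian'.
flat = pd.json_normalize(list(predictions.values()))
flat.index = list(predictions.keys())

# Drop the 'race.' / 'region.' prefixes (assumes inner keys do not collide).
flat.columns = [col.split(".")[-1] for col in flat.columns]

# Put the columns in the order requested in the question.
flat = flat[["asian", "black", "white", "dominant_race", "x", "y", "w", "h"]]
print(flat)
```

If the inner keys could collide in your real data, keep the dotted column names instead of stripping the prefixes.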