Trying to combine multiple lists of dictionaries into single Pandas DataFrame

Time:09-15

I am using a webservice that returns inference data about submitted images in the form of: {'IMG_123.jpg' : [{'keyword': value, 'score': value}, {'keyword': value, 'score': value}, {'keyword': value, 'score': value}]}

Like this: https://i.stack.imgur.com/FEDqU.png

I want to combine multiple queries into a single DataFrame in which the columns are the image names, the index holds the "keyword" values, and each cell holds the corresponding "score".

I have been able to transform the data into, I think, a more useable format using this code:

d={}
for k, v in mydict.items():
    d[k] = [{i['keyword']:i['score']} for i in v]
    
print(pd.DataFrame(d['IMG_1221.JPG']).T)

But this returns: https://i.stack.imgur.com/c3R0l.png

I am not sure how to combine multiple images into the format I am looking for, and the above code does not format my columns in a useful way.

The service returns keywords that are not consistent across images, so the returned lists of dicts differ in length and in the keywords they contain. I would like a NaN or 0 value for any keyword that is missing for a given image but present for other images in the dataframe.

Any help is much appreciated!

CodePudding user response:

IIUC, you want something like this:

import pandas as pd

mydict = {'IMG_1.JPG': [
    {'keyword': 'a', 'score': 1},
    {'keyword': 'b', 'score': 2},
    {'keyword': 'c', 'score': 3}]}

mydict2 = {'IMG_2.JPG': [
    {'keyword': 'a', 'score': 1},
    {'keyword': 'b', 'score': 2},
    {'keyword': 'd', 'score': 3}]
    }

mydicts = [mydict, mydict2]

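# Build one single-column frame per image, then concatenate once.
frames = []
for d in mydicts:
    for key, records in d.items():
        # Index by keyword and name the score column after the image.
        frames.append(pd.DataFrame(records).set_index('keyword').rename(columns={'score': key}))

# Outer join on the keyword index; keywords missing for an image become NaN.
df_all = pd.concat(frames, axis=1)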

print(df_all)

         IMG_1.JPG  IMG_2.JPG
keyword                      
a              1.0        1.0
b              2.0        2.0
c              3.0        NaN
d              NaN        3.0
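
If zeros are preferred over NaN, as the question mentions, one possible variation is to build a Series per image and let the DataFrame constructor align them, then call fillna(0). This is only a sketch: the names responses, columns, df_scores and df_zero are made up for illustration, the sample data is hypothetical, and it assumes each keyword appears at most once per image (a repeated keyword would keep only its last score).

```python
import pandas as pd

# Hypothetical sample responses, shaped like the service output in the question.
responses = [
    {'IMG_1.JPG': [{'keyword': 'a', 'score': 1},
                   {'keyword': 'b', 'score': 2},
                   {'keyword': 'c', 'score': 3}]},
    {'IMG_2.JPG': [{'keyword': 'a', 'score': 1},
                   {'keyword': 'b', 'score': 2},
                   {'keyword': 'd', 'score': 3}]},
]

# Map each image name to a Series of scores indexed by keyword.
columns = {
    image: pd.Series({item['keyword']: item['score'] for item in items})
    for response in responses
    for image, items in response.items()
}

# The DataFrame constructor aligns the Series on the union of their indexes,
# leaving NaN where an image has no score for a keyword.
df_scores = pd.DataFrame(columns)
df_scores.index.name = 'keyword'

# Replace the missing scores with 0 instead of NaN.
df_zero = df_scores.fillna(0)
print(df_zero)
```

Whether 0 is a sensible stand-in depends on how the scores are used later: NaN keeps "keyword not returned" distinct from a genuine score of 0, while fillna(0) merges the two.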