I am using a web service that returns inference data about submitted images in the form: {'IMG_123.jpg' : [{'keyword': value, 'score': value}, {'keyword': value, 'score': value}, {'keyword': value, 'score': value}]}
Like this: https://i.stack.imgur.com/FEDqU.png
I want to combine multiple queries into a single dataframe in which the columns are the image names, the index holds the "keyword" values, and each cell holds the corresponding "score".
I have been able to transform the data into what I think is a more usable format using this code:
d = {}
for k, v in mydict.items():
    d[k] = [{i['keyword']: i['score']} for i in v]
print(pd.DataFrame(d['IMG_1221.JPG']).T)
But this returns: https://i.stack.imgur.com/c3R0l.png
I am not sure how to combine multiple images into the format I am looking for, and the above code does not format my columns in a useful way.
The keywords the service returns are not consistent across images, so the returned lists of dicts differ in both length and keys. I would like a NaN or 0 value for any keyword that is missing for a given image but present for other images in the dataframe.
Any help is much appreciated!
Answer:
IIUC, you want something like this:
import pandas as pd

mydict = {'IMG_1.JPG': [
    {'keyword': 'a', 'score': 1},
    {'keyword': 'b', 'score': 2},
    {'keyword': 'c', 'score': 3}]}
mydict2 = {'IMG_2.JPG': [
    {'keyword': 'a', 'score': 1},
    {'keyword': 'b', 'score': 2},
    {'keyword': 'd', 'score': 3}]}
mydicts = [mydict, mydict2]

# Build one single-column frame per image, indexed by keyword
frames = []
for d in mydicts:
    key = list(d.keys())[0]  # in this example each dict holds a single image name
    df = pd.DataFrame(d[key]).set_index('keyword').rename(columns={'score': key})
    frames.append(df)

# Align the frames on the keyword index; missing keywords become NaN
df_all = pd.concat(frames, axis=1)
print(df_all)
Output:
         IMG_1.JPG  IMG_2.JPG
keyword
a              1.0        1.0
b              2.0        2.0
c              3.0        NaN
d              NaN        3.0
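
As a possible alternative, the same table can be built in one step by first reshaping every response into a nested {image: {keyword: score}} dict and handing that to pd.DataFrame, which aligns the inner keys into a shared index. This is only a sketch: the names responses, scores, df_nan and df_zero are made up for illustration, and the sample data is hypothetical, shaped like the output described in the question. It also covers the "NaN or 0" part of the question through fillna(0).

```python
import pandas as pd

# Hypothetical sample responses, shaped like the service output in the question
responses = [
    {'IMG_1.JPG': [{'keyword': 'a', 'score': 1},
                   {'keyword': 'b', 'score': 2},
                   {'keyword': 'c', 'score': 3}]},
    {'IMG_2.JPG': [{'keyword': 'a', 'score': 1},
                   {'keyword': 'b', 'score': 2},
                   {'keyword': 'd', 'score': 3}]},
]

# Reshape into {image_name: {keyword: score}}
scores = {
    image: {entry['keyword']: entry['score'] for entry in entries}
    for response in responses
    for image, entries in response.items()
}

# Outer keys become columns, inner keys become the index;
# a keyword missing for an image is filled with NaN
df_nan = pd.DataFrame(scores)
df_nan.index.name = 'keyword'

# Use 0 instead of NaN for missing keywords, if that is preferred
df_zero = df_nan.fillna(0)

print(df_nan)
print(df_zero)
```

Note that if a keyword appears more than once for the same image, the dict comprehension keeps only the last score, so this assumes keywords are unique within each image.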