Creating a Python dictionary from two pandas DataFrames

I am trying to create a dictionary from two pandas DataFrames. The following is a snapshot of the DataFrame that is supposed to hold the keys:

C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000007.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000009.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000009.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000009.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000009.jpg
C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000012.jpg

And the following DataFrame snapshot holds the values for the dictionary:

324,339,263,211,9
253,372,165,264,9
67,374,5,244,9
295,299,241,194,9

So I want to pair the two DataFrames row by row, taking each row of the first as a key and the matching row of the second as its value, all in one dictionary. This is what I tried:

import pandas as pd
import numpy as np
image_files=pd.read_csv('image_files.csv')
file = pd.read_csv('Training_dataset.csv')

image_anno_dict={}

for image_file, row in zip(image_files,file.iterrows()):
    image_anno_dict[image_file]=np.array(row)

My expected output:

{'C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg': [324,339,263,211,9],
'C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg': [253,372,165,264,9],
'C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg': [67,374,5,244,9],
.
.
.
}

But the code works only for the first row. Any suggestions for a solution?

print(image_files.head(5)):

C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\000005.jpg
0  C:/Users/Yaman/PycharmProjects/Mindsporeprojec...                         
1  C:/Users/Yaman/PycharmProjects/Mindsporeprojec...                         
2  C:/Users/Yaman/PycharmProjects/Mindsporeprojec...                         
3  C:/Users/Yaman/PycharmProjects/Mindsporeprojec...                         
4  C:/Users/Yaman/PycharmProjects/Mindsporeprojec...

print(file.head(5)):

     0    1    2    3  4
0  324  339  263  211  9
1  253  372  165  264  9
2   67  374    5  244  9
3  295  299  241  194  9
4  312  220  277  186  9
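A likely cause, visible in the print(image_files.head(5)) output above: image_files.csv has no header row, so read_csv uses the first path as the column label, and iterating over a DataFrame yields its column labels rather than its rows. zip then stops after a single pair. A minimal sketch with a small in-memory CSV (the paths are hypothetical stand-ins) shows both behaviours:

```python
import io
import pandas as pd

# Hypothetical stand-in for image_files.csv: one path per line, no header row
csv_text = "img/000005.jpg\nimg/000005.jpg\nimg/000007.jpg\n"

# Default read_csv turns the first line into the column label
with_header = pd.read_csv(io.StringIO(csv_text))
labels = list(with_header)  # iterating a DataFrame yields column labels

# zip stops at the shorter iterable, so only one pair is produced
pairs = list(zip(with_header, range(10)))

# header=None keeps every line as data; the paths are in column 0
no_header = pd.read_csv(io.StringIO(csv_text), header=None)
paths = list(no_header[0])

print(labels)  # ['img/000005.jpg']
print(paths)   # ['img/000005.jpg', 'img/000005.jpg', 'img/000007.jpg']
```

With header=None, zip(image_files[0], ...) walks every path. The paths still repeat, though, so a plain dict keeps only one value per path.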

Answer 1:

You can use a pandas Series to combine the two DataFrames and then convert it to a dictionary by calling its to_dict method. Here is working sample code showing two ways to do it:

import pandas as pd

 
df1 = pd.DataFrame({'df1Keys':['ab','bc','c','df','efg']})
df2 = pd.DataFrame({'df2Vlues':[1,25,3,84,545]})

#method 1
print(pd.Series(df2.df2Vlues.values,index=df1.df1Keys).to_dict())

#method 2
print(dict(zip(df1.df1Keys,df2.df2Vlues)))
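One caveat for the question's data: the keys repeat, and both methods keep only the last value for each repeated key, because a later assignment to the same dict key overwrites the earlier one. A quick check with hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample with a repeated key, like the question's image paths
df1 = pd.DataFrame({'df1Keys': ['a.jpg', 'a.jpg', 'b.jpg']})
df2 = pd.DataFrame({'df2Vlues': [1, 2, 3]})

# method 1: Series indexed by the keys, then to_dict
d1 = pd.Series(df2.df2Vlues.values, index=df1.df1Keys).to_dict()
# method 2: zip keys with values
d2 = dict(zip(df1.df1Keys, df2.df2Vlues))

# Both dicts have two entries; the value 1 for 'a.jpg' was overwritten by 2
print(len(d1), len(d2))          # 2 2
print(d1['a.jpg'], d2['a.jpg'])  # 2 2
```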

Answer 2:

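import pandas as pd
import numpy as np

# header=None so the first path is read as data, not as the column label
image_files = pd.read_csv('image_files.csv', header=None)
file = pd.read_csv('Training_dataset.csv')

# Pair each path with its row of values; a list keeps duplicate paths
image_anno_list = list(zip(image_files[0], file.apply(np.array, axis=1)))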

Output:

>>> image_anno_list

[('C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\\000005.jpg',
  array([324, 339, 263, 211,   9])),
 ('C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\\000005.jpg',
  array([253, 372, 165, 264,   9])),
 ('C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\\000005.jpg',
  array([ 67, 374,   5, 244,   9])),
 ('C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\\000005.jpg',
  array([295, 299, 241, 194,   9])),
 ('C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\\000005.jpg',
  array([312, 220, 277, 186,   9]))]

If you build a dict instead, repeated keys overwrite one another, so only the last row for each path survives:

image_anno_dict = dict(zip(image_files[0], file.apply(np.array, axis=1)))

>>> image_anno_dict

{'C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\\000005.jpg':
 array([312, 220, 277, 186,   9])}
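As a side note, file.apply(np.array, axis=1) can be swapped for file.to_numpy(): iterating a 2-D NumPy array yields one 1-D array per row, which avoids calling a Python function on every row. A small sketch with hypothetical in-memory stand-ins for the two CSV files, checking that both give the same pairs:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for image_files.csv (read with header=None) and Training_dataset.csv
image_files = pd.DataFrame({0: ['img/000005.jpg', 'img/000005.jpg', 'img/000007.jpg']})
file = pd.DataFrame([[324, 339, 263, 211, 9],
                     [253, 372, 165, 264, 9],
                     [67, 374, 5, 244, 9]])

# Iterating a 2-D array yields one 1-D array per row
pairs_numpy = list(zip(image_files[0], file.to_numpy()))
# The approach from this answer, for comparison
pairs_apply = list(zip(image_files[0], file.apply(np.array, axis=1)))

# Same keys and same row values in the same order
same = all(k1 == k2 and np.array_equal(v1, v2)
           for (k1, v1), (k2, v2) in zip(pairs_numpy, pairs_apply))
print(same)  # True
```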

Answer 3:

You can create the dictionary with collections.defaultdict(list), so that each path collects a list of all its rows:

from collections import defaultdict
import pandas as pd
import numpy as np

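# header=None: the file has no header row, so every path is read as data
image_files = pd.read_csv('image_files.csv', header=None)
file = pd.read_csv('Training_dataset.csv')

# Missing keys start as an empty list, so repeated paths collect all their rows
image_anno_dict = defaultdict(list)

# Iterate the path column (image_files[0]); iterrows() yields (index, row) pairs
for image_file, (_, row) in zip(image_files[0], file.iterrows()):
    image_anno_dict[image_file].append(np.array(row))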

Output:

{'C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\\000005.jpg':
    [[324,339,263,211,9], [253,372,165,264,9], [67,374,5,244,9], ...],
 ...
 'C:/Users/Yaman/PycharmProjects/Mindsporeproject/JPEGImages_train\\000009.jpg':
    [...],
 ...
}
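If you would rather avoid the explicit loop, a groupby sketch builds the same dict of lists: GroupBy.indices maps each path to the positions of its rows, which can then index into the value array. The in-memory DataFrames are hypothetical stand-ins for the CSV files; note that groupby sorts the keys by default.

```python
import pandas as pd

# Hypothetical stand-ins for image_files.csv (read with header=None) and Training_dataset.csv
image_files = pd.DataFrame({0: ['img/000005.jpg', 'img/000005.jpg', 'img/000007.jpg']})
file = pd.DataFrame([[324, 339, 263, 211, 9],
                     [253, 372, 165, 264, 9],
                     [67, 374, 5, 244, 9]])

values = file.to_numpy()

# {path: array of row positions}, e.g. 'img/000005.jpg' -> array([0, 1])
groups = image_files.groupby(0).indices

# Each path maps to a list of 1-D row arrays, like the defaultdict version
image_anno_dict = {path: list(values[pos]) for path, pos in groups.items()}
```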