I have the following dataframe, which contains string representations of dictionaries in every row of the columns summary_in and summary_out:
import pandas as pd
df_vals = [[0,
'Person1',
"['xyz', 'abc', 'Jim']",
"['jkl', 'efg', 'Smith']",
1134,
1180,
46,
'sample text',
"{'xyz_key': ['xyz', 756.0], 'abc_key': ['abc', 378.0], 'Jim_key': ['Jim', 0]}",
"{'jkl_key': ['jkl', 395.0], 'efg_key': ['efg', 785.0], 'Smith_key': ['Smith', 0]}"],
[1,
'Person2',
"['lmn', 'opq', 'Mick']",
"['rst', 'uvw', 'Smith']",
1134,
1180,
46,
'sample tex2',
"{'lmn_key': ['lmn', 756.0], 'opq_key': ['opq', 378.0], 'Mick_key': ['Mick', 0]}",
"{'rst_key': ['rst', 395.0], 'uvw_key': ['uvw', 785.0], 'Smith_key': ['Smith', 0]}"]]
df = pd.DataFrame(data=df_vals, columns=['row', 'Person', 'in', 'out', 'val1', 'val2', 'diff', 'note', 'summary_in', 'summary_out'])
df
What I am trying to do is iterate over every row in the dataframe and print each key that exists in summary_in for each Person row.
After running this code to test datatypes:
# create a dict from the column (row index -> cell value)
dict_from_dataframe = df['summary_in'].to_dict()
print(type(dict_from_dataframe))
for k in dict_from_dataframe.items():
    d = k[1]
    print(type(d))
    print(d)
I get the following output, which shows that once I reach the next level, the dictionary (d) is actually a string and cannot be accessed the way a dictionary normally would be:
<class 'dict'>
<class 'str'>
{'xyz_key': ['xyz', 756.0], 'abc_key': ['abc', 378.0], 'Jim_key': ['Jim', 0]}
<class 'str'>
{'lmn_key': ['lmn', 756.0], 'opq_key': ['opq', 378.0], 'Mick_key': ['Mick', 0]}
Any ideas on what I have done wrong here?
My expected output is to loop over the df and print the following:
Person1
xyz_key
abc_key
Jim_key
Person2
lmn_key
opq_key
Mick_key
Any help would be much appreciated! Thanks
CodePudding user response:
IIUC, you could use a custom function. The values in summary_in are strings, not dictionaries, so you need to convert each string representation to a dictionary with ast.literal_eval.
from ast import literal_eval

def print_infos(s):
    print(s['Person'])
    # parse the string into a real dict before iterating over its keys
    d = literal_eval(s['summary_in'])
    for k in d:
        print(k)

for _, r in df.iterrows():
    print_infos(r)
output:
Person1
xyz_key
abc_key
Jim_key
Person2
lmn_key
opq_key
Mick_key
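If you need the dictionaries more than once, another option is to parse the column a single time and keep the result. The following is only a minimal sketch: it rebuilds a reduced two-column version of your dataframe, and summary_in_parsed is a hypothetical column name chosen for illustration, not something from your original data.

```python
from ast import literal_eval

import pandas as pd

# Reduced stand-in for the question's dataframe (only the columns used here).
df = pd.DataFrame({
    "Person": ["Person1", "Person2"],
    "summary_in": [
        "{'xyz_key': ['xyz', 756.0], 'abc_key': ['abc', 378.0], 'Jim_key': ['Jim', 0]}",
        "{'lmn_key': ['lmn', 756.0], 'opq_key': ['opq', 378.0], 'Mick_key': ['Mick', 0]}",
    ],
})

# Parse every string into a real dict once.
# "summary_in_parsed" is a hypothetical column name used for this sketch.
df["summary_in_parsed"] = df["summary_in"].apply(literal_eval)

lines = []
for person, summary in zip(df["Person"], df["summary_in_parsed"]):
    lines.append(person)
    lines.extend(summary)  # iterating over a dict yields its keys

print("\n".join(lines))
```

After this, each cell of summary_in_parsed is a dict, so lookups such as df["summary_in_parsed"].iloc[0]["xyz_key"] work without parsing again.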