Extract Keys from a string representation of dictionaries stored within a pandas dataframe

Time:02-19

I have the following dataframe, which contains a string representation of a dictionary in every row of the columns summary_in and summary_out:

import pandas as pd
    
df_vals = [[0,
  'Person1',
  "['xyz', 'abc', 'Jim']",
  "['jkl', 'efg', 'Smith']",
  1134,
  1180,
  46,
  'sample text',
  "{'xyz_key': ['xyz', 756.0], 'abc_key': ['abc', 378.0], 'Jim_key': ['Jim', 0]}",
  "{'jkl_key': ['jkl', 395.0], 'efg_key': ['efg', 785.0], 'Smith_key': ['Smith', 0]}"],
 [1,
  'Person2',
  "['lmn', 'opq', 'Mick']",
  "['rst', 'uvw', 'Smith']",
  1134,
  1180,
  46,
  'sample tex2',
  "{'lmn_key': ['lmn', 756.0], 'opq_key': ['opq', 378.0], 'Mick_key': ['Mick', 0]}",
  "{'rst_key': ['rst', 395.0], 'uvw_key': ['uvw', 785.0], 'Smith_key': ['Smith', 0]}"]]

df = pd.DataFrame(data=df_vals, columns =['row','Person','in','out','val1','val2','diff','note','summary_in','summary_out'] )
df


What I am trying to do is iterate over every row in the dataframe and print each key that exists in summary_in for each Person row.

After running this code to test datatypes:

#create dict of column
dict_from_dataframe = df['summary_in'].to_dict()
print(type(dict_from_dataframe))

for k in dict_from_dataframe.items():
    d = k[1]
    print(type(d))
    print(d)

I get the following output, which shows that once I go one level down, each value (d) is a string rather than a dictionary, so it cannot be accessed the way a dictionary normally would be:

<class 'dict'>
<class 'str'>
{'xyz_key': ['xyz', 756.0], 'abc_key': ['abc', 378.0], 'Jim_key': ['Jim', 0]}
<class 'str'>
{'lmn_key': ['lmn', 756.0], 'opq_key': ['opq', 378.0], 'Mick_key': ['Mick', 0]}

Any ideas on what I have done wrong here?

My expected output is to loop over the df and print the following:

Person1
xyz_key
abc_key
Jim_key
Person2
lmn_key
opq_key
Mick_key

Any help would be much appreciated! Thanks

CodePudding user response:

IIUC, you could use a custom function. The values in summary_in are strings, not dictionaries, so you need to convert each one to a dictionary with ast.literal_eval before iterating over its keys.

from ast import literal_eval

def print_infos(s):
    print(s['Person'])
    d = literal_eval(s['summary_in'])
    for k in d:
        print(k)

for _, r in df.iterrows():
    print_infos(r)

output:

Person1
xyz_key
abc_key
Jim_key
Person2
lmn_key
opq_key
Mick_key
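
If you would rather keep the keys in the dataframe than only print them, one possible variation is to parse the column once and store the keys in a new column. This is a minimal sketch, not part of the original answer: the reduced two-row frame and the column name summary_in_keys are assumptions made for illustration.

```python
from ast import literal_eval

import pandas as pd

# Reduced version of the question's frame (only the columns used here).
df = pd.DataFrame({
    "Person": ["Person1", "Person2"],
    "summary_in": [
        "{'xyz_key': ['xyz', 756.0], 'abc_key': ['abc', 378.0], 'Jim_key': ['Jim', 0]}",
        "{'lmn_key': ['lmn', 756.0], 'opq_key': ['opq', 378.0], 'Mick_key': ['Mick', 0]}",
    ],
})

# Parse every string into a real dict once.
parsed = df["summary_in"].apply(literal_eval)

# "summary_in_keys" is a hypothetical column name holding each row's keys.
df["summary_in_keys"] = parsed.apply(lambda d: list(d.keys()))

# Print each person followed by that row's keys.
for person, keys in zip(df["Person"], df["summary_in_keys"]):
    print(person)
    for key in keys:
        print(key)
```

literal_eval only accepts Python literals (strings, numbers, lists, dicts, and so on), so it is a safer choice than eval for strings like these, but it will raise an error on a malformed cell.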