Extract key value pairs from dict in pandas column using list items in another column

Time:09-29

I am trying to create a new column containing the key/value pairs extracted from a dict in one column, using the list items in a second column as the keys to keep.

Sample Data:

names             name_dicts
['Mary', 'Joe']   {'Mary':123, 'Ralph':456, 'Joe':789}

Expected Result:

names             name_dicts                              new_col
['Mary', 'Joe']   {'Mary':123, 'Ralph':456, 'Joe':789}    {'Mary':123, 'Joe':789}

I have attempted to use ast to convert the name_dicts column to a column of true dictionaries.

The following function errored out with a "cannot convert string" error.

Here col is the df['name_dicts'] column.

def get_name_pairs(col):
    for k,v in col.items():
        if k.isin(df['names']):
            return 

CodePudding user response:

Using a list comprehension and operator.itemgetter:

from operator import itemgetter

df['new_col'] = [dict(zip(l, itemgetter(*l)(d)))
                 for l,d in zip(df['names'], df['name_dicts'])]

output:

         names                               name_dicts                    new_col
0  [Mary, Joe]  {'Mary': 123, 'Ralph': 456, 'Joe': 789}  {'Mary': 123, 'Joe': 789}

used input:

import pandas as pd

df = pd.DataFrame({'names': [['Mary', 'Joe']],
                   'name_dicts': [{'Mary':123, 'Ralph':456, 'Joe':789}]
                  })
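One caveat with the itemgetter expression: itemgetter returns a bare value rather than a tuple when given a single key, and it raises KeyError for a key that is missing from the dict. The sketch below contrasts it with a plain dict comprehension; the helper names pairs_itemgetter and pairs_safe are made up for illustration and are not part of the answer.

```python
from operator import itemgetter

d = {'Mary': 123, 'Ralph': 456, 'Joe': 789}

def pairs_itemgetter(names, mapping):
    # Same expression as in the answer above
    return dict(zip(names, itemgetter(*names)(mapping)))

def pairs_safe(names, mapping):
    # Hypothetical alternative: skips names that are not in the dict
    # and works for lists of any length, including one or zero names
    return {k: mapping[k] for k in names if k in mapping}

print(pairs_itemgetter(['Mary', 'Joe'], d))  # {'Mary': 123, 'Joe': 789}
print(pairs_safe(['Ralph'], d))              # {'Ralph': 456}
print(pairs_safe(['Zed'], d))                # {}

try:
    # With one name, itemgetter returns 456 (an int), which zip cannot iterate
    pairs_itemgetter(['Ralph'], d)
except TypeError as exc:
    print('single-name list fails:', exc)
```

If your lists can hold a single name or names absent from the dict, the same comprehension can be used inside the zip(df['names'], df['name_dicts']) loop in place of the itemgetter call.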

CodePudding user response:

You can apply a lambda with a dictionary comprehension at row level to get the values from the dict in the second column, based on the keys listed in the first column:

# Only needed if the cell values are stored as strings:
import ast
for col in df:
    df[col] = df[col].apply(ast.literal_eval)

# Look up each name in the row's dict; names missing from the dict get 0
df['new_col'] = df.apply(lambda x: {k: x['name_dicts'].get(k, 0) for k in x['names']},
                         axis=1)

# To include only the key/value pairs whose key is in both the list
# and the dictionary, replace the lambda above with:
# lambda x: {k: x['name_dicts'][k] for k in x['names'] if k in x['name_dicts']}

         names  ...                    new_col
0  [Mary, Joe]  ...  {'Mary': 123, 'Joe': 789}
[1 rows x 3 columns]

PS: with the code above, ast.literal_eval runs without error on the sample data you posted.
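The conversion loop assumes every cell is a string; ast.literal_eval raises ValueError when handed an object that is already a list or dict. A minimal sketch of a guard, assuming a mix of string and non-string cells is possible (parse_if_str is a hypothetical helper name):

```python
import ast

def parse_if_str(value):
    """Parse a string cell as a Python literal; return anything else untouched."""
    if isinstance(value, str):
        return ast.literal_eval(value)
    return value

print(parse_if_str("{'Mary':123, 'Joe':789}"))  # {'Mary': 123, 'Joe': 789}
print(parse_if_str(['Mary', 'Joe']))            # ['Mary', 'Joe']
```

It would be used as df[col] = df[col].apply(parse_if_str) in place of applying ast.literal_eval directly.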

CodePudding user response:

Your function needs only a small change, and then you can use it with .apply():

import pandas as pd

df = pd.DataFrame({
    'names': [['Mary', 'Joe']],
    'name_dicts': [{'Mary':123, 'Ralph':456, 'Joe':789}],
})

def filter_data(row):
    result = {}
    for key, val in row['name_dicts'].items():
        if key in row['names']:
            result[key] = val
    return result

df['new_col'] = df.apply(filter_data, axis=1)

print(df.to_string())

Result:

         names                               name_dicts                    new_col
0  [Mary, Joe]  {'Mary': 123, 'Ralph': 456, 'Joe': 789}  {'Mary': 123, 'Joe': 789}
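Note that filter_data loops over the dict, so the keys of the result follow the dict's order, not the order of the names list. If list order matters, a variant that loops over the list is one option. In this sketch a plain dict stands in for the DataFrame row, and filter_by_names is a hypothetical name:

```python
def filter_data(row):
    # Same function as above: iterates the dict, so output follows dict order
    result = {}
    for key, val in row['name_dicts'].items():
        if key in row['names']:
            result[key] = val
    return result

def filter_by_names(row):
    # Hypothetical variant: iterates the list, so output follows list order
    return {key: row['name_dicts'][key]
            for key in row['names'] if key in row['name_dicts']}

# A plain dict is enough to stand in for a row here
row = {'names': ['Joe', 'Mary'],
       'name_dicts': {'Mary': 123, 'Ralph': 456, 'Joe': 789}}

print(list(filter_data(row)))      # ['Mary', 'Joe']
print(list(filter_by_names(row)))  # ['Joe', 'Mary']
```

Both functions return dicts that compare equal; only the key order differs.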

EDIT:

If name_dicts holds the string "{'Mary':123, 'Ralph':456, 'Joe':789}", you can replace ' with " to get valid JSON, which json.loads can then convert to a dictionary:

import json
df['name_dicts'] = df['name_dicts'].str.replace("'", '"').apply(json.loads)

Or parse it directly as a Python literal:

import ast

df['name_dicts'] = df['name_dicts'].apply(ast.literal_eval)

Alternatively, for trusted data only (eval executes arbitrary code):

df['name_dicts'] = df['name_dicts'].apply(eval)
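The quote-replacement route depends on the strings containing no apostrophes of their own. A small sketch with a made-up key, "O'Brien", shows where it breaks while ast.literal_eval still parses the text:

```python
import ast
import json

# Hypothetical cell value whose key contains an apostrophe
text = '''{"O'Brien": 123, 'Joe': 789}'''

try:
    # Replacing every ' also rewrites the apostrophe inside the key
    json.loads(text.replace("'", '"'))
    json_failed = False
except json.JSONDecodeError:
    json_failed = True

parsed = ast.literal_eval(text)

print(json_failed)  # True
print(parsed)       # {"O'Brien": 123, 'Joe': 789}
```

For data shaped like the sample in the question, all three approaches give the same dictionary.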

Full code:

import pandas as pd

df = pd.DataFrame({
    'names': [['Mary', 'Joe']],
    'name_dicts': ["{'Mary':123, 'Ralph':456, 'Joe':789}",],  # strings
})

#import json
#df['name_dicts'] = df['name_dicts'].str.replace("'", '"').apply(json.loads)

#df['name_dicts'] = df['name_dicts'].apply(eval)

import ast
df['name_dicts'] = df['name_dicts'].apply(ast.literal_eval)

def filter_data(row):
    result = {}
    for key, val in row['name_dicts'].items():
        if key in row['names']:
            result[key] = val
    return result

df['new_col'] = df.apply(filter_data, axis=1)

print(df.to_string())