Extract key value pairs from dict in pandas column using list items in another column

I am trying to create a new column that holds the key/value pairs extracted from a dict in one column, keeping only the keys that appear as list items in a second column.

Sample Data:

names             name_dicts
['Mary', 'Joe']   {'Mary':123, 'Ralph':456, 'Joe':789}

Expected Result:

names             name_dicts                              new_col
['Mary', 'Joe']   {'Mary':123, 'Ralph':456, 'Joe':789}    {'Mary':123, 'Joe':789}

I have attempted to use AST to convert the name_dicts column to a column of true dictionaries.

This function errored out with a "cannot convert string" error.

Here, col is the df['name_dicts'] column:

def get_name_pairs(col):
    for k,v in col.items():
        if k.isin(df['names']):
            return

CodePudding user response：

Using a list comprehension and operator.itemgetter:

from operator import itemgetter

df['new_col'] = [dict(zip(l, itemgetter(*l)(d)))
                 for l,d in zip(df['names'], df['name_dicts'])]

output:

         names                               name_dicts                    new_col
0  [Mary, Joe]  {'Mary': 123, 'Ralph': 456, 'Joe': 789}  {'Mary': 123, 'Joe': 789}

Input used:

import pandas as pd

df = pd.DataFrame({'names': [['Mary', 'Joe']],
                   'name_dicts': [{'Mary':123, 'Ralph':456, 'Joe':789}]
                  })
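
One caveat with the itemgetter approach: when called with a single key, itemgetter returns a bare value rather than a tuple, so zip has nothing to pair the name with, and an empty list of names raises a TypeError. A minimal sketch of a helper that handles any list length follows. The name pick is hypothetical, and it still raises KeyError when a name is missing from the dict.

```python
from operator import itemgetter


def pick(keys, mapping):
    """Return the sub-dict of `mapping` restricted to `keys` (hypothetical helper)."""
    if not keys:
        return {}  # itemgetter() with no arguments would raise TypeError
    values = itemgetter(*keys)(mapping)
    if len(keys) == 1:
        values = (values,)  # a single key yields a bare value, not a tuple
    return dict(zip(keys, values))


name_dict = {'Mary': 123, 'Ralph': 456, 'Joe': 789}
two_names = pick(['Mary', 'Joe'], name_dict)
one_name = pick(['Ralph'], name_dict)
no_names = pick([], name_dict)
```

Inside the list comprehension, pick(l, d) would take the place of dict(zip(l, itemgetter(*l)(d))).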

CodePudding user response：

You can apply a lambda with a dictionary comprehension to each row, looking up every name from the list in the first column in the dict in the second column:

# If the column values are stored as strings, parse them first:
import ast
for col in df:
    df[col] = df[col].apply(ast.literal_eval)

# Names that are missing from the dict get the default value 0.
df['new_col'] = df.apply(lambda x: {k: x['name_dicts'].get(k, 0) for k in x['names']},
                         axis=1)

# To include only the key/value pairs whose key is in both the list
# and the dictionary, replace the lambda above with:
# lambda x: {k: x['name_dicts'][k] for k in x['names'] if k in x['name_dicts']}

         names  ...                    new_col
0  [Mary, Joe]  ...  {'Mary': 123, 'Joe': 789}
[1 rows x 3 columns]

PS: ast.literal_eval runs without error on the sample data you posted.
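
To make the difference between the two lambdas concrete, here is a small sketch. The name 'Ann' is made up for illustration and is deliberately absent from the dict.

```python
import pandas as pd

# 'Ann' is a hypothetical name that does not appear in the dict.
df = pd.DataFrame({
    'names': [['Mary', 'Ann']],
    'name_dicts': [{'Mary': 123, 'Ralph': 456, 'Joe': 789}],
})

# Variant 1: keep every listed name, falling back to 0 when it is missing.
with_default = df.apply(
    lambda x: {k: x['name_dicts'].get(k, 0) for k in x['names']},
    axis=1,
)

# Variant 2: keep only the names present in both the list and the dict.
only_common = df.apply(
    lambda x: {k: x['name_dicts'][k] for k in x['names'] if k in x['name_dicts']},
    axis=1,
)

print(with_default.iloc[0])  # {'Mary': 123, 'Ann': 0}
print(only_common.iloc[0])   # {'Mary': 123}
```

Which variant fits depends on whether a missing name should show up as a placeholder or be dropped silently.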

CodePudding user response：

Your function needs only a small change, and then you can use it with .apply():

import pandas as pd

df = pd.DataFrame({
    'names': [['Mary', 'Joe']],
    'name_dicts': [{'Mary':123, 'Ralph':456, 'Joe':789}],
})

def filter_data(row):
    result = {}
    for key, val in row['name_dicts'].items():
        if key in row['names']:
            result[key] = val
    return result

df['new_col'] = df.apply(filter_data, axis=1)

print(df.to_string())

Result:

         names                               name_dicts                    new_col
0  [Mary, Joe]  {'Mary': 123, 'Ralph': 456, 'Joe': 789}  {'Mary': 123, 'Joe': 789}
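
If the lists of names are long, the check key in row['names'] scans the list once per dict key. A possible variant, sketched under the assumption that the names are hashable strings, builds a set first. The name filter_data_fast is hypothetical, and a plain dict stands in for a DataFrame row here.

```python
def filter_data_fast(row):
    """Hypothetical variant of filter_data that uses a set for membership tests."""
    wanted = set(row['names'])
    # Iterating over the dict keeps the dict's own key order in the result.
    return {key: val for key, val in row['name_dicts'].items() if key in wanted}


# A plain dict stands in for one DataFrame row in this sketch.
row = {
    'names': ['Joe', 'Mary'],
    'name_dicts': {'Mary': 123, 'Ralph': 456, 'Joe': 789},
}
result = filter_data_fast(row)
```

With a DataFrame it would be used the same way as filter_data: df.apply(filter_data_fast, axis=1).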

EDIT:

If name_dicts holds a string such as "{'Mary':123, 'Ralph':456, 'Joe':789}", you can replace ' with " to get valid JSON, which json.loads then converts to a dictionary:

import json
df['name_dicts'] = df['name_dicts'].str.replace("'", '"').apply(json.loads)

Or parse it directly as a Python literal:

import ast

df['name_dicts'] = df['name_dicts'].apply(ast.literal_eval)

Alternatively, for trusted input only (eval executes arbitrary code):

df['name_dicts'] = df['name_dicts'].apply(eval)
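
The quote-swapping trick for json.loads is fragile when a name itself contains an apostrophe, whereas ast.literal_eval parses the string as written. A short sketch of the difference, using a made-up key "O'Neil" as a hypothetical cell value:

```python
import ast
import json

# Hypothetical cell value: the string form of a dict whose key contains an apostrophe.
raw = str({"O'Neil": 1, 'Mary': 123})

# literal_eval understands Python literals, including mixed quote styles.
parsed = ast.literal_eval(raw)

# Swapping every ' for " corrupts the key, so json.loads rejects the result.
try:
    json.loads(raw.replace("'", '"'))
    json_ok = True
except json.JSONDecodeError:
    json_ok = False
```

For data shaped like the sample in the question (plain names, integer values) both approaches give the same dictionary.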

Full code:

import pandas as pd

df = pd.DataFrame({
    'names': [['Mary', 'Joe']],
    'name_dicts': ["{'Mary':123, 'Ralph':456, 'Joe':789}",],  # strings
})

#import json
#df['name_dicts'] = df['name_dicts'].str.replace("'", '"').apply(json.loads)

#df['name_dicts'] = df['name_dicts'].apply(eval)

import ast
df['name_dicts'] = df['name_dicts'].apply(ast.literal_eval)

def filter_data(row):
    result = {}
    for key, val in row['name_dicts'].items():
        if key in row['names']:
            result[key] = val
    return result

df['new_col'] = df.apply(filter_data, axis=1)

print(df.to_string())