Dictionary Values in Data Frames

I am trying to add a column to a data frame that holds the dictionary value for each dictionary key found in an existing column. A cell may contain several comma-separated keys. I have tried the code below, but it returns an empty list ([]) for every row:

import pandas as pd
import numpy as np

df = pd.DataFrame({'key' : ["vs, vscode", "jupyter, jupyterlab", "python, vs", "python", "it was spyder before dawn"]})

my_dict = {'vscode' : 'is gross',
           'jupyter' : 'is not so awesome, but hes ok, ig',
           'vs' : 'is awesome',
           'jupyterlab' : 'is rad',
           'python' : "booya"}

def cascade_col(row_value):

    cvc_row = []
    for word in row_value:
        if word in my_dict:
            cvc_row.append(my_dict[word])
    return cvc_row

df['dict value'] = df['key'].apply(cascade_col)

print(df)

My expected output is the following:

df = pd.DataFrame({'key' : ["vs, vscode", "jupyter, jupyterlab", "python, vs", "python", "it was spyder before dawn"],
                           'Corresponding Value(s)' : ['is awesome, is gross', 'is not so awesome, but hes ok, ig, is rad', 'booya, is awesome', 'booya', np.nan]})
df

Thank you for taking my question.

I have attempted a solution to this, but am stuck. I have defined my problem, the code I've tried, but am looking for further assistance. Thank you.

CodePudding user response：

Code:

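def cascade_col(row_value):
    cvc_row = []
    # Split the cell on commas so we iterate over keys, not characters
    for word in row_value.split(','):
        word = word.strip()
        if word in my_dict:
            cvc_row.append(my_dict[word])
    return ', '.join(cvc_row)

df['Corresponding Value(s)'] = df['key'].apply(cascade_col)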

Using a lambda:

df['Corresponding Value(s)'] = df['key'].apply(lambda row: ', '.join(my_dict[i] for i in (l.strip() for l in row.split(',')) if i in my_dict))
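
Note that both versions above return an empty string, not NaN, for rows with no matching key. A minimal sketch of a variant that returns NaN in that case, written in plain Python so it runs without the data frame. The name lookup_values and its mapping parameter are hypothetical, not part of the original answer:

```python
import math

my_dict = {
    'vscode': 'is gross',
    'jupyter': 'is not so awesome, but hes ok, ig',
    'vs': 'is awesome',
    'jupyterlab': 'is rad',
    'python': 'booya',
}

def lookup_values(cell, mapping=my_dict):
    """Join the mapped values of the comma-separated keys in `cell`.

    Returns NaN when none of the keys are in the mapping
    (hypothetical helper, for illustration only).
    """
    keys = [part.strip() for part in cell.split(',')]
    values = [mapping[key] for key in keys if key in mapping]
    return ', '.join(values) if values else math.nan

cells = ["vs, vscode", "jupyter, jupyterlab", "python, vs",
         "python", "it was spyder before dawn"]
results = [lookup_values(cell) for cell in cells]
print(results)
```

With the data frame from the question, the same function could be applied per row with df['key'].apply(lookup_values).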

CodePudding user response：

You can use regex extraction and mapping with the dictionary. The word boundaries (\b) keep a shorter key such as 'jupyter' from matching inside 'jupyterlab':

import re

pattern = '|'.join(map(re.escape, my_dict))
df['dict value'] = (df['key'].str.extractall(rf'\b({pattern})\b')[0]
                    .map(my_dict)
                    .groupby(level=0).agg(', '.join)
                   )

Output:

                         key                                 dict value
0                 vs, vscode                       is awesome, is gross
1        jupyter, jupyterlab  is not so awesome, but hes ok, ig, is rad
2                 python, vs                          booya, is awesome
3                     python                                      booya
4  it was spyder before dawn                                        NaN
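
One thing to watch with a key alternation like this is that regex alternatives are tried left to right, so a shorter key can win over a longer key that starts with it. A small sketch using only the re module (the variable names are illustrative) compares a plain alternation with one wrapped in word boundaries:

```python
import re

# Keys in the same order as my_dict in the question
keys = ['vscode', 'jupyter', 'vs', 'jupyterlab', 'python']
text = 'jupyter, jupyterlab'

plain = '|'.join(map(re.escape, keys))
bounded = rf'\b(?:{plain})\b'

# Without boundaries, 'jupyter' matches the start of 'jupyterlab'
plain_matches = re.findall(plain, text)
# With boundaries, the engine backtracks to the full key 'jupyterlab'
bounded_matches = re.findall(bounded, text)

print(plain_matches)    # ['jupyter', 'jupyter']
print(bounded_matches)  # ['jupyter', 'jupyterlab']
```

Sorting the keys by length, longest first, is another way to get the longer key to match, but word boundaries also prevent matches inside unrelated longer words.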

CodePudding user response：

A few changes to the function were necessary. First, the string in each row must be split into a list of keys; iterating over the string directly yields single characters, none of which are dictionary keys. Second, the expected output holds the matched values as a single string, so the return statement joins the list into a string.

import numpy as np

def cascade_col(row_value):
    cvc_row = []
    for word in row_value.split(", "):  # string to list of keys
        if word in my_dict:  # membership test against the dictionary keys
            cvc_row.append(my_dict[word])
    return ', '.join(cvc_row)  # list to string

df['dict_value'] = df['key'].apply(cascade_col).replace("", np.nan)  # fill empty rows with NaN

output:

    key                        dict_value
0   vs, vscode                 is awesome, is gross
1   jupyter, jupyterlab        is not so awesome, but hes ok, ig, is rad
2   python, vs                 booya, is awesome
3   python                     booya
4   it was spyder before dawn  NaN
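
To see why the function in the question returned empty lists, here is a small sketch in plain Python. The function names cascade_chars and cascade_keys are made up for this comparison; the first mirrors the loop in the question, the second splits the cell first:

```python
my_dict = {
    'vscode': 'is gross',
    'jupyter': 'is not so awesome, but hes ok, ig',
    'vs': 'is awesome',
    'jupyterlab': 'is rad',
    'python': 'booya',
}

def cascade_chars(row_value):
    # Iterating over a string yields one character at a time,
    # and no single character is a key in my_dict
    return [my_dict[ch] for ch in row_value if ch in my_dict]

def cascade_keys(row_value):
    # Splitting first yields whole keys such as 'python' and 'vs'
    return [my_dict[key] for key in row_value.split(', ') if key in my_dict]

print(list('vs'))                   # ['v', 's']
print(cascade_chars('python, vs'))  # []
print(cascade_keys('python, vs'))   # ['booya', 'is awesome']
```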