I am trying to insert dictionary values into a separate column of a data frame wherever the existing column contains the dictionary's keys. I have tried the code below, but it returns [] for every row instead of the matching values:
import pandas as pd
import numpy as np

df = pd.DataFrame({'key' : ["vs, vscode", "jupyter, jupyterlab", "python, vs", "python", "it was spyder before dawn"]})
my_dict = {'vscode' : 'is gross',
           'jupyter' : 'is not so awesome, but hes ok, ig',
           'vs' : 'is awesome',
           'jupyterlab' : 'is rad',
           'python' : "booya"}

def cascade_col(row_value):
    cvc_row = []
    for word in row_value:
        if word in my_dict:
            cvc_row.append(my_dict[word])
    return cvc_row

df['dict value'] = df['key'].apply(cascade_col)
print(df)
My expected output is the following:
df = pd.DataFrame({'key' : ["vs, vscode", "jupyter, jupyterlab", "python, vs", "python", "it was spyder before dawn"],
                   'Corresponding Value(s)' : ['is awesome, is gross', 'is not so awesome, but hes ok, ig, is rad', 'booya, is awesome', 'booya', np.nan]})
df
Thank you for taking my question.
I have described the problem and the code I have tried, but I am stuck and would appreciate further assistance. Thank you.
CodePudding user response:
Code:
def cascade_col(row_value):
    cvc_row = []
    for word in row_value.split(','):
        word = word.strip()
        if word in my_dict:
            cvc_row.append(my_dict[word])
    return ','.join(cvc_row)
Using a lambda:
df['Corresponding Value(s)'] = df['key'].apply(lambda row: ','.join([my_dict[i] for i in [l.strip() for l in row.split(',')] if i in my_dict]))
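For context on why the split is needed: looping over a string visits one character at a time, so no element ever equals a key in my_dict. A minimal sketch using the first row of the question's data and a subset of its dictionary (the names chars, words, from_chars and from_words are only for illustration):

```python
row_value = "vs, vscode"
my_dict = {'vscode': 'is gross', 'vs': 'is awesome'}  # subset of the question's dictionary

# Iterating over the string directly yields single characters.
chars = [ch for ch in row_value]
print(chars[:3])  # ['v', 's', ',']

# Splitting on commas and stripping whitespace yields whole words.
words = [w.strip() for w in row_value.split(',')]
print(words)  # ['vs', 'vscode']

# Only the word-based lookup finds any keys.
from_chars = [my_dict[c] for c in chars if c in my_dict]
from_words = [my_dict[w] for w in words if w in my_dict]
print(from_chars)  # []
print(from_words)  # ['is awesome', 'is gross']
```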
CodePudding user response:
You can use regex extraction and mapping with the dictionary:
import re

regex = '|'.join(map(re.escape, my_dict))
df['dict value'] = (df['key'].str.extractall(f'({regex})')[0]
                      .map(my_dict)
                      .groupby(level=0).agg(', '.join)
                   )
Output:
   key                        dict value
0  vs, vscode                 is awesome, is gross
1  jupyter, jupyterlab        is not so awesome, but hes ok, ig, is not so awesome, but hes ok, ig
2  python, vs                 booya, is awesome
3  python                     booya
4  it was spyder before dawn  NaN
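One caveat with this output: row 1 shows the jupyter value twice, because the alternation matches jupyter inside jupyterlab before the longer key is tried. A possible fix, sketched here by wrapping the group in word boundaries (\b), assuming every key consists of word characters only:

```python
import re
import pandas as pd

df = pd.DataFrame({'key': ["vs, vscode", "jupyter, jupyterlab", "python, vs",
                           "python", "it was spyder before dawn"]})
my_dict = {'vscode': 'is gross',
           'jupyter': 'is not so awesome, but hes ok, ig',
           'vs': 'is awesome',
           'jupyterlab': 'is rad',
           'python': 'booya'}

# \b on both sides forces whole-word matches, so "jupyter"
# can no longer match inside "jupyterlab".
regex = '|'.join(map(re.escape, my_dict))
df['dict value'] = (df['key'].str.extractall(rf'\b({regex})\b')[0]
                      .map(my_dict)
                      .groupby(level=0).agg(', '.join)
                   )
print(df)
```

Rows without any match (row 4 here) are absent from the grouped result, so they still end up as NaN after index alignment.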
CodePudding user response:
A few changes to the function were necessary. First, each row value has to be split into a list of words; otherwise the loop iterates over the string one character at a time and never finds a key. Second, the expected output holds strings rather than lists, so the return statement joins the list into a single string.
import numpy as np

def cascade_col(row_value):
    cvc_row = []
    for word in row_value.split(", "):  # split the string into a list of words
        if word in my_dict:  # membership test against the dictionary keys
            cvc_row.append(my_dict[word])
    return ','.join(cvc_row)  # join the list into a single string

# replace empty strings (rows with no match) with NaN
df['dict_value'] = df['key'].apply(cascade_col).replace("", np.nan)
Output:
   key                        dict_value
0  vs, vscode                 is awesome,is gross
1  jupyter, jupyterlab        is not so awesome, but hes ok, ig,is rad
2  python, vs                 booya,is awesome
3  python                     booya
4  it was spyder before dawn  NaN
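To reproduce the expected output from the question exactly, a variant could join with ', ' instead of ',' and return np.nan directly when nothing matches, which makes the replace step unnecessary. A sketch under those assumptions, reusing the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'key': ["vs, vscode", "jupyter, jupyterlab", "python, vs",
                           "python", "it was spyder before dawn"]})
my_dict = {'vscode': 'is gross',
           'jupyter': 'is not so awesome, but hes ok, ig',
           'vs': 'is awesome',
           'jupyterlab': 'is rad',
           'python': 'booya'}

def cascade_col(row_value):
    """Return the ', '-joined values for every key found in row_value, or NaN."""
    words = [w.strip() for w in row_value.split(',')]
    values = [my_dict[w] for w in words if w in my_dict]
    return ', '.join(values) if values else np.nan

df['Corresponding Value(s)'] = df['key'].apply(cascade_col)
print(df)
```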