I have a dataframe as follows:
import pandas as pd
df = pd.DataFrame({'text':['Lary Page is visiting today',' His boss, Maria Jackson is here.']})
I have extracted the names into the list below, used the Faker library to generate as many fake names as there are entries in person_name, and built a dictionary from the two lists.
from faker import Faker
fake = Faker()
person_name = ['Lary Page', 'Maria Jackson']
fake_name = [fake.name() for _ in person_name]
name_dict = dict(zip(person_name, fake_name))
Now I would like to replace the names in the dataframe using the dictionary, but the following call raises an error:
df.text.str.replace(name_dict)
My desired output (for example):
print(df)
Angela Mindeston is visiting today
His boss, Emanuel Smith is here.
CodePudding user response:
Use Series.str.replace with a lambda callback, or Series.replace with regex=True:
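import re
# Alternation of the escaped names, each wrapped in word boundaries
regex = '|'.join(r"\b{}\b".format(re.escape(x)) for x in name_dict)
# Option 1: look up each matched name in the dictionary via a callback
df['text1'] = df.text.str.replace(regex, lambda x: name_dict[x.group()], regex=True)
# Option 2: let Series.replace treat the dictionary keys as regex patterns
df['text2'] = df.text.replace(name_dict, regex=True)
print(df)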
                                text                               text1  \
0        Lary Page is visiting today            Gary Cox is visiting today
1   His boss, Maria Jackson is here.   His boss, Mr. George Jones is here.

                                  text2
0            Gary Cox is visiting today
1   His boss, Mr. George Jones is here.
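The original call fails because Series.str.replace expects both a pattern and a replacement, so a dictionary on its own is not enough. Below is a minimal, self-contained sketch of both approaches. It makes a few assumptions that are not in the question: the fake names are hard-coded (taken from the desired output) instead of coming from Faker so the result is reproducible, the keys are escaped with re.escape, and they are sorted longest first in case one name is contained in another.

```python
import re
import pandas as pd

df = pd.DataFrame({'text': ['Lary Page is visiting today',
                            ' His boss, Maria Jackson is here.']})

# Assumption: fixed replacements standing in for the Faker output
name_dict = {'Lary Page': 'Angela Mindeston',
             'Maria Jackson': 'Emanuel Smith'}

# Longest names first, so a longer name is tried before a shorter one it contains
keys = sorted(name_dict, key=len, reverse=True)
pattern = r'\b(?:' + '|'.join(re.escape(k) for k in keys) + r')\b'

# Approach 1: the callback receives a match object and looks the name up
df['text1'] = df['text'].str.replace(
    pattern, lambda m: name_dict[m.group()], regex=True)

# Approach 2: Series.replace with the (escaped) keys treated as regex patterns
escaped = {re.escape(k): v for k, v in name_dict.items()}
df['text2'] = df['text'].replace(escaped, regex=True)

print(df[['text1', 'text2']])
```

Both new columns should hold the same strings here. The callback version is probably the safer choice when the replacement values might contain backslashes or group references, since the return value of a callback is inserted literally.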