I have a list called names
names = ['kramer hickok', 'carlos ortiz ', 'talor gooch', 'mikumu horikawa', 'yoshinori fujimoto']
In addition, I have a pandas.DataFrame called page. The dataframe looks as follows:
name
-- ---------------------------
0 kramer hickok united states
1 carlos ortiz mexico
2 talor gooch united states
3 mikumu horikawa japan
4 yoshinori fujimoto japan
I want to remove the country from every entry in the column, keeping only the name. How can I do this as fast as possible?
The desired output:
name
-- ---------------------------
0 kramer hickok
1 carlos ortiz
2 talor gooch
3 mikumu horikawa
4 yoshinori fujimoto
I tried the following with no result:
for name in names:
    page['name'] = page['name'].str.extract(name)
Thank you
CodePudding user response:
You can try .str.extract
page['out'] = page['name'].str.extract(r'\b(' + '|'.join(names) + r')\b')
print(page)
                          name                 out
0  kramer hickok united states       kramer hickok
1          carlos ortiz mexico        carlos ortiz
2    talor gooch united states         talor gooch
3        mikumu horikawa japan     mikumu horikawa
4     yoshinori fujimoto japan  yoshinori fujimoto
5  mikumumikumu horikawa japan                 NaN
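For reference, here is a minimal self-contained sketch of the approach above, run on the data from the question. Stripping whitespace (one entry in names, 'carlos ortiz ', has a trailing space) and calling re.escape are extra precautions I am assuming you want; they are not part of the original one-liner.

```python
import re
import pandas as pd

names = ['kramer hickok', 'carlos ortiz ', 'talor gooch',
         'mikumu horikawa', 'yoshinori fujimoto']
page = pd.DataFrame({'name': [
    'kramer hickok united states',
    'carlos ortiz mexico',
    'talor gooch united states',
    'mikumu horikawa japan',
    'yoshinori fujimoto japan',
]})

# Assumed precautions: drop stray whitespace and escape regex metacharacters.
cleaned = [re.escape(n.strip()) for n in names]

# One alternation of all names inside a single capture group,
# anchored on word boundaries so partial words do not match.
pattern = r'\b(' + '|'.join(cleaned) + r')\b'

# expand=False returns a Series instead of a one-column DataFrame.
page['out'] = page['name'].str.extract(pattern, expand=False)
print(page)
```

Rows that contain none of the names come back as NaN, as in row 5 above.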
CodePudding user response:
How about just replacing the column altogether?
page['name'] = names
I think it would take less time and be much easier to handle.
(Note: this only works if names has exactly one entry per row, in the same order as the rows, with no duplicates.)
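A small sketch of that direct assignment, assuming names lists exactly one entry per row in the same order as page. The length check and the strip() call are my additions for safety, not something the question requires.

```python
import pandas as pd

names = ['kramer hickok', 'carlos ortiz ', 'talor gooch',
         'mikumu horikawa', 'yoshinori fujimoto']
page = pd.DataFrame({'name': [
    'kramer hickok united states',
    'carlos ortiz mexico',
    'talor gooch united states',
    'mikumu horikawa japan',
    'yoshinori fujimoto japan',
]})

# Assignment is positional: names[i] lands in row i, so the lengths must agree.
if len(names) != len(page):
    raise ValueError('names must have one entry per row of page')

# Strip stray whitespace (assumed cleanup) while overwriting the column.
page['name'] = [n.strip() for n in names]
print(page)
```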
CodePudding user response:
If every name is just two words, you don't even need your list:
df['name'] = df['name'].str.extract(r'(\w+ \w+)')
print(df)
# Output:
name
0 kramer hickok
1 carlos ortiz
2 talor gooch
3 mikumu horikawa
4 yoshinori fujimoto
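If the two-word assumption holds, the same result can also be sketched without a regex by splitting on whitespace and keeping the first two tokens. This is an alternative I am suggesting, not part of the answer above; the variable names by_regex and by_split are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'name': [
    'kramer hickok united states',
    'carlos ortiz mexico',
    'talor gooch united states',
    'mikumu horikawa japan',
    'yoshinori fujimoto japan',
]})

# Regex version: capture the first two space-separated words.
by_regex = df['name'].str.extract(r'(\w+ \w+)', expand=False)

# Split version: tokenize on whitespace, keep two tokens, join them back.
by_split = df['name'].str.split().str[:2].str.join(' ')

# Both approaches should agree when every name is exactly two words.
print(by_regex.tolist() == by_split.tolist())
```

Neither version works for names with more or fewer than two words; in that case the list-based pattern is the safer choice.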