I would like to split the Name column in this dataframe into two columns:
import pandas as pd
import re
df1 = pd.DataFrame({'Name': ['Steve _ (ICANA) Smith', 'Joe _ (ICANA) Nadal',
'Roger _ (ICANA) Federer_blu']})
My desired output would be:
                          Name  First         Last
0        Steve _ (ICANA) Smith  Steve        Smith
1          Joe _ (ICANA) Nadal    Joe        Nadal
2  Roger _ (ICANA) Federer_blu  Roger  Federer_blu
So I would like to get rid of ' _ (ICANA)'. Using split, I tried:
df1[['First','last']] = df1.Name.str.split(r"\b _ (ICANA)\b", expand=True)
which raises the following error:
ValueError: Columns must be same length as key
CodePudding user response:
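Your regex was incorrect. Unescaped, (ICANA) is a capture group that matches the bare text ICANA, so the pattern never matches the literal parentheses, split returns a single column, and assigning it to two columns fails. You need to escape the parentheses: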
df1[['First','Last']] = df1.Name.str.split(r"\s*_ \(ICANA\)\s*", expand=True)
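To see the difference concretely, it can help to check how many columns each pattern produces. This is a minimal sketch using the df1 from the question; the variable names unescaped and escaped are only for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Steve _ (ICANA) Smith', 'Joe _ (ICANA) Nadal',
                             'Roger _ (ICANA) Federer_blu']})

# Unescaped: (ICANA) is a capture group that matches the bare text "ICANA",
# so the pattern never matches the literal "(ICANA)" and nothing is split.
unescaped = df1['Name'].str.split(r"\b _ (ICANA)\b", expand=True)

# Escaped: \( and \) match the literal parentheses, so each name splits in two.
escaped = df1['Name'].str.split(r"\s*_ \(ICANA\)\s*", expand=True)

print(unescaped.shape)  # (3, 1)
print(escaped.shape)    # (3, 2)
```

A column count other than two on real data would mean the separator is missing from every row or occurs more than once in some row.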
However, to avoid this kind of issue, it is best to use extract, which guarantees a fixed number of output columns (one per capture group):
df1[['First','Last']] = df1.Name.str.extract(r'(\w*)\s*_ \(ICANA\)\s*(\w*)')
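Put together with the data from the question, the extract approach might look like the sketch below. It names the second column Last to match the desired output, and the 'Plain Name' input is a hypothetical value added only to show what happens when the marker is absent:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Steve _ (ICANA) Smith', 'Joe _ (ICANA) Nadal',
                             'Roger _ (ICANA) Federer_blu']})

# Two capture groups give exactly two output columns.
# \w also matches "_", so "Federer_blu" is captured whole.
pattern = r'(\w*)\s*_ \(ICANA\)\s*(\w*)'
df1[['First', 'Last']] = df1['Name'].str.extract(pattern)
print(df1)

# Hypothetical input without the marker: the result still has two columns,
# both filled with NaN, instead of a different column count.
no_marker = pd.Series(['Plain Name']).str.extract(pattern)
print(no_marker)
```

Because a non-matching row yields NaN rather than a different shape, the two-column assignment does not raise the ValueError seen with split.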