pandas: split a column on multiple words present in all cells

Time:02-19

I would like to split the Name column of this DataFrame into two columns:

import pandas as pd
import re
df1 = pd.DataFrame({'Name': ['Steve _ (ICANA) Smith', 'Joe _ (ICANA) Nadal',
                       'Roger _ (ICANA) Federer_blu']})

My desired output would be:

                          Name  First    Last
0        Steve _ (ICANA) Smith  Steve    Smith
1          Joe _ (ICANA) Nadal  Joe      Nadal
2  Roger _ (ICANA) Federer_blu  Roger    Federer_blu

So I would like to get rid of ' _ (ICANA)' and keep the words on either side. Using split, I tried:

df1[['First','last']] = df1.Name.str.split(r"\b _ (ICANA)\b", expand=True)

which raises the following error:

ValueError: Columns must be same length as key

Answer:

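Your regex was incorrect. Unescaped, (ICANA) is a capturing group that matches ICANA without the literal parentheses, so the pattern never matches, every row stays a single piece, and assigning one column to two keys raises the ValueError. In addition, \b cannot match between ) and the following space, because neither is a word character. Escape the parentheses and match the surrounding whitespace instead: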

df1[['First','Last']] = df1.Name.str.split(r"\s*_ \(ICANA\)\s*", expand=True)
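A minimal, self-contained sketch that checks both patterns with the standard re module and then applies the escaped one with pandas. The extra n=1 argument is an addition not in the original answer: it caps the split at two pieces in case a name ever contains the separator twice. Patterns longer than one character are treated as regular expressions by str.split by default.

```python
import re

import pandas as pd

name = "Steve _ (ICANA) Smith"

# Original pattern: unescaped "(ICANA)" is a capturing group matching "ICANA"
# without the literal parentheses, so nothing in the string matches.
print(re.search(r"\b _ (ICANA)\b", name))  # None

# Escaped pattern: matches " _ (ICANA) " including the surrounding whitespace.
print(re.split(r"\s*_ \(ICANA\)\s*", name))  # ['Steve', 'Smith']

df1 = pd.DataFrame({'Name': ['Steve _ (ICANA) Smith', 'Joe _ (ICANA) Nadal',
                             'Roger _ (ICANA) Federer_blu']})

# n=1 limits each row to at most one split, i.e. at most two output columns.
df1[['First', 'Last']] = df1['Name'].str.split(r"\s*_ \(ICANA\)\s*", n=1, expand=True)
print(df1)
```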

However, the more robust option is extract: it always returns exactly one column per capture group (rows that do not match get NaN), so the assignment to two columns cannot fail on a column-count mismatch:

df1[['First','Last']] = df1.Name.str.extract(r'(\w*)\s*_ \(ICANA\)\s*(\w*)')
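A small sketch of the extract approach on a DataFrame that includes a hypothetical row without the separator ('Rafael Nadal' is made up for illustration). It uses named groups, so extract labels the output columns directly, and it shows that the non-matching row simply gets NaN instead of changing the number of columns.

```python
import pandas as pd

# Hypothetical third row with no " _ (ICANA) " separator, added only to show
# how extract handles a row that does not match.
df2 = pd.DataFrame({'Name': ['Steve _ (ICANA) Smith', 'Roger _ (ICANA) Federer_blu',
                             'Rafael Nadal']})

# Named groups (?P<First>...) and (?P<Last>...) become the output column names.
parts = df2['Name'].str.extract(r'(?P<First>\w*)\s*_ \(ICANA\)\s*(?P<Last>\w*)')

# extract keeps the original index, so join aligns the new columns row by row.
df2 = df2.join(parts)
print(df2)
```

Note that \w matches letters, digits and underscores only, which is why Federer_blu is kept whole; a hyphenated or multi-word name would be cut at the first other character, in which case a pattern such as (.*?) and (.*) around the separator may fit better.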