I have a pandas DataFrame with a single column named Category. I want to split this Category column into four separate columns named A, B, C, and D, using the double pipe "||" as the delimiter.
Sample input: df['Category'] = Operations||Modification||Bank||Bank Process
Sample output:
df['A'] = Operations
df['B'] = Modification
df['C'] = Bank
df['D'] = Bank Process
I have looked at many answers on Stack Overflow, but none of them work for me. I tried the following code:
df[['A', 'B', 'C', 'D']] = df['Category'].str.split("||", expand = True)
But it raises the following error: ValueError: Columns must be same length as key
CodePudding user response:
Presumably your version of pandas is running str.split with regex mode enabled. In that case, you would need to escape the pipes:
df[["A", "B", "C", "D"]] = df["Category"].str.split(r'\|\|', expand=True)
Or you could explicitly turn off regex mode (the regex parameter is available in pandas 1.4 and later):
df[["A", "B", "C", "D"]] = df["Category"].str.split("||", expand=True, regex=False)
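To check both options side by side, here is a minimal sketch. It assumes pandas 1.4 or later (for the regex keyword) and a one-row frame built from the sample value in the question; the names escaped and literal are only illustrative.

```python
import pandas as pd

# One-row frame built from the sample value in the question.
df = pd.DataFrame({"Category": ["Operations||Modification||Bank||Bank Process"]})

# Option 1: escape the pipes so the regex matches a literal "||".
escaped = df["Category"].str.split(r"\|\|", expand=True)

# Option 2: treat the pattern as a plain string (assumes pandas >= 1.4).
literal = df["Category"].str.split("||", expand=True, regex=False)

# Both results have four columns, so the assignment lines up with the key.
df[["A", "B", "C", "D"]] = literal
print(df[["A", "B", "C", "D"]])
```

Either result can be assigned to the four new columns; which one to use is mostly a matter of readability.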
CodePudding user response:
The pipe has a special meaning in regex. You need to either escape it or declare that you are not using a regex by setting regex=False in str.split:
df[["A", "B", "C", "D"]] = df["Category"].str.split('||', regex=False, expand=True)
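To see why the unescaped pattern misbehaves, here is a small sketch using Python's standard re module. As a regex, "||" is an alternation of empty patterns, so it matches the empty string at every position rather than the two pipe characters. This only illustrates the regex semantics; how a given pandas version or string backend handles the same pattern internally is an assumption here. The sketch also uses re.escape to build the escaped pattern instead of writing the backslashes by hand.

```python
import re

import pandas as pd

sample = "Operations||Modification||Bank||Bank Process"

# Unescaped: "||" is an alternation of empty patterns, so it matches
# the empty string and the text is cut into far more than four pieces.
unescaped_parts = re.split("||", sample)
print(len(unescaped_parts))

# re.escape adds the backslashes needed to match the pipes literally.
pattern = re.escape("||")
escaped_parts = re.split(pattern, sample)
print(escaped_parts)

# The escaped pattern can be passed to str.split unchanged.
df = pd.DataFrame({"Category": [sample]})
df[["A", "B", "C", "D"]] = df["Category"].str.split(pattern, expand=True)
```

When the split yields more pieces than there are target columns, the assignment to four columns fails, which is consistent with the "Columns must be same length as key" error in the question.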