Split pandas dataframe column based on the pipeline symbol

Time:05-25

I have a pandas DataFrame with a single column named Category. I want to split this Category column into 4 separate columns named A, B, C and D, using the double pipe symbol "||" as the delimiter.

Sample input: df['Category'] = Operations||Modification||Bank||Bank Process

Sample output:

df['A'] = Operations

df['B'] = Modification

df['C'] = Bank

df['D'] = Bank Process

I have looked up many answers on Stack Overflow, but none of them work for me. I have tried the following code:

df[['A', 'B', 'C', 'D']] = df['Category'].str.split("||", expand = True)

But it raises the following error: ValueError: Columns must be same length as key

CodePudding user response:

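Most likely, str.split is treating "||" as a regular expression (recent pandas versions do this by default for patterns longer than one character). In a regex the pipe means alternation, so "||" is not matched literally. In that case, you need to escape the pipes: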

df[["A", "B", "C", "D"]] = df["Category"].str.split(r'\|\|', expand=True)

Alternatively, you can explicitly turn off regex mode (the regex parameter is available in pandas 1.4 and later):

df[["A", "B", "C", "D"]] = df["Category"].str.split("||", expand=True, regex=False)
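To compare the two options side by side, here is a minimal, self-contained sketch. The two-row DataFrame is hypothetical sample data modelled on the row in the question (the second row is invented for illustration), and it assumes a pandas version that accepts the regex parameter.

```python
import pandas as pd

# Hypothetical sample data; the first row is the example from the question.
df = pd.DataFrame({
    "Category": [
        "Operations||Modification||Bank||Bank Process",
        "Sales||Review||Retail||Retail Process",
    ]
})

# Option 1: escape each pipe so the regex engine matches a literal "||".
escaped = df["Category"].str.split(r"\|\|", expand=True)

# Option 2: treat the pattern as a plain string instead of a regex.
literal = df["Category"].str.split("||", expand=True, regex=False)

# Both options yield four columns, so the assignment lines up with the key.
df[["A", "B", "C", "D"]] = literal
print(df[["A", "B", "C", "D"]])
```

If some rows could contain more or fewer than four parts, it may be worth checking the shape of the split result before assigning it to the four named columns.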

CodePudding user response:

The pipe has a special meaning in regex, so you need to either escape it or declare that you are not using a regex by setting regex=False in str.split:

df[["A", "B", "C", "D"]] = df["Category"].str.split('||', regex=False,
                                                    expand=True)
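The special meaning of the pipe can be seen with Python's built-in re module alone. The sketch below is only an illustration of the regex behaviour, under the assumption that pandas' regex mode behaves like re.split; the variable names are made up for this example.

```python
import re

sample = "Operations||Modification||Bank||Bank Process"

# Unescaped, "||" is an alternation of empty patterns, so it matches the
# empty string between characters rather than the literal delimiter.
unescaped_parts = re.split("||", sample)

# re.escape builds the escaped pattern, so it need not be typed by hand.
escaped_parts = re.split(re.escape("||"), sample)

print(len(unescaped_parts))  # far more than 4 pieces
print(escaped_parts)
```

A split that produces far more than four pieces per row would explain the "Columns must be same length as key" error when the result is assigned to four columns.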