pandas split string to multiple columns based on the parts

Time:10-11

I have a pandas DataFrame in which I want to split a single column into multiple columns dynamically, but I am getting a ValueError because of the static key assignment.

ValueError: Columns must be the same length as key

Part of DF:

amazon_category_and_sub_category
--------------------------------
Hobbies > Model > Rail
NaN
Hobbies > Model > Rail > Trains
Hobbies > Model

What I am doing:

df[['Category', 'Sub_Category_1','Sub_Category_2','Sub_Category_3','Sub_Category_4']] = df['amazon_category_and_sub_category'].str.split('>', expand=True)

CodePudding user response:

If you assign to an explicit list of column names, the number of names must match the number of columns the split produces.

Here the split yields 4 columns, which you try to assign to 5 names. This raises the error.
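
To see the mismatch for yourself, a minimal sketch along these lines may help. The sample frame is rebuilt from the four rows shown in the question (an assumption, since the real dataset presumably has more rows), and `parts`, `names` and `error_message` are illustrative names only.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; the real data is assumed to look similar.
df = pd.DataFrame({
    "amazon_category_and_sub_category": [
        "Hobbies > Model > Rail",
        np.nan,
        "Hobbies > Model > Rail > Trains",
        "Hobbies > Model",
    ]
})

# expand=True returns a DataFrame whose width is set by the row with the most parts.
parts = df["amazon_category_and_sub_category"].str.split(">", expand=True)
n_parts = parts.shape[1]

# Five hard-coded names, as in the question.
names = ["Category", "Sub_Category_1", "Sub_Category_2",
         "Sub_Category_3", "Sub_Category_4"]

try:
    df[names] = parts  # number of names differs from number of split columns
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(n_parts, len(names), error_message)
```

Checking `parts.shape[1]` before assigning tells you how many names the data actually needs.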

A programmatic variant to assign the first column of the split to "Category" and the subsequent ones to "Sub_Category_n":

df = df.join(df['amazon_category_and_sub_category']
               .str.split('>', expand=True)    # one column per part, labelled 0, 1, 2, ...
               .add_prefix('Sub_Category_')    # 0 -> 'Sub_Category_0', 1 -> 'Sub_Category_1', ...
               .rename(columns={'Sub_Category_0': 'Category'})
             )

output:

  amazon_category_and_sub_category  Category  Sub_Category_1  Sub_Category_2  Sub_Category_3
0           Hobbies > Model > Rail   Hobbies           Model            Rail            None
1                              NaN       NaN             NaN             NaN             NaN
2  Hobbies > Model > Rail > Trains   Hobbies           Model            Rail          Trains
3                  Hobbies > Model   Hobbies           Model            None            None
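
One thing to watch: splitting on a bare '>' keeps the spaces around each part (for example ' Model '). A possible variant, sketched here on the sample rows from the question, splits on a regular expression that also consumes the surrounding whitespace and builds the column names from however many parts the data yields. The names `parts` and `result` are illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "amazon_category_and_sub_category": [
        "Hobbies > Model > Rail",
        np.nan,
        "Hobbies > Model > Rail > Trains",
        "Hobbies > Model",
    ]
})

# A multi-character pattern is treated as a regular expression by default,
# so the whitespace around each '>' is removed together with the separator.
parts = df["amazon_category_and_sub_category"].str.split(r"\s*>\s*", expand=True)

# Name the first column 'Category' and number the rest, whatever the width is.
parts.columns = ["Category"] + [
    f"Sub_Category_{i}" for i in range(1, parts.shape[1])
]

result = df.join(parts)
print(result)
```

Because the names are derived from `parts.shape[1]`, the same code keeps working if a longer category path shows up later.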