I have a pandas DataFrame in which I want to split a single column into multiple columns dynamically, but I am getting a ValueError due to static key assignment.
ValueError: Columns must be the same length as key
Part of DF:
amazon_category_and_sub_category
--------------------------------
Hobbies > Model > Rail
NaN
Hobbies > Model > Rail > Trains
Hobbies > Model
What I am doing:
df[['Category', 'Sub_Category_1','Sub_Category_2','Sub_Category_3','Sub_Category_4']] = df['amazon_category_and_sub_category'].str.split('>', expand=True)
CodePudding user response:
If you assign to a list of column names, the number of names must match the number of columns produced by the split.
Here the split produces 4 columns, which you try to assign to 5 names. This raises the error.
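To confirm the mismatch before assigning, you can inspect the shape of the split result. This is a minimal sketch, assuming a small sample frame that mirrors the rows in the question; the names `parts`, `n_parts` and `target` are only illustrative:

```python
import numpy as np
import pandas as pd

# Sample data mirroring the rows shown in the question (assumed for illustration).
df = pd.DataFrame({
    "amazon_category_and_sub_category": [
        "Hobbies > Model > Rail",
        np.nan,
        "Hobbies > Model > Rail > Trains",
        "Hobbies > Model",
    ]
})

parts = df["amazon_category_and_sub_category"].str.split(">", expand=True)
n_parts = parts.shape[1]  # number of columns the split actually produced

target = ["Category", "Sub_Category_1", "Sub_Category_2",
          "Sub_Category_3", "Sub_Category_4"]

# Use only as many names as there are split columns, so the lengths match.
df[target[:n_parts]] = parts
```

Slicing the name list keeps the static names but still breaks if the data ever has more levels than names, which is why generating the names from the split result is usually more robust.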
A programmatic variant to assign the first column of the split to "Category" and the subsequent ones to "Sub_Category_n":
df = df.join(df['amazon_category_and_sub_category']
             .str.split('>', expand=True)
             .add_prefix('Sub_Category_')  # columns 0..n become Sub_Category_0..n
             .rename(columns={'Sub_Category_0': 'Category'})
             )
output:
   amazon_category_and_sub_category Category Sub_Category_1 Sub_Category_2 Sub_Category_3
0            Hobbies > Model > Rail  Hobbies          Model           Rail           None
1                               NaN      NaN            NaN            NaN            NaN
2   Hobbies > Model > Rail > Trains  Hobbies          Model           Rail         Trains
3                   Hobbies > Model  Hobbies          Model           None           None
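Note that splitting on a bare '>' keeps the spaces around each piece (for example 'Hobbies ' and ' Model '). A self-contained sketch of the same approach that also trims that whitespace by splitting on a regular expression could look like this; the sample frame and the names `parts` and `out` are assumptions for illustration:

```python
import numpy as np
import pandas as pd

# Sample data mirroring the rows shown in the question (assumed for illustration).
df = pd.DataFrame({
    "amazon_category_and_sub_category": [
        "Hobbies > Model > Rail",
        np.nan,
        "Hobbies > Model > Rail > Trains",
        "Hobbies > Model",
    ]
})

parts = (
    df["amazon_category_and_sub_category"]
    # A multi-character pattern is treated as a regex, so the
    # whitespace around each '>' is consumed by the split.
    .str.split(r"\s*>\s*", expand=True)
    .add_prefix("Sub_Category_")
    .rename(columns={"Sub_Category_0": "Category"})
)

# The number of new columns follows the deepest category path in the data.
out = df.join(parts)
```

Because the column names are derived from the split result, the same code works unchanged if some rows have more or fewer levels.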