I have a pandas DataFrame column that contains lists of strings of varying lengths, like below:
df['category']:
category | ...
---------
['Grocery & Gourmet Food', 'Cooking & Baking', 'Lard & Shortening', 'Shortening'] | ...
['Grocery & Gourmet Food', 'Candy & Chocolate', 'Mints'] | ...
['Grocery & Gourmet Food', 'Soups, Stocks & Broths', 'Broths', 'Chicken'] | ...
Now I want to split this category column into separate columns, one for each string element in the list. Is this possible with pandas? And how should I handle the column names?
I have gone through the answers to this question, but the difference is that my lists are not always the same length.
My expected output would be something like below:
category_1 | category_2 | category_n | other_columns
------------------------------------------------------------------
Grocery & Gourmet Food | Cooking & Baking | Lard & Shortening | ...
... | ... | ... | ...
CodePudding user response:
I would do something like this:
df2 = pd.DataFrame(df['category'].to_list(), columns=[f"category_{i + 1}" for i in range(df['category'].str.len().max())])
df = pd.concat([df.drop('category', axis=1), df2], axis=1)
Output:
category_1 category_2 category_3 \
0 Grocery & Gourmet Food Cooking & Baking Lard & Shortening
1 Grocery & Gourmet Food Candy & Chocolate Mints
2 Grocery & Gourmet Food Soups, Stocks & Broths Broths
category_4
0 Shortening
1 None
2 Chicken
Edit:
As @mozway suggested, it is better to create the columns with their default names and then update them:
df2 = pd.DataFrame(df['category'].to_list())
df2.columns = df2.columns.map(lambda x: f'category_{x + 1}')
df = pd.concat([df.drop('category', axis=1), df2], axis=1)
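For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same approach. The sample frame, the `asin` column, and the index values 10/20/30 are made up for illustration and are not from the question. One assumption worth flagging: `pd.concat(..., axis=1)` aligns rows on the index, so if `df` does not have the default 0..n-1 index, the new frame should probably reuse `df.index`, which is what this sketch does.

```python
import pandas as pd

# Hypothetical sample data mirroring the question; "asin" and the
# non-default index are assumptions added for illustration.
df = pd.DataFrame(
    {
        "asin": ["A1", "A2", "A3"],
        "category": [
            ["Grocery & Gourmet Food", "Cooking & Baking", "Lard & Shortening", "Shortening"],
            ["Grocery & Gourmet Food", "Candy & Chocolate", "Mints"],
            ["Grocery & Gourmet Food", "Soups, Stocks & Broths", "Broths", "Chicken"],
        ],
    },
    index=[10, 20, 30],
)

# Expand each list into its own row of columns; shorter lists are
# padded with missing values. Reusing df.index keeps rows aligned.
expanded = pd.DataFrame(df["category"].to_list(), index=df.index)

# Default column labels are 0, 1, 2, ...; rename them to category_1, category_2, ...
expanded.columns = [f"category_{i + 1}" for i in expanded.columns]

# Swap the original list column for the expanded columns.
result = pd.concat([df.drop(columns="category"), expanded], axis=1)
print(result)
```

Rows with fewer categories end up with a missing value in the trailing columns, so a `fillna('')` afterwards may be useful if empty strings are preferred.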