How to make a separate column for each element in a list?

Time:12-27

I have a pandas DataFrame column that contains lists of strings of varying lengths, like df['category'] below:

category                                                                           | ...
---------
['Grocery & Gourmet Food', 'Cooking & Baking', 'Lard & Shortening', 'Shortening']  | ...
['Grocery & Gourmet Food', 'Candy & Chocolate', 'Mints']                           | ...
['Grocery & Gourmet Food', 'Soups, Stocks & Broths', 'Broths', 'Chicken']          | ...

Now I want to split this category column into separate columns, one for each string element in the list. Is this possible with pandas? How should I handle the column names?

I have gone through the answers to this question, but the difference is that my lists are not always the same length.

My expected output would be something like below:

category_1             | category_2       |  category_n  | other_columns 
------------------------------------------------------------------
Grocery & Gourmet Food | Cooking & Baking | Lard & Shortening | ...
...                    | ...              | ...               | ...

CodePudding user response:

I would do something like this:

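# Size the column list by the longest list; max() on the lists themselves compares them element by element, not by length
df2 = pd.DataFrame(df['category'].to_list(), columns=[f"category_{i + 1}" for i in range(df['category'].str.len().max())])
df = pd.concat([df.drop('category', axis=1), df2], axis=1)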

Output:

               category_1              category_2         category_3  \
0  Grocery & Gourmet Food        Cooking & Baking  Lard & Shortening   
1  Grocery & Gourmet Food       Candy & Chocolate              Mints   
2  Grocery & Gourmet Food  Soups, Stocks & Broths             Broths   

   category_4  
0  Shortening  
1        None  
2     Chicken 
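
To check how many columns the longest list produces and how shorter rows get padded, here is a minimal sketch built from the three sample rows in the question. The names `n_cols`, `names` and `wide` are illustrative, not part of the original answer. Calling `.max()` directly on a column of lists compares the lists element by element rather than by length, so the sketch measures the lengths explicitly.

```python
import pandas as pd

df = pd.DataFrame({
    "category": [
        ["Grocery & Gourmet Food", "Cooking & Baking", "Lard & Shortening", "Shortening"],
        ["Grocery & Gourmet Food", "Candy & Chocolate", "Mints"],
        ["Grocery & Gourmet Food", "Soups, Stocks & Broths", "Broths", "Chicken"],
    ]
})

# Length of each list; the longest one decides how many columns are needed.
n_cols = int(df["category"].map(len).max())

# category_1 .. category_n, one name per position in the longest list.
names = [f"category_{i + 1}" for i in range(n_cols)]

# Rows with fewer elements are padded with missing values on the right.
wide = pd.DataFrame(df["category"].to_list(), columns=names)

print(wide.isna().sum())  # count of missing values per new column
```

Only the second row has three elements, so the padding shows up in `category_4` for that row alone.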

Edit:

As @mozway suggested, it is better to create the columns with their default names and then update them:

df2 = pd.DataFrame(df['category'].to_list())
df2.columns = df2.columns.map(lambda x: f'category_{x + 1}')
df = pd.concat([df.drop('category', axis=1), df2], axis=1)
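
One caveat worth sketching: `pd.concat(..., axis=1)` aligns rows by index label, so if `df` has been filtered or re-indexed, a freshly built frame with a default 0..n-1 index will not line up with it. The helper below, `split_list_column`, is a hypothetical wrapper around the same approach that reuses the original index; the toy data and the `price` column are made up for illustration.

```python
import pandas as pd


def split_list_column(df, col, prefix):
    """Expand a column of lists into prefix_1..prefix_n columns (illustrative helper)."""
    # Reuse the original index so that concat aligns rows label by label.
    wide = pd.DataFrame(df[col].to_list(), index=df.index)
    # Default column labels are 0, 1, 2, ...; shift them to start at 1.
    wide.columns = [f"{prefix}_{i + 1}" for i in wide.columns]
    return pd.concat([df.drop(columns=col), wide], axis=1)


# Toy frame with a deliberately non-default index (assumed data).
df = pd.DataFrame(
    {"category": [["A", "B", "C"], ["A", "D"]], "price": [1.0, 2.0]},
    index=[10, 20],
)

result = split_list_column(df, "category", "category")
print(result)
```

Without `index=df.index`, the new columns would carry labels 0 and 1 while `df` carries 10 and 20, and the concatenation would produce four mostly empty rows instead of two.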