Home > Software engineering >  How to split a column value at comma into multiple columns and rename them as its number of column a
How to split a column value at comma into multiple columns and rename them as its number of column a

Time:12-20

I am using python and I have a dataframe named "noshow" with 5 columns such as ,

  1. Phone - float64
  2. Name - object
  3. Total Order Counts - int64
  4. Total Bill Value - float64
  5. List_of_Order_Id - object

5th column "List_of_Order_Id" has the list of order ids per row(customer) like below,

In [11]: noshow
Out[11]: 
           Mobile  ...                  List_of_Order_Id
0    9.163820e 08  ...                             21810
1    9.179049e 08  ...                             23387
2    9.183748e 08  ...                             21767
3    9.186110e 08  ...                             23457
4    9.187790e 08  ...                       23117,23163
..            ...  ...                               ...
353  9.970647e 09  ...                             21549
354  9.971940e 09  ...                             22753
355  9.994742e 09  ...     21505,21836,22291,22539,22734
356  9.994964e 09  ...                             22348
357  9.994997e 09  ...                       21100,21550

[358 rows x 5 columns]

Now, I want to automate the splitting of "List_of_Order_Id" column at each comma and create a new column for each of the value using loop or whatever solution available.

For example, if in one of the row of List_of_Order_Id column out of 358 rows has "n" order id values and that is the maximum number of order ids per customer then I want to split it into "n" columns and I want to name each of the columns as "Order_Id_1", "Order_Id_2", ......., "Order_Id_n" with the numbers as suffix like mentioned.

Kindly help! Thanks in advance.

CodePudding user response:

You can use str.split to split the strings in the column and then attach the resulting DataFrame to the original DataFrame, assigning column names using its width.

temp = df['List_of_Order_Id'].str.split(',', expand=True).applymap(lambda x: np.nan if x is None else x)
df[['Order_Id_' str(i) for i in range(1,temp.shape[1]   1)]] = temp

           Mobile  ...               List_of_Order_Id Order_Id_1 Order_Id_2  \
0    9.163820e 08  ...                          21810      21810        NaN   
1    9.179049e 08  ...                          23387      23387        NaN   
2    9.183748e 08  ...                          21767      21767        NaN   
3    9.186110e 08  ...                          23457      23457        NaN   
4    9.187790e 08  ...                    23117,23163      23117      23163   
..            ...  ...                            ...        ...        NaN   
353  9.970647e 09  ...                          21549      21549        NaN   
354  9.971940e 09  ...                          22753      22753        NaN   
355  9.994742e 09  ...  21505,21836,22291,22539,22734      21505      21836   
356  9.994964e 09  ...                          22348      22348        NaN   
357  9.994997e 09  ...                    21100,21550      21100      21550   

    Order_Id_3 Order_Id_4 Order_Id_5  
0          NaN        NaN        NaN  
1          NaN        NaN        NaN  
2          NaN        NaN        NaN  
3          NaN        NaN        NaN  
4          NaN        NaN        NaN  
..         NaN        NaN        NaN  
353        NaN        NaN        NaN  
354        NaN        NaN        NaN  
355      22291      22539      22734  
356        NaN        NaN        NaN  
357        NaN        NaN        NaN  
  • Related