Home > Software engineering >  get columns from string separated by comma, considering the values for each row
get columns from string separated by comma, considering the values for each row

Time:06-04

I would like to transform a string separated by comma or list to columns like a show you below

column1 | column2 |    column3   |
A       |    X    | [1, 2, 3]    |    
B       |    Y    | [1, 4, 5, 8] |   
C       |    Z    |  [5, 1]      |   
D       |    T    |    [4, 3]    | 

or

column1 | column2 |  column3  |
A       |    X    |   1,2,3   |    
B       |    Y    |  1,4,5,8  | 
C       |    Z    |    5,1    |   
D       |    T    |    4,3    |

to

column1 | column2 |  1  |  2  |  3  |  4  |  5  |  8  |
A       |    X    |  1  |  1  |  1  |  0  |  0  |  0  |  
B       |    Y    |  1  |  0  |  0  |  1  |  1  |  1  |
C       |    Z    |  1  |  0  |  0  |  0  |  1  |  0  |
D       |    T    |  0  |  0  |  1  |  1  |  0  |  0  |

Thanks in advance

CodePudding user response:

In the second case, you can use Series.str.get_dummies to easily one-hot encode the column3 column. Then use Dataframe.join to join the resulting one-hot encoded columns to the original DataFrame.

import pandas as pd 

df = pd.DataFrame({
 'column1': ['A', 'B', 'C', 'D'],
 'column2': ['X', 'Y', 'Z', 'T'],
 'column3': ['1,2,3', '1,4,5,8', '5,1', '4,3']
})

res = (
    df.drop(columns='column3')
      .join(df['column3'].str.get_dummies(sep=','))
)

Output:

>>> res 

  column1 column2  1  2  3  4  5  8
0       A       X  1  1  1  0  0  0
1       B       Y  1  0  0  1  1  1
2       C       Z  1  0  0  0  1  0
3       D       T  0  0  1  1  0  0

In the first case, assuming that column3 contains lists of integers, you can do the same thing as above, but first convert the values of column3 to comma-separated strings. For instance using Series.apply

df['column3'] = df['column3'].apply(lambda lst_nums: ','.join(map(str, lst_nums)))

If the values of column3 are lists of strings you can use Series.str.join instead

df['column3'] = df['column3'].str.join(',')
  • Related