I would like to transform a string separated by comma or list to columns like a show you below
column1 | column2 | column3 |
A | X | [1, 2, 3] |
B | Y | [1, 4, 5, 8] |
C | Z | [5, 1] |
D | T | [4, 3] |
or
column1 | column2 | column3 |
A | X | 1,2,3 |
B | Y | 1,4,5,8 |
C | Z | 5,1 |
D | T | 4,3 |
to
column1 | column2 | 1 | 2 | 3 | 4 | 5 | 8 |
A | X | 1 | 1 | 1 | 0 | 0 | 0 |
B | Y | 1 | 0 | 0 | 1 | 1 | 1 |
C | Z | 1 | 0 | 0 | 0 | 1 | 0 |
D | T | 0 | 0 | 1 | 1 | 0 | 0 |
Thanks in advance
CodePudding user response:
In the second case, you can use Series.str.get_dummies
to easily one-hot encode the column3
column.
Then use Dataframe.join
to join the resulting one-hot encoded columns to the original DataFrame.
import pandas as pd
df = pd.DataFrame({
'column1': ['A', 'B', 'C', 'D'],
'column2': ['X', 'Y', 'Z', 'T'],
'column3': ['1,2,3', '1,4,5,8', '5,1', '4,3']
})
res = (
df.drop(columns='column3')
.join(df['column3'].str.get_dummies(sep=','))
)
Output:
>>> res
column1 column2 1 2 3 4 5 8
0 A X 1 1 1 0 0 0
1 B Y 1 0 0 1 1 1
2 C Z 1 0 0 0 1 0
3 D T 0 0 1 1 0 0
In the first case, assuming that column3
contains lists of integers, you can do the same thing as above, but first convert the values of column3
to comma-separated strings. For instance using Series.apply
df['column3'] = df['column3'].apply(lambda lst_nums: ','.join(map(str, lst_nums)))
If the values of column3
are lists of strings you can use Series.str.join
instead
df['column3'] = df['column3'].str.join(',')