Home > OS >  Pandas separate list in row to columns and one hot encode
Pandas separate list in row to columns and one hot encode

Time:12-19

a client's online form outputs multi-choice selections to a single field separated by commas. is there a way to split out the values into columns and one hot encode them, I'm using pandas?

Data now:

id  | ind
1   | Student, Tourism, Industrial
2   | Residential, Student, Tourism
3   | student, Tourism, Industrial, Education

How I want the data:

id          | ind_student| ind_Tourism| ind_Industrial| ind_Residential|  ind_Education
1           | 1          | 1          | 1             | 0              | 0    
2           | 1          | 1          | 0             | 1              | 0
3           | 1          | 1          | 1             | 0              | 1

thanks for taking a look!

CodePudding user response:

Split your ind columns into words and pivot your dataframe:

out = df.assign(ind=df['ind'].str.split(', '), dummy=1).explode('ind') \
        .pivot_table('dummy', 'id', 'ind', fill_value=0).add_prefix('ind_')
print(out)

# Output:
ind  ind_Education  ind_Industrial  ind_Residential  ind_Student  ind_Tourism  ind_student
id                                                                                        
1                0               1                0            1            1            0
2                0               0                1            1            1            0
3                1               1                0            0            1            1

CodePudding user response:

Use get_dummies. However, first put each string in its row by creating list and exploding. Because there is mixed case, please capitalise the first letter and then crosstab

 df=df.assign(ind=df['ind'].str.title().str.strip().str.split(',')).explode('ind')

df=pd.crosstab(df['id'], df['ind']).add_prefix('ind_')



ind  ind_Education  ind_Industrial  ind_Residential  ind_Student  ind_Tourism
id                                                                           
1                0               1                0            1            1
2                0               0                1            1            1
3                1               1                0            1            1
  • Related