Handling features with multiple values per instance in Python for Machine Learning model

Time:04-29

I am working with a data set in which some features contain multiple values per instance, as shown in the image:
https://i.stack.imgur.com/D78el.png
I want to split each cell on the '|' symbol so that I can apply one-hot encoding, but I can't find a suitable solution.
My idea is to keep all of the values in one row, in other words to convert each cell to a list of integers.

CodePudding user response:

Maybe this is what you want:

import pandas as pd

df = pd.DataFrame(['465','444','465','864|857|850|843'],columns=['genre_ids'])
df

         genre_ids
0              465
1              444
2              465
3  864|857|850|843

df['genre_ids'].str.get_dummies(sep='|')

   444  465  843  850  857  864
0    0    1    0    0    0    0
1    1    0    0    0    0    0
2    0    1    0    0    0    0
3    0    0    1    1    1    1
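
If you also want the list-of-integers form mentioned in the question, or want the indicator columns attached to the original frame, something like the following might work. It is only a sketch: the column name `genre_list` and the `genre_` prefix are made up for illustration, and it assumes every cell is a non-missing string of integer IDs separated by '|'.

```python
import pandas as pd

df = pd.DataFrame({"genre_ids": ["465", "444", "465", "864|857|850|843"]})

# Split each cell on '|' and convert the pieces to integers.
# Assumes no missing values; a NaN cell would make the lambda fail.
df["genre_list"] = (
    df["genre_ids"].str.split("|").apply(lambda ids: [int(i) for i in ids])
)

# One-hot encode the original strings and prefix the new columns
# (the "genre_" prefix is an arbitrary choice) to avoid name clashes.
dummies = df["genre_ids"].str.get_dummies(sep="|").add_prefix("genre_")

# Attach the indicator columns to the original frame by index.
encoded = df.join(dummies)
```

Keeping the list column next to the indicator columns is optional; most models only need the indicator columns, so `genre_ids` and `genre_list` can be dropped before training.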