Convert a list of features into a binary vector

Time:12-15

I have several lists of features:

feat_lists = [
  ['f1','f2','f3'],
  ['f2','f3'],
  ['f2','f4']
]

I'd like to arrange them so that each row represents a list (an observation) and each column a feature. The values would be 1/0 or True/False, depending on whether the feature is present in that list.

For the example above, I'd like to get the following dataframe (shown as a table):

      f1    f2     f3     f4
1   True  True   True  False
2  False  True   True  False
3  False  True  False   True

I can figure out a way to do it, but I imagine there must be a better, more efficient way to do it in pandas.

Thanks

CodePudding user response:

Use scikit-learn's MultiLabelBinarizer, then cast the result to boolean with DataFrame.astype:

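import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# fit_transform returns a 0/1 array with one column per distinct feature (mlb.classes_)
mlb = MultiLabelBinarizer()
df = pd.DataFrame(mlb.fit_transform(feat_lists), columns=mlb.classes_).astype(bool)
print(df)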
      f1    f2     f3     f4
0   True  True   True  False
1  False  True   True  False
2  False  True  False   True
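
Since the question asks for a pandas way, here is a minimal sketch of a pandas-only alternative that avoids the scikit-learn dependency. It is not part of the answer above; it assumes a pandas version that provides Series.explode (0.25 or later), and the variable name exploded is just illustrative.

```python
import pandas as pd

feat_lists = [
    ['f1', 'f2', 'f3'],
    ['f2', 'f3'],
    ['f2', 'f4'],
]

# One row per (observation, feature) pair; the index repeats the observation number.
exploded = pd.Series(feat_lists).explode()

# One indicator column per feature, then collapse back to one row per observation:
# a feature is True for an observation if any of its exploded rows has it.
df = pd.get_dummies(exploded).groupby(level=0).any()
print(df)
```

For the three example lists this should give the same True/False table as the MultiLabelBinarizer version, with features as columns and observations as rows.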