I have several lists of features:
feat_lists = [
['f1','f2','f3'],
['f2','f3'],
['f2','f4']
]
I'd like to arrange them so that each row represents a list (observation) and each column a feature. The values would be 1/0 or True/False, depending on whether the feature is present in that list (observation).
For instance, for the example above, I'd like to have the following dataframe (shown as a table):
|   | f1 | f2 | f3 | f4 |
|---|---|---|---|---|
| 1 | True | True | True | False |
| 2 | False | True | True | False |
| 3 | False | True | False | True |
I can figure out a way to do it, but I imagine there must be a better and more efficient way to do it in pandas.
Thanks
CodePudding user response:
Use MultiLabelBinarizer and cast the result to boolean with DataFrame.astype:
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Learn the set of features, then encode each list as one 0/1 row
mlb = MultiLabelBinarizer()
df = pd.DataFrame(mlb.fit_transform(feat_lists), columns=mlb.classes_).astype(bool)
print(df)
f1 f2 f3 f4
0 True True True False
1 False True True False
2 False True False True
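If you would rather avoid the scikit-learn dependency, something along these lines should give the same frame with pandas alone. It is only a sketch, and it assumes every feature is a string that does not contain the `|` character, since that is used as the separator. The name `df_pandas` is made up for the example.

```python
import pandas as pd

feat_lists = [
    ['f1', 'f2', 'f3'],
    ['f2', 'f3'],
    ['f2', 'f4'],
]

# Join each list into one '|'-separated string, e.g. 'f1|f2|f3'
joined = pd.Series(feat_lists).str.join('|')

# Split on '|' again and build one 0/1 indicator column per distinct feature,
# then cast the indicators to booleans
df_pandas = joined.str.get_dummies(sep='|').astype(bool)
print(df_pandas)
```

For large inputs MultiLabelBinarizer is probably the better choice (it can also return a sparse matrix with `sparse_output=True`), but for a handful of short lists the string-based version keeps everything in pandas.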