I have this table:
df1 = pd.DataFrame(data={'col1': ['a', 'e', 'a', 'e'],
'col2': ['e', 'a', 'c', 'b'],
'col3': ['c', 'b', 'b', 'a']},
index=pd.Series([1, 2, 3, 4], name='index'))
index | col1 | col2 | col3 |
---|---|---|---|
1 | a | e | c |
2 | e | a | b |
3 | a | c | b |
4 | e | b | a |
and this list:
all_vals = ['a', 'b', 'c', 'd', 'e' 'f']
How do I make boolean columns from df1
such that it includes all columns from the all_vals
list, even if the value is not in df1
?
index | a | b | c | d | e | f |
---|---|---|---|---|---|---|
1 | TRUE | FALSE | TRUE | FALSE | TRUE | FALSE |
2 | TRUE | TRUE | FALSE | FALSE | TRUE | FALSE |
3 | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE |
4 | TRUE | TRUE | FALSE | FALSE | TRUE | FALSE |
CodePudding user response:
You can iterate over all_vals to check if the value exists and create new column
for val in all_vals:
df1[val] = (df1 == val).any(axis=1)
CodePudding user response:
Use get_dummies
with aggregate max per columns and DataFrame.reindex
:
df1 = (pd.get_dummies(df1, dtype=bool, prefix='', prefix_sep='')
.groupby(axis=1, level=0).max()
.reindex(all_vals, axis=1, fill_value=False))
print (df1)
a b c d e f
index
1 True False True False True False
2 True True False False True False
3 True True True False False False
4 True True False False True False