Getting boolean columns based on value presence in other columns-CodePudding

I have this table:

df1 = pd.DataFrame(data={'col1': ['a', 'e', 'a', 'e'], 
                         'col2': ['e', 'a', 'c', 'b'], 
                         'col3': ['c', 'b', 'b', 'a']},
                        index=pd.Series([1, 2, 3, 4], name='index'))

index	col1	col2	col3
1	a	e	c
2	e	a	b
3	a	c	b
4	e	b	a

and this list:

all_vals = ['a', 'b', 'c', 'd', 'e' 'f']

How do I make boolean columns from df1 such that it includes all columns from the all_vals list, even if the value is not in df1?

index	a	b	c	d	e	f
1	TRUE	FALSE	TRUE	FALSE	TRUE	FALSE
2	TRUE	TRUE	FALSE	FALSE	TRUE	FALSE
3	TRUE	TRUE	TRUE	FALSE	FALSE	FALSE
4	TRUE	TRUE	FALSE	FALSE	TRUE	FALSE

CodePudding user response：

You can iterate over all_vals to check if the value exists and create new column

for val in all_vals:
    df1[val] = (df1 == val).any(axis=1)

CodePudding user response：

Use get_dummies with aggregate max per columns and DataFrame.reindex:

df1 = (pd.get_dummies(df1, dtype=bool, prefix='', prefix_sep='')
         .groupby(axis=1, level=0).max()
         .reindex(all_vals, axis=1, fill_value=False))
print (df1)
          a      b      c      d      e      f
index                                         
1      True  False   True  False   True  False
2      True   True  False  False   True  False
3      True   True   True  False  False  False
4      True   True  False  False   True  False