Home > Blockchain >  Getting boolean columns based on value presence in other columns
Getting boolean columns based on value presence in other columns

Time:01-05

I have this table:

df1 = pd.DataFrame(data={'col1': ['a', 'e', 'a', 'e'], 
                         'col2': ['e', 'a', 'c', 'b'], 
                         'col3': ['c', 'b', 'b', 'a']},
                        index=pd.Series([1, 2, 3, 4], name='index'))
index col1 col2 col3
1 a e c
2 e a b
3 a c b
4 e b a

and this list:

all_vals = ['a', 'b', 'c', 'd', 'e' 'f']

How do I make boolean columns from df1 such that it includes all columns from the all_vals list, even if the value is not in df1?

index a b c d e f
1 TRUE FALSE TRUE FALSE TRUE FALSE
2 TRUE TRUE FALSE FALSE TRUE FALSE
3 TRUE TRUE TRUE FALSE FALSE FALSE
4 TRUE TRUE FALSE FALSE TRUE FALSE

CodePudding user response:

You can iterate over all_vals to check if the value exists and create new column

for val in all_vals:
    df1[val] = (df1 == val).any(axis=1)

CodePudding user response:

Use get_dummies with aggregate max per columns and DataFrame.reindex:

df1 = (pd.get_dummies(df1, dtype=bool, prefix='', prefix_sep='')
         .groupby(axis=1, level=0).max()
         .reindex(all_vals, axis=1, fill_value=False))
print (df1)
          a      b      c      d      e      f
index                                         
1      True  False   True  False   True  False
2      True   True  False  False   True  False
3      True   True   True  False  False  False
4      True   True  False  False   True  False
  • Related