Using pandas, turn a string column into multiple True/False columns

I have this:

import pandas as pd

df = pd.DataFrame({'my_col': ['red', 'red', 'green']})

my_col
red
red
green

I want this: df2 = pd.DataFrame({'red' : [True, True, False], 'green' : [False, False, True]})

red  green
True  False
True  False
False   True

Is there an elegant way to do this?

CodePudding user response：

You can do this:

for color in df['my_col'].unique():
    df[color] = df['my_col'] == color

df2 = df[df['my_col'].unique()]

This loops over each distinct color in my_col and adds a column to df named after that color, holding True where the row equals the color and False otherwise. Finally, df2 is extracted from df by selecting only the color columns.

Another option is to start with an empty DataFrame for df2 and add the columns to it directly, which leaves df unchanged:

df2 = pd.DataFrame()
for color in df['my_col'].unique():
    df2[color] = df['my_col'] == color

Output:

     red  green
0   True  False
1   True  False
2  False   True
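
The loop above can also be folded into a single expression. The following is only a sketch of that idea, assuming the same df as in the question; it builds all comparison columns in a dict comprehension and constructs df2 once, without touching df.

```python
import pandas as pd

df = pd.DataFrame({'my_col': ['red', 'red', 'green']})

# One boolean Series per distinct color; dict insertion order keeps the
# columns in order of first appearance (red, then green).
df2 = pd.DataFrame({color: df['my_col'] == color
                    for color in df['my_col'].unique()})
print(df2)
```

Like the loop, this makes one pass over the column per distinct value, so it is mainly a matter of taste when there are only a few colors.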

CodePudding user response：

The pandas function get_dummies can do this.

import pandas as pd

df = pd.DataFrame({'my_col': ['red', 'red', 'green']})

# dtype=bool already yields real True/False values, one column per color
new_df = pd.get_dummies(df, dtype=bool)

# get_dummies prefixes each new column with the source column name; strip it
new_df = new_df.rename(columns={'my_col_green': 'green', 'my_col_red': 'red'})
print(new_df)
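
If the hard-coded rename is a concern (it has to list every color by hand), one possible variation is to pass the Series itself to get_dummies, which adds no prefix. This is a sketch under that assumption; the reordering step is optional and only there to match the column order asked for in the question.

```python
import pandas as pd

df = pd.DataFrame({'my_col': ['red', 'red', 'green']})

# Passing the Series rather than the DataFrame means no 'my_col_' prefix.
dummies = pd.get_dummies(df['my_col'], dtype=bool)

# get_dummies sorts the new columns (green, red); reorder them by first
# appearance in the data so the result reads red, green.
df2 = dummies[list(df['my_col'].unique())]
print(df2)
```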

CodePudding user response：

# reset the index so each original row keeps its own row label
df = df.reset_index()

# cross-tabulate row label against color: 1 where the row has that color
(pd.crosstab(index=[df['index'], df['my_col']],
             columns=df['my_col'])
 .reset_index()                      # move the row labels back into columns
 .drop(columns=['index', 'my_col'])  # drop the helper columns
 .rename_axis(columns=None)          # remove the columns axis name
 .astype(bool))                      # convert the 0/1 counts to False/True

   green    red
0  False   True
1  False   True
2   True  False
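
The same cross-tabulation idea can likely be written more compactly by tabulating the row labels directly against the colors. This is a sketch, assuming the original df with its default 0..n-1 index (before any reset_index call); the variable name out is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'my_col': ['red', 'red', 'green']})

# Each row label occurs exactly once, so every count is 0 or 1;
# astype(bool) turns those counts into False/True.
out = (pd.crosstab(df.index, df['my_col'])
       .astype(bool)
       .rename_axis(index=None, columns=None))  # drop the axis names
print(out)
```

As with get_dummies, the columns come out sorted alphabetically rather than in order of first appearance.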