I have this:
df = pd.DataFrame({'my_col' : ['red', 'red', 'green']})
my_col
red
red
green
I want this:
df2 = pd.DataFrame({'red' : [True, True, False], 'green' : [False, False, True]})
red green
True False
True False
False True
Is there an elegant way to do this?
CodePudding user response:
You can do this:
for color in df['my_col'].unique():
    df[color] = df['my_col'] == color
df2 = df[df['my_col'].unique()]
This loops over each distinct color in my_col and adds a column to df named after that color, holding True/False depending on whether the row equals that color. Finally, df2 is extracted from df by selecting only the color columns. Note that df itself also ends up with the extra color columns.
Another option is to start with an empty dataframe for df2 and add the columns directly to it, which leaves df unchanged:
df2 = pd.DataFrame()
for color in df['my_col'].unique():
    df2[color] = df['my_col'] == color
Output:
     red  green
0   True  False
1   True  False
2  False   True
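As a possible shorthand for the loop above, the same columns can be built in one step with a dict comprehension. This is only a sketch of the same idea, assuming the column is called my_col as in the question; it does not modify df.

```python
import pandas as pd

df = pd.DataFrame({'my_col': ['red', 'red', 'green']})

# One boolean Series per distinct value, keyed by that value.
# unique() returns values in order of first appearance, so the
# columns come out as red, green.
df2 = pd.DataFrame({color: df['my_col'] == color
                    for color in df['my_col'].unique()})
print(df2)
```

Because each comparison returns a boolean Series that shares the index of df, the resulting frame keeps the original row labels.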
CodePudding user response:
The pandas function get_dummies can do this.
import pandas as pd
df = pd.DataFrame({'my_col': ['red', 'red', 'green']})
# dtype=bool already yields True/False columns named my_col_<value>
new_df = pd.get_dummies(df, dtype=bool)
# strip the my_col_ prefix so the column names match the desired output
new_df = new_df.rename(columns={'my_col_green': 'green', 'my_col_red': 'red'})
print(new_df)
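If the rename step feels heavy, one variation is to pass the column itself (a Series) to get_dummies, which names the result columns after the values without a prefix. The sketch below assumes the same df as above; the final reordering step is optional and only there to match the red, green order from the question, since get_dummies sorts the columns.

```python
import pandas as pd

df = pd.DataFrame({'my_col': ['red', 'red', 'green']})

# Passing a Series instead of the DataFrame avoids the my_col_ prefix.
new_df = pd.get_dummies(df['my_col'], dtype=bool)

# get_dummies orders columns by sorted value (green, red);
# reorder them by first appearance in the data (red, green).
new_df = new_df[list(df['my_col'].unique())]
print(new_df)
```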
CodePudding user response:
# reset the index so every original row keeps its own row in the cross tab
df = df.reset_index()
# create a cross tab; each cell is 1 where the row has that color, else 0
# (no negation needed: astype(bool) already maps 1 to True)
(pd.crosstab(index=[df['index'], df['my_col']],
             columns=df['my_col'])
   .reset_index()                      # cleanup to match the output
   .drop(columns=['index', 'my_col'])  # drop unwanted columns
   .rename_axis(columns=None)          # remove axis name
   .astype(bool))                      # make it boolean
   green    red
0  False   True
1  False   True
2   True  False