I have a table like this:
id | non-binary | female | male | others |
---|---|---|---|---|
1 | True | False | False | False |
2 | False | True | False | False |
3 | False | False | True | False |
4 | True | True | True | False |
I want to collapse these columns into a single column, with the following rule:
If multiple columns are True in a row, use only the first column that is True. E.g., for id=4 the first True is in column "non-binary", so in the new table the Gender value should be non-binary.
How can I convert it to the table below using Pandas?
id | Gender |
---|---|
1 | non-binary |
2 | female |
3 | male |
4 | non-binary |
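For anyone who wants to reproduce this, here is a minimal sketch that builds the input table as a DataFrame. The variable name df is an assumption (the answers below use it too), and the values are copied from the table above.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'non-binary': [True, False, False, True],
    'female': [False, True, False, True],
    'male': [False, False, True, True],
    'others': [False, False, False, False],
})
print(df)
```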
CodePudding user response:
Since boolean values are numeric (True == 1, False == 0), you can use idxmax to get the index label of the maximum value. Applied along the column axis, this gives the name of the column where the first True appears, because idxmax returns the first occurrence when several columns tie for the maximum.
# set your index to the id column if it is not already
df = df.set_index('id')
out = df.idxmax(axis='columns').rename('Gender')
print(out)
id
1 non-binary
2 female
3 male
4 non-binary
Name: Gender, dtype: object
# Can use .reset_index() or .to_frame() to convert back to a DataFrame
# (depending on what you want to do with your Index)
print(out.reset_index())
id Gender
0 1 non-binary
1 2 female
2 3 male
3 4 non-binary
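One thing to watch for: if a row has no True at all, idxmax still returns the first column, because every value ties at 0. The sample data has no such row, so the sketch below adds a made-up id=5 purely for illustration and masks the result with any(), which is one possible way to leave those rows empty.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'non-binary': [True, False, False],
        'female': [False, True, False],
        'male': [False, False, False],
        'others': [False, False, False],
    },
    index=pd.Index([1, 2, 5], name='id'),  # id=5 is a hypothetical all-False row
)

first_true = df.idxmax(axis='columns')
# idxmax alone labels id=5 as 'non-binary'; keep a label only where some True exists
out = first_true.where(df.any(axis='columns')).rename('Gender')
print(out)
```

Rows without any True end up as NaN here; whether that or some placeholder label is preferable depends on what you do with the column afterwards.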
CodePudding user response:
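I haven't tested this, but pandas 1.5.0 added a function called from_dummies. As far as I can tell it expects exactly one True per row, so a row like id=4 would raise a ValueError instead of picking the first column.
Try this (with id set as the index rather than a column):
pd.from_dummies(df.astype(int))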
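A small sketch of how from_dummies behaves, assuming pandas 1.5.0 or later. The frame names one_hot and multi are made up for this example: the first has exactly one True per row, the second mimics id=4 from the question. With no sep argument the decoded column comes back with an empty-string name, so it is renamed to Gender afterwards.

```python
import pandas as pd

# Hypothetical one-hot data: exactly one True per row
one_hot = pd.DataFrame(
    {
        'non-binary': [True, False, False],
        'female': [False, True, False],
        'male': [False, False, True],
        'others': [False, False, False],
    },
    index=pd.Index([1, 2, 3], name='id'),
)

decoded = pd.from_dummies(one_hot.astype(int))
decoded.columns = ['Gender']  # the single decoded column is unnamed by default
print(decoded)

# A row with several True values (like id=4) is rejected
multi = pd.DataFrame(
    {'non-binary': [True], 'female': [True], 'male': [True], 'others': [False]},
    index=pd.Index([4], name='id'),
)
raised = False
try:
    pd.from_dummies(multi.astype(int))
except ValueError as err:
    raised = True
    print('from_dummies failed:', err)
```

So this route only fits the question if rows with more than one True are resolved first, for example with the idxmax approach.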
CodePudding user response:
What about just using .loc, checking the columns in priority order:
df2 = pd.DataFrame({'id': df['id']})
df2.loc[df['non-binary'].eq(True), 'Gender'] = 'non-binary'
# each later column only applies if every earlier column is False
df2.loc[df['non-binary'].eq(False) & df['female'].eq(True), 'Gender'] = 'female'
df2.loc[df[['non-binary', 'female']].eq(False).all(axis=1) & df['male'].eq(True), 'Gender'] = 'male'
df2.loc[df[['non-binary', 'female', 'male']].eq(False).all(axis=1) & df['others'].eq(True), 'Gender'] = 'others'
print(df2)
Output:
| | id | Gender |
---|---|---|
0 | 1 | non-binary |
1 | 2 | female |
2 | 3 | male |
3 | 4 | non-binary |
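If the list of columns grows, the same "first True wins" logic can be written without one .loc line per column. This is only a sketch using numpy.select, which takes the first condition that matches in each row; the 'unknown' default for rows with no True is my own placeholder, not something from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'non-binary': [True, False, False, True],
    'female': [False, True, False, True],
    'male': [False, False, True, True],
    'others': [False, False, False, False],
})

# Column order defines the priority: np.select keeps the first condition that is True
cols = ['non-binary', 'female', 'male', 'others']
conditions = [df[c].to_numpy() for c in cols]

df2 = pd.DataFrame({
    'id': df['id'],
    # 'unknown' is a hypothetical label for rows where every column is False
    'Gender': np.select(conditions, cols, default='unknown'),
})
print(df2)
```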