I have this data
ID A B C
0 0 True False False
1 1 False True False
2 2 False False True
And want to transform it into
ID group
0 0 A
1 1 B
2 2 C
- I want to use the column names as value labels for the
category
column. - There is maximum only one
True
value per row.
This is the MWE
#!/usr/bin/env python3
import pandas as pd
df = pd.DataFrame({
'ID': range(3),
'A': [True, False, False],
'B': [False, True, False],
'C': [False, False, True]
})
result = pd.DataFrame({
'ID': range(3),
'group': ['A', 'B', 'C']
})
result.group = result.group.astype('category')
print(df)
print(result)
I could do df.apply(lambda row: ...magic.., axis=1)
. But isn't there a more elegant way with pandas own tools?
CodePudding user response:
Yet another way:
df.set_index(['ID'])\
.rename_axis('group', axis=1)\ # getting column name correct
.stack()\ # reshaping getting column headers into dataframe rows
.loc[lambda x: x]\ # filtering for True
.reset_index()\ # moving ID back into dataframe columns
.drop(0, axis=1) # dropping boolean column
Output:
ID group
0 0 A
1 1 B
2 2 C
CodePudding user response:
You can use df.dot
:
df['group'] = df[['A', 'B', 'C']].dot(df.columns[1:])
CodePudding user response:
You could use pd.melt()
to reshape and rename, and then drop the False
rows in your 'value' column using query
which allows for a chained operation:
pd.melt(df,id_vars=['ID'],var_name= 'group').query('value') \
.drop('value',axis=1).reset_index(drop=True)
ID group
0 0 A
1 1 B
2 2 C
CodePudding user response:
You can use melt then a lookup based on the column where the values are true to get the results you are expecting
df = df.melt(id_vars = 'ID', var_name = 'group')
df.loc[df['value'] == True][['ID', 'group']]
CodePudding user response:
More verbose than melt
, but this drops the invalid columns during the reshaping:
(df.set_index('ID')
.rename_axis(columns='group')
.replace(False, pd.NA)
.stack().reset_index().drop(columns=0)
)
output:
ID group
0 0 A
1 1 B
2 2 C
CodePudding user response:
Try with apply
lambda
df.set_index('ID').apply(lambda x : x.index[x][0],axis=1)
Out[39]:
ID
0 A
1 B
2 C
dtype: object
CodePudding user response:
idxmax
s = df.set_index('ID')
s.idxmax(1).where(s.any(1))
ID
0 A
1 B
2 C
dtype: object