I would like to turn column names into values, in order to create a factor variable whose levels are built from the column names. Starting from x1, I am hoping to get x2. In R this would be similar to using the model.matrix() function.
Thank you.
x1 = pd.DataFrame({'A': [1, 0, 0],
                   'B': [0, 1, 0],
                   'C': [0, 1, 1]})
x2 = pd.DataFrame({'All': ['A','BC','C']})
CodePudding user response:
Here is one way, though there may be a simpler solution:
x1.astype(bool).apply(lambda row: ''.join(x1.columns[row]), axis=1)
CodePudding user response:
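Use the @ operator (matrix multiplication) to multiply the 0/1 matrix by the vector of column names. Multiplying a string by 1 keeps it and multiplying by 0 gives an empty string, so summing across each row concatenates the names of the columns that are set: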
import pandas as pd
x1 = pd.DataFrame({'A': [1, 0, 0],
                   'B': [0, 1, 0],
                   'C': [0, 1, 1]})
# create result DataFrame
x2 = pd.DataFrame({"all": x1 @ x1.columns})
print(x2)
Output
all
0 A
1 BC
2 C
CodePudding user response:
You can also use a list comprehension, as follows:
cols = x1.columns.values
x2 = pd.DataFrame({'All': [''.join(cols[x]) for x in x1.eq(1).values]})
Or, in one line without the intermediate variable:
x2 = pd.DataFrame({'All': [''.join(x1.columns[x]) for x in x1.eq(1).values]})
Result:
print(x2)
All
0 A
1 BC
2 C
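Since the question asks for a factor variable, the resulting column can probably be converted to pandas' categorical dtype, which is the closest analogue of an R factor. This is only a sketch: it assumes the levels should be the distinct combined labels (such as 'BC') rather than the bare column names, and the names levels and codes are chosen here for illustration.

```python
import pandas as pd

x2 = pd.DataFrame({'All': ['A', 'BC', 'C']})

# Convert the string column to a categorical (factor-like) column.
x2['All'] = x2['All'].astype('category')

levels = list(x2['All'].cat.categories)  # distinct labels, i.e. the "levels"
codes = list(x2['All'].cat.codes)        # integer code assigned to each row
print(levels)
print(codes)
```

If a specific level order is needed, pd.Categorical accepts an explicit categories list instead of inferring it from the data.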