I have the following dataframe:
import pandas as pd

data = {'id': [1, 2, 3, 4, 5],
        'A': [1, 0, 0, 0, 0],
        'B': [0, 0, 1, 0, 0],
        'C': [0, 0, 0, 0, 1],
        'D': [0, 1, 0, 0, 0],
        'E': [0, 0, 0, 1, 0]}
df = pd.DataFrame(data)
I want to create a new column, class, that takes the value 0 if A is true (A=1), 1 if B is true (B=1), 2 if C is true, and so on.
Expected output:
   id  A  B  C  D  E  class
0   1  1  0  0  0  0      0
1   2  0  0  0  1  0      3
2   3  0  1  0  0  0      1
3   4  0  0  0  0  1      4
4   5  0  0  1  0  0      2
CodePudding user response:
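You can use np.nonzero, which returns a tuple of index arrays (one per dimension) locating the non-zero elements. For a 2-D array, the second array holds the column positions, so select that one. This relies on each row containing exactly one 1.
import numpy as np
df['class'] = np.nonzero(df.iloc[:, 1:].to_numpy())[1]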
print(df)
   id  A  B  C  D  E  class
0   1  1  0  0  0  0      0
1   2  0  0  0  1  0      3
2   3  0  1  0  0  0      1
3   4  0  0  0  0  1      4
4   5  0  0  1  0  0      2
Or use np.where, which avoids the need for df.to_numpy:
df['class'] = np.where(df.iloc[:, 1:].eq(1))[1]
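To make it clearer what the [1] is selecting, here is a minimal self-contained sketch that unpacks the tuple. The names flags, rows, cols and cols_where are only illustrative, and the sketch assumes every row has exactly one 1 among A..E.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'A': [1, 0, 0, 0, 0],
    'B': [0, 0, 1, 0, 0],
    'C': [0, 0, 0, 0, 1],
    'D': [0, 1, 0, 0, 0],
    'E': [0, 0, 0, 1, 0],
})

# Keep only the indicator columns A..E (everything after 'id').
flags = df.iloc[:, 1:]

# On a 2-D array, np.nonzero returns (row_indices, column_indices),
# listed in row-major order.
rows, cols = np.nonzero(flags.to_numpy())

# Assumption: exactly one 1 per row. If a row had zero or several 1s,
# cols would no longer have one entry per row and the assignment would fail.
if len(cols) != len(df):
    raise ValueError("expected exactly one non-zero flag per row")

# np.where called with only a condition gives the same index arrays.
cols_where = np.where(flags.eq(1))[1]

df['class'] = cols
print(df)
```

If some rows can be all zeros or contain several 1s, this approach needs an explicit rule for those rows before it can be used.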
CodePudding user response:
df['class'] = df.apply(lambda x: x.B + x.C*2 + x.D*3 + x.E*4, axis=1)
print(df)
Prints:
   id  A  B  C  D  E  class
0   1  1  0  0  0  0      0
1   2  0  0  0  1  0      3
2   3  0  1  0  0  0      1
3   4  0  0  0  0  1      4
4   5  0  0  1  0  0      2
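The same weighted-sum idea can also be written without a row-wise apply. The following is a rough sketch rather than part of the answer above: it multiplies the indicator columns by the weights 0, 1, 2, 3, 4 in a single matrix-vector product. The names flag_cols and weights are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'A': [1, 0, 0, 0, 0],
    'B': [0, 0, 1, 0, 0],
    'C': [0, 0, 0, 0, 1],
    'D': [0, 1, 0, 0, 0],
    'E': [0, 0, 0, 1, 0],
})

flag_cols = ['A', 'B', 'C', 'D', 'E']

# One weight per column: A -> 0, B -> 1, C -> 2, D -> 3, E -> 4.
weights = np.arange(len(flag_cols))

# Each row has a single 1, so the dot product picks out that column's weight.
df['class'] = df[flag_cols].to_numpy() @ weights
print(df)
```

One caveat with any weighted sum of this kind: a row with no 1 at all also yields 0, which is indistinguishable from A being set.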