I have the following dataframe:
import pandas as pd

df = pd.DataFrame({
    "name": ["Jones", "Jennifer", "Jack", "Sara", "Mick"],
    "age": [44, 22, 33, 44, 55],
    "weight": [44, 55, 66, 77, 99],
    "male": [1, 0, 1, 0, 1],
    "female": [0, 0, 0, 1, 0],
    "prefer_not_to_respond": [0, 1, 0, 0, 0],
    "height": [175, 173, 160, 178, 190],
    "is_smoking": [True, False, False, True, False]})
How can I merge the "male", "female" and "prefer_not_to_respond" columns into a single column called "gender" that holds the gender for every row, and then delete the three original columns? I would love a solution that does not use the split method. Thanks.
CodePudding user response:
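You can use dot: multiplying the 0/1 indicator columns by their own column names leaves only the name of the flagged column in each row.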
s = df[['male','female','prefer_not_to_respond']]
df['new'] = s.dot(s.columns)
df
Out[376]:
name age weight ... height is_smoking new
0 Jones 44 44 ... 175 True male
1 Jennifer 22 55 ... 173 False prefer_not_to_respond
2 Jack 33 66 ... 160 False male
3 Sara 44 77 ... 178 True female
4 Mick 55 99 ... 190 False male
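To see why the dot trick works, here is a minimal pure-Python sketch of what it effectively computes per row. It assumes each flag is 0 or 1; the `rows` list is made up for illustration and is not the original dataframe. Multiplying a string by 1 keeps it, multiplying by 0 gives an empty string, and the "sum" is string concatenation.

```python
columns = ["male", "female", "prefer_not_to_respond"]

# Hypothetical indicator rows, one flag set per row.
rows = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]

# flag * name is the name itself for 1 and "" for 0; joining the pieces
# mirrors the sum that the dot product performs.
labels = ["".join(flag * name for flag, name in zip(row, columns)) for row in rows]
print(labels)
```

Note that under this logic a row with no flag set would produce an empty string, and a row with several flags set would produce the names glued together, so the approach fits best when exactly one flag is set per row.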
CodePudding user response:
You can use idxmax:
cols = ['male', 'female', 'prefer_not_to_respond']
df = df.assign(gender=df[cols].idxmax(axis=1)).drop(columns=cols)
print(df)
# Output
name age weight height is_smoking gender
0 Jones 44 44 175 True male
1 Jennifer 22 55 173 False prefer_not_to_respond
2 Jack 33 66 160 False male
3 Sara 44 77 178 True female
4 Mick 55 99 190 False male
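One thing to watch: if a row has no flag set at all, idxmax still returns the first column name ("male"), because every value ties at 0. The sketch below is one possible guard, not part of the answer above; the small dataframe and the "Nobody" row are invented for illustration.

```python
import pandas as pd

# Hypothetical data: the second row has no gender flag set.
df = pd.DataFrame({
    "name": ["Jones", "Nobody"],
    "male": [1, 0],
    "female": [0, 0],
    "prefer_not_to_respond": [0, 0],
})

cols = ["male", "female", "prefer_not_to_respond"]
flags = df[cols]

# Keep the idxmax label only where at least one flag is set;
# rows without any flag become missing values instead of "male".
gender = flags.idxmax(axis=1).where(flags.any(axis=1))

df = df.assign(gender=gender).drop(columns=cols)
print(df)
```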
CodePudding user response:
# Map one row to the name of the flag that is set.
def f(x):
    if x['male'] == 1:
        return 'male'
    elif x['female'] == 1:
        return 'female'
    elif x['prefer_not_to_respond'] == 1:
        return 'prefer_not_to_respond'

# Apply the function row by row, then drop the indicator columns.
df['gender'] = df.apply(f, axis=1)
df = df.drop(['male', 'female', 'prefer_not_to_respond'], axis=1)
CodePudding user response:
import numpy as np

genders = ["male", "female", "prefer_not_to_respond"]
df = df.assign(gender=np.select(df[genders].eq(1).to_numpy().T, genders)).drop(columns=genders)
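A slightly more explicit variant of the np.select idea, as a sketch: it builds one boolean condition per column and passes a string `default` for rows where no flag is set. The "unknown" label and the small dataframe are assumptions made for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last row has no gender flag set.
df = pd.DataFrame({
    "male": [1, 0, 0],
    "female": [0, 1, 0],
    "prefer_not_to_respond": [0, 0, 0],
})

genders = ["male", "female", "prefer_not_to_respond"]

# One boolean array per indicator column, in the same order as the labels.
conditions = [df[g].eq(1).to_numpy() for g in genders]

# The first matching condition wins; rows matching none get the default.
df["gender"] = np.select(conditions, genders, default="unknown")
df = df.drop(columns=genders)
print(df)
```

Passing a string default keeps the result a plain array of strings, which avoids mixing the numeric default of 0 with the string labels.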