I've the following data frame (df).
GovKeepSecure | BankKeepSecure | OtherKeepSecure | Secure |
---|---|---|---|
Yes | Yes | Yes | Yes |
No | No | Yes | No |
No | Neutral | Yes | Neutral |
I'm looking to write a python function that evaluates the first 3 columns, and returns the value that occurs more than 2 times in the "Secure"/4th column.
For example, if there's 2 or more of "No"s in the first 3 columns (in the same row), than the value in the "Secure" column results in "No." If such a condition isn't fulfilled, then the "Secure" column defaults to "Neutral."
I was wondering how we'd go about creating such a function.
Here's the approach I'm trying to develop.
import pandas as pd
def secure(row):
if row["GovKeepSecure", "BankKeepSecure", OtherKeepSecure] == ["Yes", "Yes", "Yes"]:
return "Yes"
if row["GovKeepSecure", "BankKeepSecure", OtherKeepSecure] == ["Yes", "Yes", "No"]:
return "Yes"
-------------------------------------------------------------------------------------(etc.)
df["Secure"] = df.apply(lambda row: secure(row), axis=1)
Do let me know if there's a better way. Thanks so much!
CodePudding user response:
You can use np.select
for that
a = df[['GovKeepSecure', 'BankKeepSecure', 'OtherKeepSecure']]
yes_counts = a.eq('Yes').sum(1)
no_counts = a.eq('No').sum(1)
df['Secure'] = np.select([yes_counts > no_counts,
yes_counts < no_counts],
['Yes', 'No'],
default='Neutral')
You can also use mode
a.agg(lambda s: s.mode() if len(s.mode()) == 1 else 'Neutral', axis=1)
CodePudding user response:
here is one way to do it
# apply value_counts and take idxmax along a row
# sort values, as when value count are equal, it takes the first value in the series
# sorting helps makes neutral comes as first result
df['Secure']=df.apply(lambda x: x.sort_values().value_counts().idxmax() , axis=1)
df
GovKeepSecure BankKeepSecure OtherKeepSecure Secure
0 Yes Yes Yes Yes
1 No No Yes No
2 No Neutral Yes Neutral
alternately,
# when multiple values has the same mode, take from the first column,
# which is also sort and returns neutral
df['Secure']=df.apply(lambda x: x.mode()[0], axis=1)