Home > database >  How to create a column that evaluates the output in 3 other columns in a pandas data frame?
How to create a column that evaluates the output in 3 other columns in a pandas data frame?

Time:11-13

I've the following data frame (df).

GovKeepSecure BankKeepSecure OtherKeepSecure Secure
Yes Yes Yes Yes
No No Yes No
No Neutral Yes Neutral

I'm looking to write a python function that evaluates the first 3 columns, and returns the value that occurs more than 2 times in the "Secure"/4th column.

For example, if there's 2 or more of "No"s in the first 3 columns (in the same row), than the value in the "Secure" column results in "No." If such a condition isn't fulfilled, then the "Secure" column defaults to "Neutral."

I was wondering how we'd go about creating such a function.

Here's the approach I'm trying to develop.

import pandas as pd

def secure(row):
    if row["GovKeepSecure", "BankKeepSecure", OtherKeepSecure] == ["Yes", "Yes", "Yes"]:
             return "Yes"
    if row["GovKeepSecure", "BankKeepSecure", OtherKeepSecure] == ["Yes", "Yes", "No"]:
             return "Yes"
-------------------------------------------------------------------------------------(etc.)
df["Secure"] = df.apply(lambda row: secure(row), axis=1)

Do let me know if there's a better way. Thanks so much!

CodePudding user response:

You can use np.select for that

a = df[['GovKeepSecure', 'BankKeepSecure', 'OtherKeepSecure']]

yes_counts = a.eq('Yes').sum(1)
no_counts  = a.eq('No').sum(1)
df['Secure'] = np.select([yes_counts > no_counts,
                          yes_counts < no_counts],
                         ['Yes', 'No'],
                         default='Neutral')

You can also use mode

a.agg(lambda s: s.mode() if len(s.mode()) == 1 else 'Neutral', axis=1)

CodePudding user response:

here is one way to do it

# apply value_counts and take idxmax along a row
# sort values, as when value count are equal, it takes the first value in the series
# sorting helps makes neutral comes as first result

df['Secure']=df.apply(lambda x: x.sort_values().value_counts().idxmax() , axis=1)
df
    GovKeepSecure   BankKeepSecure  OtherKeepSecure      Secure
0             Yes              Yes              Yes         Yes
1              No               No              Yes          No
2              No          Neutral              Yes     Neutral

alternately,

# when multiple values has the same mode, take from the first column, 
# which is also sort and returns neutral

df['Secure']=df.apply(lambda x: x.mode()[0], axis=1)

  • Related