Home > Back-end >  Applying multiple conditions to dataframe columns
Applying multiple conditions to dataframe columns

Time:09-21

I'm working with a pandas dataframe of a team's results:

    Team Home/Away  Home_Score   Away_Score
0   ABC   Home          2            3
1   ABC   Home          1            2
2   ABC   Away          1            3
3   ABC   Away          0            1

I want to create a new column called 'Result' which returns Win, Loss or Draw based on the results above and whether the team in question played home or away. I'm trying to use where() function from numpy within a function but it's not applying the numpy part, only the first part which checks if the team is Home or Away. Below is my function and lambda statement:


def result(x):    
    for score in df['Home/Away']:
        
        #Home Wins
        if x == 'Home' and np.where(df['Home_Score'] > df['Away_Score']):
            return 'Win'
        
        #Home Losses
        elif x == 'Home' and np.where(df['Home_Score'] < df['Away_Score']):
            return 'Loss'

        #Away Wins
        elif x == 'Away' and np.where(df['Home_Score'] < df['Away_Score']):
            return 'Win'
        
        #Away Losses
        elif x == 'Away' and np.where(df['Home_Score'] > df['Away_Score']):
            return 'Loss'
        
        #Draws
        elif np.where(df['Home_Score'] == df['Away_Score']):
            return 'Draw'
        
df['Result'] = df.apply(lambda x: result(x['Home/Away']), axis=1)

I'm not sure how to get it to read Home_Score and Away_Score columns too and apply the np.where function - I thought it should be sufficient to add them into the if statements but it's not working. For example the above code returns Win, Win, Win, Win when my expected output for Result is Loss, Loss, Win, Win. Any help would be appreciated.

CodePudding user response:

I personally would use np.select() gives you a little more control and readability

condition_list = [
    (df['Home/Away'] == 'Home') & (df['Home_Score'] > df['Away_Score']),
    (df['Home/Away'] == 'Home') & (df['Home_Score'] < df['Away_Score']),
    (df['Home/Away'] == 'Away') & (df['Home_Score'] < df['Away_Score']),
    (df['Home/Away'] == 'Away') & (df['Home_Score'] > df['Away_Score']),
]

choice_list = [
    'Win',
    'Lose',
    'Win',
    'Lose'
]

df['Results'] = np.select(condition_list, choice_list, 'Draw')
df

CodePudding user response:

A more straightforward approach might be:

  • compute the sign of the difference of scores
  • multiply by -1 if Away
  • if -1 -> Loss, if 0 -> Draw, if 1 -> Win
df['Result'] = (
 np.sign(df['Home_Score'].sub(df['Away_Score']))
   .mul(df['Home/Away'].map({'Home': 1, 'Away': -1}))
   .map({1: 'Win', 0: 'Draw', -1: 'Loss'})
 )

Alternatively, if you want to use numpy.select, you can simplify the logic to 2 conditions:

  • if the scores are equal -> Draw
  • if the boolean Home_Score>Away_Score equals the boolean Home/Away == Home -> Win
    • Home and Home_Score>Away_Score -> Win
    • not Home and not Home_Score>Away_Score -> also Win
  • else Loss
c1 = df['Home_Score'].eq(df['Away_Score'])
c2 = df['Home/Away'].eq('Home')
c3 = df['Home_Score'].gt(df['Away_Score'])
df['Result'] = np.select([c1, c2==c3], ['Draw', 'Win'], 'Loss')

Output:

  Team Home/Away  Home_Score  Away_Score Result
0  ABC      Home           2           3   Loss
1  ABC      Home           1           2   Loss
2  ABC      Away           1           3    Win
3  ABC      Away           0           1    Win

Other example to show all possibilities:

  Team Home/Away  Home_Score  Away_Score Result
0  ABC      Home           2           3   Loss
1  ABC      Home           5           2    Win
2  ABC      Away           1           3    Win
3  ABC      Away           2           1   Loss
4  ABC      Home           2           2   Draw
5  ABC      Away           1           1   Draw
  • Related