Creating a new conditional column in a pandas DataFrame

I am trying to determine the result of a football match based on the goals scored. If the number of goals scored equals the number of goals received, the expected output is a draw. If the number of goals scored is higher than the number received, the expected output is a win. If the number of goals scored is lower than the number received, the output should be lost.

Football_data_match['result'] = if(Football_data_match['goal_scored'] > Football_data_match['goal_against']:
                                   Football_data_match['result'] = 'win'
                                elif (Football_data_match['goal_scored'<Football_data_match['goal_against']:
                                    Football_data_match['result'] 'lost'
                                else:
                                    Football_data_match['result']  = 'draw')

The code above gives a syntax error, but I'm not able to pinpoint the exact mistake. Could somebody help me fix this problem?

CodePudding user response：

One way is using np.select:

import numpy as np
import pandas as pd

# Example data
df = pd.DataFrame({
    "goal_scored": np.random.randint(4, size=12),
    "goal_against": np.random.randint(4, size=12)
})

df["result"] = np.select(
    [
        df["goal_scored"] < df["goal_against"],
        df["goal_scored"] == df["goal_against"],
        df["goal_scored"] > df["goal_against"]
    ], ["lost", "draw", "win"]
)

df:

    goal_scored  goal_against result
0             1             3   lost
1             0             1   lost
2             0             3   lost
3             3             2    win
4             1             3   lost
5             2             0    win
6             2             2   draw
7             2             2   draw
8             3             1    win
9             0             2   lost
10            2             3   lost
11            1             1   draw
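
A note on the default: np.select fills any row that matches none of the conditions with its default argument, which is 0 unless you set it. As far as I know, recent NumPy releases may refuse to mix that integer default with string choices and raise a TypeError, so here is a sketch of a variant that passes a string default explicitly. The three example rows are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up example scores (not from the question)
df = pd.DataFrame({
    "goal_scored": [2, 0, 1],
    "goal_against": [1, 3, 1],
})

# Only two conditions are listed; every remaining row falls through
# to the explicit string default, so all outputs share one dtype.
df["result"] = np.select(
    [
        df["goal_scored"] > df["goal_against"],
        df["goal_scored"] < df["goal_against"],
    ],
    ["win", "lost"],
    default="draw",
)

print(df)
```

Keep in mind that with this variant rows containing missing values would also end up as "draw", because comparisons against NaN are False.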

CodePudding user response：

You can also use DataFrame.apply:

import pandas as pd
import numpy as np
import itertools
teams = ['Arizona Cardinals', 'Atlanta Falcons', 'Baltimore Ravens', 'Buffalo Bills', 'Carolina Panthers', 'Chicago Bears']
# Random example scores, one row per pairing of two teams
k = pd.DataFrame(np.random.randint(20,high=30, size=(15,2)), index=itertools.combinations(teams, 2), columns=['goal_scored', 'goal_against'])

# axis=1 hands each row to the lambda in turn, so the comparisons inside are on scalars
k['result'] = k.apply(
    lambda row: 'win' if row['goal_scored'] > row['goal_against'] else ('lost' if row['goal_scored'] < row['goal_against'] else 'draw'),
    axis=1)

k is:

                                       goal_scored  goal_against result
(Arizona Cardinals, Atlanta Falcons)             29            29   draw
(Arizona Cardinals, Baltimore Ravens)            20            26   lost
(Arizona Cardinals, Buffalo Bills)               21            24   lost
(Arizona Cardinals, Carolina Panthers)           20            25   lost
(Arizona Cardinals, Chicago Bears)               27            28   lost
(Atlanta Falcons, Baltimore Ravens)              26            24    win
(Atlanta Falcons, Buffalo Bills)                 20            21   lost
(Atlanta Falcons, Carolina Panthers)             22            25   lost
(Atlanta Falcons, Chicago Bears)                 26            22    win
(Baltimore Ravens, Buffalo Bills)                23            21    win
(Baltimore Ravens, Carolina Panthers)            29            22    win
(Baltimore Ravens, Chicago Bears)                21            27   lost
(Buffalo Bills, Carolina Panthers)               24            21    win
(Buffalo Bills, Chicago Bears)                   28            26    win
(Carolina Panthers, Chicago Bears)               24            22    win
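
The nested conditional inside the lambda is hard to read at a glance. The same row-wise logic can be written as a named function, which is sketched below. The name match_result, the three example rows and their index labels are my own, chosen only for illustration.

```python
import pandas as pd


def match_result(row):
    """Classify one row as 'win', 'lost' or 'draw' (hypothetical helper)."""
    if row["goal_scored"] > row["goal_against"]:
        return "win"
    if row["goal_scored"] < row["goal_against"]:
        return "lost"
    return "draw"


# Made-up example rows (not from the answer above)
k = pd.DataFrame(
    {"goal_scored": [26, 20, 29], "goal_against": [24, 21, 29]},
    index=["A vs B", "A vs C", "B vs C"],
)

# axis=1 calls match_result once per row, so plain if/else works here
k["result"] = k.apply(match_result, axis=1)

print(k)
```

Because apply with axis=1 calls a Python function for every row, it tends to be slower on large frames than the column-wise np.select approach.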

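The syntax error itself comes from the fact that if is a statement in Python, not an expression, so it cannot appear on the right-hand side of an assignment. The snippet is also missing a closing bracket in the elif line and an = sign before 'lost'. More fundamentally, you need to think in vectorized terms when using pandas. An if...elif...else needs a single true/false value, whereas Football_data_match['goal_scored'] > Football_data_match['goal_against'] compares two whole columns and yields one boolean per row.

So build the result from operations on whole columns of the DataFrame (or on the underlying numpy.ndarray) rather than from scalar branches.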
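
To make the scalar versus column distinction concrete, here is a minimal sketch. It reuses the column names from the question, but the three example rows are made up, and nested np.where is just one more vectorized option alongside the approaches above, not the only way to do it.

```python
import numpy as np
import pandas as pd

# Made-up example rows using the question's column names
Football_data_match = pd.DataFrame({
    "goal_scored": [3, 1, 2],
    "goal_against": [1, 2, 2],
})
scored = Football_data_match["goal_scored"]
against = Football_data_match["goal_against"]

# Comparing two columns gives one boolean per row, not a single bool
is_win = scored > against
print(is_win.tolist())

# A plain `if` needs exactly one truth value, so pandas raises ValueError
try:
    if is_win:
        pass
except ValueError as err:
    print("if on a Series fails:", err)

# Vectorized alternative: choose a label for every row at once
Football_data_match["result"] = np.where(
    scored > against, "win",
    np.where(scored < against, "lost", "draw"),
)
print(Football_data_match)
```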