Dividing Pandas column based on year

Time:12-16

I have a pandas dataframe of stats from 4 NBA seasons, starting with 2017-18, where the season has been converted into dummy variables as seen below.

                           Salary      VORP  ...  Season_2019-20  Season_2020-21
Player                                       ...                                
Nikola Jokić           29542010.0  0.931373  ...               0               1
James Harden           28299399.0  0.843137  ...               0               0
James Harden           30570000.0  1.000000  ...               0               0
Giannis Antetokounmpo  24157304.0  0.813725  ...               0               0
Rudy Gobert            23491573.0  0.558824  ...               0               0

I want to divide each row's salary by the salary cap for that row's season, using the function below.

def pct_cap(row):
    if row['Season_2017-18'] == 1:
        return final_data['Salary'] / 99093000
    if row['Season_2018-19'] == 1:
        return final_data['Salary'] / 101869000
    if row['Season_2019-20'] == 1:
        return final_data['Salary'] / 109140000
    if row['Season_2020-21'] == 1:
        return final_data['Salary'] / 109140000
    return 1

However, when I apply the function using the code below, it completely changes the shape of the dataframe as it appears to be applying the function to every column instead of just the Salary column.

What is this function actually doing, and what would be the best way to divide the salary by the salary cap? I'm a beginner and any help would be greatly appreciated.

x = final_data.apply(lambda row: pct_cap(row), axis=1)

Player                 Nikola Jokić  James Harden  ...  Alec Burks  Vince Carter
Player                                             ...                          
Nikola Jokić               0.270680      0.259294  ...    0.099372      0.021934
James Harden               0.298124      0.285584  ...    0.109448      0.024158
James Harden               0.290000      0.277802  ...    0.106465      0.023500
Giannis Antetokounmpo      0.290000      0.277802  ...    0.106465      0.023500
Rudy Gobert                0.290000      0.277802  ...    0.106465      0.023500

CodePudding user response:

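The problem is inside pct_cap: each branch returns final_data['Salary'] / cap, which is the whole Salary column (a Series) rather than a single number. With axis=1, apply calls the function once per row, and because every call returns a Series, pandas expands those Series into a DataFrame with one column per player. That is why the shape changes; the function is not being applied to every column. Each call should return that row's salary divided by the cap, not everyone's salaries.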
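Here is a small sketch of that behavior. The frame `toy` and its values are made up for illustration and only stand in for final_data; the point is the difference in the shape of the result.

```python
import pandas as pd

# Hypothetical three-row stand-in for final_data (values are made up).
toy = pd.DataFrame(
    {"Salary": [100.0, 200.0, 300.0], "Season_2017-18": [1, 1, 1]},
    index=pd.Index(["A", "B", "C"], name="Player"),
)

# Returning the whole column for every row, as the original function does:
# each call yields a Series, so apply builds a DataFrame out of them.
wide = toy.apply(lambda row: toy["Salary"] / 100, axis=1)

# Returning one number per row: apply builds a Series with one value per row.
narrow = toy.apply(lambda row: row["Salary"] / 100, axis=1)

print(type(wide).__name__, wide.shape)
print(type(narrow).__name__, narrow.shape)
```

The first result has one column per player, which matches the player-by-player table in the question.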

Try it like this:

def pct_cap(row):
    # row is a single row of final_data, so row['Salary'] is one number
    if row['Season_2017-18'] == 1:
        return row['Salary'] / 99093000
    if row['Season_2018-19'] == 1:
        return row['Salary'] / 101869000
    if row['Season_2019-20'] == 1:
        return row['Salary'] / 109140000
    if row['Season_2020-21'] == 1:
        return row['Salary'] / 109140000
    # no season dummy is set for this row
    return 1

x = final_data.apply(pct_cap, axis=1)
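If the frame is large, a column-wise version may be worth considering instead of a row-wise apply. The following is only a sketch: it assumes all four Season_* dummy columns exist in the frame and that each row has at most one of them set to 1. The names `CAPS`, `pct_cap_vectorized` and `toy` are made up for this example; the cap values are copied from the function above.

```python
import pandas as pd

# Salary caps copied from pct_cap, keyed by dummy column name.
CAPS = {
    "Season_2017-18": 99093000,
    "Season_2018-19": 101869000,
    "Season_2019-20": 109140000,
    "Season_2020-21": 109140000,
}


def pct_cap_vectorized(df: pd.DataFrame) -> pd.Series:
    """Divide Salary by the cap of whichever season dummy is set in each row."""
    caps = pd.Series(CAPS)
    # Assumption: at most one dummy is 1 per row, so the weighted sum
    # of the dummies equals that row's salary cap (or 0 if none is set).
    dummies = df[caps.index].astype("int64")
    cap_per_row = dummies.mul(caps, axis=1).sum(axis=1)
    has_season = cap_per_row > 0
    pct = df["Salary"] / cap_per_row.where(has_season)
    # Mirror the fallback of pct_cap: rows with no season set get 1.
    return pct.where(has_season, 1.0)


# Toy data for illustration only; this is not the real final_data.
toy = pd.DataFrame(
    {
        "Salary": [29542010.0, 30570000.0],
        "Season_2017-18": [0, 0],
        "Season_2018-19": [0, 1],
        "Season_2019-20": [0, 0],
        "Season_2020-21": [1, 0],
    },
    index=pd.Index(["Nikola Jokić", "James Harden"], name="Player"),
)

print(pct_cap_vectorized(toy))
```

On the real data this would be called as pct_cap_vectorized(final_data), and the result could be assigned to a new column if you want to keep it next to Salary.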