I have a dataset of hockey statistics and I want to apply a weight multiplier to certain statistics based on certain conditions.
A snippet of my dataset:
Player Pos GP G GF/GP S Shots/GP S% TOI/GP
2 Andrew Cogliano 1.0 79.2 11.0 0.1 126.8 1.6 8.3 14.44
12 Artturi Lehkonen 2.0 73.0 14.6 0.2 158.6 2.2 9.3 15.29
28 Cale Makar 4.0 59.3 16.0 0.3 155.0 2.6 9.8 23.67
31 Darren Helm 1.0 66.6 10.5 0.2 125.0 1.9 8.6 14.37
61 Gabriel Landeskog 2.0 72.0 24.3 0.3 196.1 2.7 12.8 19.46
103 Nathan MacKinnon 1.0 73.8 27.8 0.4 274.4 3.7 9.9 19.69
What I am trying to do is create a function that multiplies 'G', 'GF/GP', 'S', and 'Shots/GP' by a specific weight - 1.1 for example. But I want to only do that for players based on two categories:
- Defence ('Pos' = 4.0) with 50 or more games ('GP') and 20 min or more time on ice per game ('TOI/GP')
- Offense ('Pos' != 4.0) with 50 or more games ('GP') and 14 min or more time on ice per game ('TOI/GP')
I can identify these groups by:
def_cond = df.loc[(df["Pos"]==4.0) & (df["GP"]>=50) & (df["TOI/GP"] >=20.00)]
off_cond = df.loc[(df["Pos"]!=4.0) & (df["GP"]>=50) & (df["TOI/GP"] >=14.00)]
Output for def_cond:
Player Pos GP G GF/GP S Shots/GP S% TOI/GP
28 Cale Makar 4.0 59.3 16.0 0.3 155.0 2.6 9.8 23.67
41 Devon Toews 4.0 58.8 8.2 0.1 120.5 2.1 6.7 22.14
45 Erik Johnson 4.0 67.4 7.3 0.1 140.9 2.1 5.1 22.22
112 Samuel Girard 4.0 68.0 4.4 0.1 90.8 1.3 5.0 20.75
Issue:
What I want to do is take this output and multiply 'G', 'GF/GP', 'S', and 'Shots/GP' by a weight value - again 1.1 for example.
I have tried various things such as:
if def_cond == True:
df[["G", "GF/GP", S", "Shots/GP"]].multiply(1.1, axis="index")
Or simply
if def_cond == True:
df["G"] = (df["G"]*1.1)
Pretty much everything I try results in the following error:
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I am new to this so any and all advice is welcome!
CodePudding user response:
I would try this:
def f(df,weight):
for i in df.index:
if (
(df.loc[i,'Pos']==4.0 and df.loc[i,'GP']>=50
and df.loc[i,'TOI/GP']>=20)
or
(df.loc[i,'Pos']!=4.0 and df.loc[i,'GP']>=50
and df.loc[i,'TOI/GP']>=14)
):
df.loc[i,['G','GF/GP','S','Shots/GP']]*=weight
Though, i'm pretty sure, it is not the best solution...