I have a function I'm trying to apply to a dataframe 'players'.
'players' is taken from a larger dataframe 'df_tot' using .groupby():
players = df_tot.groupby(["Player", "Year"]).get_group(("Derrick White", 2022))
players returns the following dataframe:
(index) | Unnamed: 0 | Rk | Player | Pos | Age | Tm | G | GS | MP | Year |
---|---|---|---|---|---|---|---|---|---|---|
17263 | 776 | 578 | Derrick White | SG | 27 | TOT | 75 | 52 | 2199 | 2022 |
17264 | 777 | 578 | Derrick White | SG | 27 | SAS | 49 | 48 | 1486 | 2022 |
17265 | 778 | 578 | Derrick White | SG | 27 | BOS | 26 | 4 | 713 | 2022 |
I want to return the row where 'Tm' = 'TOT'.
Here is the function I'm trying to apply to 'players':
def clean_traded(df):
    if df.shape[0]==1:
        return df
    else:
        row = df[df["Tm"]=="TOT"]
        row["Tm"] = df.iloc[-1,:]["Tm"]
        return row
players.apply(clean_traded)
However, I receive:
KeyError: 'Tm'
How is this possible when 'Tm' is clearly a column in 'players' dataframe? For example the below code:
print(players.columns.tolist())
Returns:
['Unnamed: 0', 'Rk', 'Player', 'Pos', 'Age', 'Tm', 'G', 'GS', 'MP', 'Year']
Any ideas? The below code doesn't produce an error, but I want to know why adding a "get_group" produces an error.
players = df_tot.groupby(["Player", "Year"]).apply(clean_traded)
Thank you!
CodePudding user response:
I'm not sure what your goal is. Is there a reason you wrote this function and applied it, instead of just filtering the rows with your condition like this:
res = players.loc[players['Tm']=='TOT',:]
print(res)
Unnamed: 0 Rk Player Pos Age Tm G GS MP Year
17263 776 578 Derrick White SG 27 TOT 75 52 2199 2022
If you want to debug your function or get a better understanding of what is happening there, just put some print statements in it. For example, put print(df) at the beginning of your else branch. Then you will see why you get a KeyError.
CodePudding user response:
By default, apply on a DataFrame is applied column by column, and each column is passed to the function as a Series. So when you do players.apply(clean_traded), it first takes the 'Unnamed: 0' column, turns that into a Series, and then tries to apply clean_traded to it. This Series has the row labels as its index; the column name 'Unnamed: 0' is only the Series' name, not one of its labels. So df["Tm"] raises KeyError: 'Tm', because there is no 'Tm' label in the Series.
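To see the difference for yourself, here is a minimal sketch. It assumes a cut-down copy of the 'players' frame (only a few of the columns from the question) and a hypothetical helper, describe, that just reports the type of whatever it is handed. It is meant as an illustration, not a fix.

```python
import pandas as pd

# Cut-down stand-in for 'players'; values are copied from the question.
players = pd.DataFrame(
    {
        "Player": ["Derrick White"] * 3,
        "Tm": ["TOT", "SAS", "BOS"],
        "G": [75, 49, 26],
        "Year": [2022] * 3,
    },
    index=[17263, 17264, 17265],
)

def describe(obj):
    # Hypothetical helper: report the type of the object apply passes in.
    return type(obj).__name__

# DataFrame.apply (axis=0 by default) calls the function once per column.
per_column = players.apply(describe)
print(per_column.tolist())

# GroupBy.apply calls the function once per group, with a sub-DataFrame.
per_group = players.groupby(["Player", "Year"])[["Tm", "G"]].apply(describe)
print(per_group.tolist())

# A single column is indexed by the row labels, so looking up "Tm" fails.
try:
    players["Player"]["Tm"]
except KeyError as exc:
    print("KeyError:", exc)
```

The first print should list 'Series' once per column and the second 'DataFrame' once per group, which is why the groupby(...).apply(clean_traded) version in the question runs without the KeyError.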
You should just call clean_traded(players) directly. And there's almost certainly a simpler way of doing what you're trying to do.
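As a rough sketch of calling the function directly: the frame below is a small stand-in for df_tot that holds only the three rows shown in the question, and the .copy() is my addition (not in the original function) so that assigning to "Tm" does not write into a slice of the input.

```python
import pandas as pd

# Stand-in for df_tot, limited to the rows shown in the question.
df_tot = pd.DataFrame(
    {
        "Player": ["Derrick White"] * 3,
        "Tm": ["TOT", "SAS", "BOS"],
        "G": [75, 49, 26],
        "GS": [52, 48, 4],
        "MP": [2199, 1486, 713],
        "Year": [2022] * 3,
    },
    index=[17263, 17264, 17265],
)

def clean_traded(df):
    # Players with a single row were not traded; return them unchanged.
    if df.shape[0] == 1:
        return df
    # Keep the season-total row, on a copy so the input is not modified.
    row = df[df["Tm"] == "TOT"].copy()
    # Relabel it with the team from the last row of the group.
    row["Tm"] = df["Tm"].iloc[-1]
    return row

players = df_tot.groupby(["Player", "Year"]).get_group(("Derrick White", 2022))

# Call the function on the whole frame instead of going through apply.
result = clean_traded(players)
print(result)
```

Here the function receives the full DataFrame, so df["Tm"] is an ordinary column lookup and the totals row comes back with 'Tm' set to the last listed team.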