Pandas - Key Error when key clearly exists in dataframe

Time:05-30

I have a function I'm trying to apply to a dataframe 'players'.

'players' is taken from a larger dataframe 'df_tot' using .groupby():

players = df_tot.groupby(["Player", "Year"]).get_group(("Derrick White", 2022))

players returns the following dataframe:

       Unnamed: 0   Rk         Player Pos  Age   Tm   G  GS    MP  Year
17263         776  578  Derrick White  SG   27  TOT  75  52  2199  2022
17264         777  578  Derrick White  SG   27  SAS  49  48  1486  2022
17265         778  578  Derrick White  SG   27  BOS  26   4   713  2022

I want to return the row where 'Tm' = 'TOT'.

Here is the function I'm trying to apply to 'players':

def clean_traded(df):
    if df.shape[0]==1:
        return df
    else:
        row = df[df["Tm"]=="TOT"]
        row["Tm"] = df.iloc[-1,:]["Tm"]
        return row

players.apply(clean_traded)

However, I receive:

 KeyError: 'Tm'

How is this possible when 'Tm' is clearly a column in the 'players' dataframe? For example, the code below:

print(players.columns.tolist())

Returns:

['Unnamed: 0', 'Rk', 'Player', 'Pos', 'Age', 'Tm', 'G', 'GS', 'MP', 'Year']

Any ideas? The code below doesn't produce an error, but I want to know why adding "get_group" produces one.

players = df_tot.groupby(["Player", "Year"]).apply(clean_traded)

Thank you!

CodePudding user response:

I'm not sure what your goal is. Is there a reason you wrote this function and applied it, instead of just filtering the rows with your condition like this:

res = players.loc[players['Tm']=='TOT',:]

print(res)
       Unnamed: 0   Rk         Player Pos  Age   Tm   G  GS    MP  Year
17263         776  578  Derrick White  SG   27  TOT  75  52  2199  2022

If you want to debug your function or get a better understanding of what is happening there, just put some print statements in it. For example, put print(df) at the start of the else branch. You will then see what the function actually receives and why you get a KeyError.
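As a rough sketch of that debugging idea, the snippet below records what apply hands to the function instead of printing it. The small 'players' frame is a cut-down stand-in for the table in the question, and the 'inspect' helper and 'seen' list are hypothetical names used only for illustration.

```python
import pandas as pd

# Cut-down stand-in for the 'players' frame from the question (fewer columns).
players = pd.DataFrame(
    {
        "Player": ["Derrick White"] * 3,
        "Tm": ["TOT", "SAS", "BOS"],
        "G": [75, 49, 26],
        "Year": [2022] * 3,
    },
    index=[17263, 17264, 17265],
)

seen = []  # collects (type name, name) of every object apply passes in


def inspect(obj):
    # Hypothetical helper: note what we were given, return a dummy scalar.
    seen.append((type(obj).__name__, obj.name))
    return 0


players.apply(inspect)
print(seen)
```

Each recorded entry should be a Series named after one column, which shows that the function never sees the whole dataframe.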

CodePudding user response:

DataFrame.apply works column by column by default (axis=0), and each column is passed to the function as a Series. So when you do players.apply(clean_traded), it first takes the 'Unnamed: 0' column, turns it into a Series, and calls clean_traded on that. The Series has three elements, so the else branch runs, and df["Tm"] becomes a lookup in the Series index, which holds the row labels (17263, 17264, 17265) rather than the column names. There is no 'Tm' label there, hence the KeyError. The groupby version works because GroupBy.apply passes each group to the function as a DataFrame.

You should just do clean_traded(players). And there's almost certainly a simpler way of doing what you're trying to do.
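A minimal sketch of the difference, assuming a cut-down stand-in for 'players' with fewer columns than the real table. The .copy() call is an addition that is not in the original function; it is there only to avoid assigning into a filtered slice. The 'raised' flag is a hypothetical name for illustration.

```python
import pandas as pd

# Cut-down stand-in for the 'players' frame from the question.
players = pd.DataFrame(
    {
        "Player": ["Derrick White"] * 3,
        "Tm": ["TOT", "SAS", "BOS"],
        "G": [75, 49, 26],
        "Year": [2022] * 3,
    },
    index=[17263, 17264, 17265],
)


def clean_traded(df):
    if df.shape[0] == 1:
        return df
    # .copy() is an assumption added here, not part of the original function.
    row = df[df["Tm"] == "TOT"].copy()
    row["Tm"] = df["Tm"].iloc[-1]  # team on the last row of the group
    return row


# Calling the function directly passes the whole DataFrame.
result = clean_traded(players)
print(result)

# DataFrame.apply passes one column (a Series) at a time, so "Tm" is
# looked up among the row labels and the lookup fails.
raised = False
try:
    players.apply(clean_traded)
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```

The direct call returns the single 'TOT' row with 'Tm' replaced by the last team in the group, while the apply call reproduces the KeyError from the question.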
