KeyError(key) in get_loc_level after using .transform() or apply()

Time:01-18

I have a large data frame with many groups, and I'm trying to filter rows within each group. Below is a simplified data frame that reproduces the error. df5 is grouped by "Detail", "ID", and "Year".

import pandas as pd

data2 = {"Year": ["2012", "2012", "2012", "2012", "2012", "2012", "2012", "2012", "2012"],
         "Country": ["USA", "USA", "USA", "USA", "USA", "USA", "USA", "CANADA", "CANADA"],
         "Country_2": ["", "", "", "", "", "", "", "USA", "USA"],
         "ID": ["AF12", "A15", "BU14", "DU157", "L12", "N10", "RU156", "DU157", "RU156"],
         "Detail": [1, 1, 1, 1, 1, 1, 1, 1, 1],
         "Second_country_available": [False, False, False, False, False, False, False, True, True],
         }
df5 = pd.DataFrame(data2)
# Boolean mask of the rows that have a second country
df5_true = df5["Second_country_available"] == True
# For each group, join its second countries into one regex pattern
Country_2_gr = df5[df5_true].groupby(["Detail", "ID", "Year"])["Country_2"].agg("|".join)
Country_2_gr
grouped_df5 = df5.groupby(["Detail", "ID", "Year"], group_keys=False)["Country"]
# This line raises the KeyError shown below
filtered = grouped_df5.transform(lambda g: g.str.fullmatch(Country_2_gr[g.name]))
filtered

The error is:

return (self._engine.get_loc(key), None)
  File "pandas\_libs\index.pyx", line 774, in pandas._libs.index.BaseMultiIndexCodesEngine.get_loc
KeyError: (1, 'A15', '2012')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "packages\pandas\core\indexes\.py", line 3045, in _get_loc_level
    raise KeyError(key) from err
KeyError: (1, 'A15', '2012')

The code works for most cases, so I don't want to change it radically. I'd like a fix where, in cases like the one shown, the rows of the affected groups are dropped.

CodePudding user response:

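Country_2_gr is built from the filtered data frame (only the rows where Second_country_available is True), so it does not contain a key for every group in df5. Looking up a missing key such as (1, 'A15', '2012') with Country_2_gr[g.name] raises the KeyError. You can switch to .get with a default instead: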

filtered = grouped_df5.transform(lambda g: g.str.fullmatch(Country_2_gr.get(g.name, default="")))
filtered
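
To actually drop the rows, the boolean result can be used as a mask on df5. Here is a minimal end-to-end sketch using the sample data from the question. The names keys, mask and result are only illustrative, and the astype(bool) call is a precaution I added in case the transform does not come back with a boolean dtype:

```python
import pandas as pd

data2 = {
    "Year": ["2012"] * 9,
    "Country": ["USA"] * 7 + ["CANADA", "CANADA"],
    "Country_2": [""] * 7 + ["USA", "USA"],
    "ID": ["AF12", "A15", "BU14", "DU157", "L12", "N10", "RU156", "DU157", "RU156"],
    "Detail": [1] * 9,
    "Second_country_available": [False] * 7 + [True, True],
}
df5 = pd.DataFrame(data2)

keys = ["Detail", "ID", "Year"]  # illustrative helper name

# Patterns exist only for groups that have at least one "True" row
Country_2_gr = (
    df5[df5["Second_country_available"]]
    .groupby(keys)["Country_2"]
    .agg("|".join)
)

# Groups missing from Country_2_gr fall back to the empty pattern "",
# which no non-empty country name can fully match
mask = df5.groupby(keys, group_keys=False)["Country"].transform(
    lambda g: g.str.fullmatch(Country_2_gr.get(g.name, default=""))
)

# Keep only the matching rows; everything else is dropped
result = df5[mask.astype(bool)]
print(result)
```

One caveat: an empty "Country" value would fully match the empty default pattern, so if your real data can contain empty strings there, you may want a default that can never match.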