I have a large grouped data frame with multiple groups, and I'm trying to filter rows within each group. To keep things simple, here is a reduced data frame with one group where I get the error. df5 is grouped by "Detail", "ID", and "Year".
import pandas as pd

data2 = {
    "Year": ["2012", "2012", "2012", "2012", "2012", "2012", "2012", "2012", "2012"],
    "Country": ["USA", "USA", "USA", "USA", "USA", "USA", "USA", "CANADA", "CANADA"],
    "Country_2": ["", "", "", "", "", "", "", "USA", "USA"],
    "ID": ["AF12", "A15", "BU14", "DU157", "L12", "N10", "RU156", "DU157", "RU156"],
    "Detail": [1, 1, 1, 1, 1, 1, 1, 1, 1],
    "Second_country_available": [False, False, False, False, False, False, False, True, True],
}
df5 = pd.DataFrame(data2)

# Keep only the rows that have a second country, then join Country_2 per group
df5_true = df5["Second_country_available"] == True
Country_2_gr = df5[df5_true].groupby(["Detail", "ID", "Year"])["Country_2"].agg("|".join)
Country_2_gr

# Match each group's Country against the pattern stored under that group's key
grouped_df5 = df5.groupby(["Detail", "ID", "Year"], group_keys=False)["Country"]
filtered = grouped_df5.transform(lambda g: g.str.fullmatch(Country_2_gr[g.name]))
filtered
This raises the following error:
return (self._engine.get_loc(key), None)
File "pandas\_libs\index.pyx", line 774, in pandas._libs.index.BaseMultiIndexCodesEngine.get_loc
KeyError: (1, 'A15', '2012')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "packages\pandas\core\indexes\.py", line 3045, in _get_loc_level
raise KeyError(key) from err
KeyError: (1, 'A15', '2012')
The code works in most cases, so I don't want to change it radically. I would like a fix where, in a case like the one shown above, the affected rows are dropped.
CodePudding user response:
Country_2_gr is built from the filtered data frame, so it does not contain a key for every group. You can try switching to .get() with a default:
filtered = grouped_df5.transform(lambda g: g.str.fullmatch(Country_2_gr.get(g.name, default="")))
filtered
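To show the fix end to end, here is a minimal sketch using the sample data from the question. It assumes that "dropping the rows" means masking df5 with the boolean result; the names keys and result are introduced here for illustration only and are not part of the original code.

```python
import pandas as pd

data2 = {
    "Year": ["2012"] * 9,
    "Country": ["USA"] * 7 + ["CANADA", "CANADA"],
    "Country_2": [""] * 7 + ["USA", "USA"],
    "ID": ["AF12", "A15", "BU14", "DU157", "L12", "N10", "RU156", "DU157", "RU156"],
    "Detail": [1] * 9,
    "Second_country_available": [False] * 7 + [True, True],
}
df5 = pd.DataFrame(data2)

keys = ["Detail", "ID", "Year"]  # hypothetical helper name

# Built only from rows that have a second country, so some group keys are missing
Country_2_gr = (
    df5[df5["Second_country_available"]].groupby(keys)["Country_2"].agg("|".join)
)

# .get() falls back to "" for missing keys; an empty pattern cannot
# fully match a non-empty country name, so those rows come out False
filtered = df5.groupby(keys, group_keys=False)["Country"].transform(
    lambda g: g.str.fullmatch(Country_2_gr.get(g.name, default=""))
)

# Assumption: rows that are False in the mask are the ones to drop
result = df5[filtered.astype(bool)]
print(result)
```

With this data, groups such as (1, 'A15', '2012') have no entry in Country_2_gr, so their rows are masked out instead of raising KeyError. Only the DU157 and RU156 rows whose Country is USA remain.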