Unhashable DataFrame - Groupby function

Time:06-13

I have a function that throws "TypeError: unhashable type: 'DataFrame'". There are two calls to the function: the first succeeds while the second throws the error. The only difference between the two calls is the data; the second call's data is a subset of the first (amounts > 25k).

def loss_by_entity(data, col_list, sum_list):
    agg_list = sum_list
    for i, item in enumerate(sum_list):
        agg_list[i] = data.groupby(col_list)[item].sum()
        agg_list[i] = pd.DataFrame(agg_list[i])
        agg_list[i]['Loss_Type'] = item
    agg_data = pd.concat(agg_list)
    return agg_data

These are the calls and the adjustment to the data:

Loss_Type = ['Incurred Loss', 'Paid ALAE', 'Incurred Loss & ALAE']
Quarterly_Total = loss_by_entity(data, ['lg_loss_entity'], Loss_Type)
lg_loss_data = data[abs(data['Incurred Loss & ALAE']) >= 250000]
Large_Loss = loss_by_entity(lg_loss_data, ['lg_loss_entity'], Loss_Type)

This is the error in its entirety:

Traceback (most recent call last):
  File "H:\Python\Excel_Automation\Large_Loss_Report.py", line 47, in <module>
    Large_Loss = loss_by_entity(lg_loss_data, ['lg_loss_entity'], Loss_Type)
  File "H:\Python\Excel_Automation\Large_Loss_Report.py", line 12, in loss_by_entity
    agg_list[i] = data.groupby(col_list)[item].sum()
  File "H:\Python\Excel_Automation\venv\lib\site-packages\pandas\core\groupby\generic.py", line 1338, in __getitem__
    return super().__getitem__(key)
  File "H:\Python\Excel_Automation\venv\lib\site-packages\pandas\core\base.py", line 249, in __getitem__
    if key not in self.obj:
  File "H:\Python\Excel_Automation\venv\lib\site-packages\pandas\core\generic.py", line 1994, in __contains__
    return key in self._info_axis
  File "H:\Python\Excel_Automation\venv\lib\site-packages\pandas\core\indexes\base.py", line 5008, in __contains__
    hash(key)
TypeError: unhashable type: 'DataFrame'

CodePudding user response:

I think the problem is with agg_list = sum_list: this does not copy the list, so agg_list, sum_list, and the caller's Loss_Type are all the same object. The first call therefore overwrites the entries of Loss_Type with DataFrames, and on the second call data.groupby(col_list)[item] receives a DataFrame where a hashable column name is expected, which raises the error. Use agg_list = sum_list.copy() to avoid this, or start with an empty list and append each new DataFrame to it.
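A minimal sketch of the second suggestion, building the results in a fresh list so the caller's sum_list is never mutated (the sample column and entity names here are illustrative, not from your data):

```python
import pandas as pd

def loss_by_entity(data, col_list, sum_list):
    # Accumulate into a new list instead of assigning into sum_list,
    # so the caller's Loss_Type list is left untouched.
    frames = []
    for item in sum_list:
        agg = data.groupby(col_list)[item].sum().to_frame()
        agg['Loss_Type'] = item
        frames.append(agg)
    return pd.concat(frames)
```

With this version, calling the function twice with the same Loss_Type list works, because the list still contains column-name strings on the second call.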

Here is a similar question you can refer to

How do I clone a list so that it doesn't change unexpectedly after assignment?
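The behaviour that linked question describes boils down to this: assignment copies the reference, not the list, so mutations through either name are visible through both.

```python
a = [1, 2, 3]
b = a           # b and a refer to the same list object
b[0] = 99
print(a)        # [99, 2, 3] -- a changed too

c = a.copy()    # shallow copy: a new, independent list
c[0] = 0
print(a)        # still [99, 2, 3]
```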
