TypeError: unhashable type: 'list' when I try to do a pivot from a column in pandas-CodePudding

My Python code takes a bank statement from Excel and creates a dataframe that categorises each transaction based on description.

Example code:

import pandas as pd
import openpyxl
import datetime as dt
import numpy as np

dff = pd.DataFrame({'Date': ['20221003', '20221005'],
                   'Tran Type': ['BOOK TRANSFER CREDIT', 'ACH DEBIT'],
                   'Debit Amount': [0.00, -220000.00],
                   'Credit Amount': [182.90, 0.0],
                   'Description': ['BOOK TRANSFER CREDIT FROM ACCOUNT 98743987', 'USREF2548 ACH OFFSET'],
                   'Amount': [-220000.00, 182.90]})

Then the bit that adds a column that categorises it if certain words appear in the description:

import re
dff['Category'] = dff['Description'].str.findall('Ref|BCA|Fund|Transfer', flags=re.IGNORECASE)

Output: What it looks like

But this code will not work. Any ideas why?

pivotf = dff
pivotf = pd.pivot_table(pivotf, 
index=["Date"], columns="Category",
values=['Amount'],
margins=False, margins_name="Total")

The error message is TypeError: unhashable type: 'list'

When I change columns from "Category" to anything else, it works fine. I have tried converting the column from object to string but this doesn't actually convert it.

Thanks! (Note that this is a repost of a question that was downvoted for not giving a reproducible example, so I've tried to do that above.)

CodePudding user response：

you can use explode() function:

dff['Category'] = dff['Description'].str.findall('Ref|BCA|Fund|Transfer', flags=re.IGNORECASE).explode()
print(dff)

Returning:

       Date             Tran Type  ...    Amount  Category
0  20221003  BOOK TRANSFER CREDIT  ... -220000.0  TRANSFER
1  20221005             ACH DEBIT  ...     182.9       REF

If Category column has multiple values you have to use it like this:

dff['Category'] = dff['Description'].str.findall('Ref|BCA|Fund|Transfer', flags=re.IGNORECASE)
dff=dff.explode('Category')

CodePudding user response：

This is because the output of findall() is a list:

Return all non-overlapping matches of pattern in string, as a list of strings or tuples

Which is not a valid type when using pivot_table(). Add str[0] at the end if you expect to only have one find per Description/row and it should work:

dff['Category'] = dff['Description'].str.findall('Ref|BCA|Fund|Transfer', flags=re.IGNORECASE).str[0]

However, it may be more efficient under this scenario, to use search() which would be the perfect fit for this case:

dff['Category'] = dff['Description'].str.search('Ref|BCA|Fund|Transfer', flags=re.IGNORECASE)

Regardless of the approach, you will get as final output:

         Amount          
Category    REF  TRANSFER
Date                     
20221003    NaN -220000.0
20221005  182.9       NaN

Otherwise, you can use explode() as @Clenage suggests but, as shown here, it would lead to a potential issue of duplicated values because of different Categories:

pivotf = pd.pivot_table(pivotf.explode('Category'), index=["Date"], columns="Category",values=['Amount'],
margins=False, margins_name="Total")

Outputting for example:

Category       BCA      FUND    REF  TRANSFER
Date                                         
20221003 -220000.0 -220000.0    NaN -220000.0
20221005       NaN       NaN  182.9       NaN

In which case, it's untrue that for 2022-10-03 there was a total of -220000 * 3 money movement.