My Python code takes a bank statement from Excel and creates a dataframe that categorises each transaction based on description.
Example code:
import pandas as pd
import openpyxl
import datetime as dt
import numpy as np
dff = pd.DataFrame({'Date': ['20221003', '20221005'],
'Tran Type': ['BOOK TRANSFER CREDIT', 'ACH DEBIT'],
'Debit Amount': [0.00, -220000.00],
'Credit Amount': [182.90, 0.0],
'Description': ['BOOK TRANSFER CREDIT FROM ACCOUNT 98743987', 'USREF2548 ACH OFFSET'],
'Amount': [-220000.00, 182.90]})
Then the bit that adds a column that categorises it if certain words appear in the description:
import re
dff['Category'] = dff['Description'].str.findall('Ref|BCA|Fund|Transfer', flags=re.IGNORECASE)
Output: What it looks like
But this code will not work. Any ideas why?
pivotf = dff
pivotf = pd.pivot_table(pivotf,
index=["Date"], columns="Category",
values=['Amount'],
margins=False, margins_name="Total")
The error message is TypeError: unhashable type: 'list'
When I change columns from "Category" to anything else, it works fine. I have tried converting the column from object to string but this doesn't actually convert it.
Thanks! (Note that this is a repost of a question that was downvoted for not giving a reproducible example, so I've tried to do that above.)
CodePudding user response:
you can use explode() function:
dff['Category'] = dff['Description'].str.findall('Ref|BCA|Fund|Transfer', flags=re.IGNORECASE).explode()
print(dff)
Returning:
Date Tran Type ... Amount Category
0 20221003 BOOK TRANSFER CREDIT ... -220000.0 TRANSFER
1 20221005 ACH DEBIT ... 182.9 REF
If Category column has multiple values you have to use it like this:
dff['Category'] = dff['Description'].str.findall('Ref|BCA|Fund|Transfer', flags=re.IGNORECASE)
dff=dff.explode('Category')
CodePudding user response:
This is because the output of findall() is a list:
Return all non-overlapping matches of pattern in string, as a list of strings or tuples
Which is not a valid type when using pivot_table(). Add str[0]
at the end if you expect to only have one find per Description/row and it should work:
dff['Category'] = dff['Description'].str.findall('Ref|BCA|Fund|Transfer', flags=re.IGNORECASE).str[0]
However, it may be more efficient under this scenario, to use search() which would be the perfect fit for this case:
dff['Category'] = dff['Description'].str.search('Ref|BCA|Fund|Transfer', flags=re.IGNORECASE)
Regardless of the approach, you will get as final output:
Amount
Category REF TRANSFER
Date
20221003 NaN -220000.0
20221005 182.9 NaN
Otherwise, you can use explode() as @Clenage suggests but, as shown here, it would lead to a potential issue of duplicated values because of different Categories:
pivotf = pd.pivot_table(pivotf.explode('Category'), index=["Date"], columns="Category",values=['Amount'],
margins=False, margins_name="Total")
Outputting for example:
Category BCA FUND REF TRANSFER
Date
20221003 -220000.0 -220000.0 NaN -220000.0
20221005 NaN NaN 182.9 NaN
In which case, it's untrue that for 2022-10-03 there was a total of -220000 * 3 money movement.