Home > database >  How can I replace blank values after doing str.findall to create a new column?
How can I replace blank values after doing str.findall to create a new column?

Time:11-13

This is the relevant line of code:

dfc['Category'] = dfc['Description'].str.findall('Split|Transfer|REF', flags=re.IGNORECASE)

It looks in the column description for Split, Transfer or REF and returns them if they appear. For the ones with none of those words in the description, it leaves the column blank and I am really struggling to find a way to fill it in. I have tried converting from object to string or replacing with 0 or with NaN. This doesn't produce errors but the columns remain blank. I have looked at these solutions) too and none work.

Any assistance appreciated. Full example code:

import pandas as pd
import openpyxl
import datetime as dt
import numpy as np

dff = pd.DataFrame({'Date': ['20221003', '20221005'],
                   'Tran Type': ['BOOK TRANSFER CREDIT', 'ACH DEBIT'],
                   'Debit Amount': [0.00, -220000.00],
                   'Credit Amount': [182.90, 0.0],
                   'Description': ['BOOK CREDIT FROM ACCOUNT 98743447', 'USREF2448 ACH OFFSET'],
                   'Amount': [-220000.00, 182.90]})

import re
dff['Category'] = dff['Description'].str.findall('Ref|BCA|Fund|Transfer', flags=re.IGNORECASE)

In the example, I want the Category column to show "Other" or something in the first row and REF in the second (which it does).

Is there anything I can add to the findall line that will give a default value of Other for non-matches? Or any way to replace the blank cells in the dataframe afterwards? Any help appreciated.

CodePudding user response:

A proposition using pandas.Series.str.join to flatten the list of the matches then pandas.DataFrame.replace to replace empty strings (0 match) with NaN values.

dff['Category'] = (
                    dff['Description'].str.findall('(Ref|BCA|Fund|Transfer)', flags=re.IGNORECASE)
                                      .str.join(", ")
                                      .replace(to_replace="", value=np.NaN)
                    )

# Output :

print(dff)

       Date             Tran Type  Debit Amount  Credit Amount                        Description    Amount Category
0  20221003  BOOK TRANSFER CREDIT           0.0          182.9  BOOK CREDIT FROM ACCOUNT 98743447 -220000.0      NaN
1  20221005             ACH DEBIT     -220000.0            0.0              USREF2448 DACH OFFSET     182.9      REF

You can change the value parameter to "Other", "Missing Value" of whatever you want.

  • Related