Regex Error Finding Details from a bank statement

Time:09-22

I am working with regex and trying to extract the name, IFSC code and account number from a PDF bank statement. I am using the following code to extract the details:

acc_name= " ", '\n'.join([re.sub(r'^[\d \t] |[\d \t] :$', '', line) for line in data.splitlines() if 'Mr. ' in line])
acc_no= " ", '\n'.join([re.sub(r'Account Number\s :', '', line) for line in data.splitlines() if 'Account Number' in line])
acc_code = " ", '\n'.join([re.sub(r'IFSC Code\s :', '', line) for line in data.splitlines() if 'IFSC Code' in line])

But the data I am getting back is the following:

(' ', ' 50439602642')
(' ', 'Mr. MOHD AZFAR ALAM LARI')
(' ', ' ALLA0211993')

I want to remove the commas, brackets and quotes. I am new to regex, so any help would be appreciated.

Answer:

You're creating a tuple:

>>> " ", "\n"
(' ', '\n')
>>>

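As you can see, the comma creates a tuple, which is why the output is wrapped in brackets, quotes and commas. So either you meant a space followed by a newline as the separator: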

acc_name= ' \n'.join([re.sub(r'^[\d \t] |[\d \t] :$', '', line) for line in data.splitlines() if 'Mr. ' in line])
acc_no= ' \n'.join([re.sub(r'Account Number\s :', '', line) for line in data.splitlines() if 'Account Number' in line])
acc_code = ' \n'.join([re.sub(r'IFSC Code\s :', '', line) for line in data.splitlines() if 'IFSC Code' in line])

Or just a space:

acc_name= ' '.join([re.sub(r'^[\d \t] |[\d \t] :$', '', line) for line in data.splitlines() if 'Mr. ' in line])
acc_no= ' '.join([re.sub(r'Account Number\s :', '', line) for line in data.splitlines() if 'Account Number' in line])
acc_code = ' '.join([re.sub(r'IFSC Code\s :', '', line) for line in data.splitlines() if 'IFSC Code' in line])
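If you only need one value per field, a capture group with re.search may be simpler than filtering lines and stripping the label with re.sub. The following is a minimal sketch, not a definitive solution: the sample text in data and the helper first_match are made up for illustration, and the patterns assume each label is followed by a colon and a value with no spaces, which may not match your real PDF text.

```python
import re

# Hypothetical statement text; the real PDF layout is not shown in the question.
data = """Account Number : 50439602642
Mr. MOHD AZFAR ALAM LARI
IFSC Code : ALLA0211993"""

# A comma between two expressions builds a tuple, which is why the
# original output was wrapped in brackets, quotes and commas.
as_tuple = " ", "\n".join(["a", "b"])
print(type(as_tuple).__name__)


def first_match(pattern, text):
    """Return the first captured group, stripped, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(1).strip() if m else None


# Assumed layout: "<label> : <value>", with the value free of whitespace.
acc_no = first_match(r"Account Number\s*:\s*(\S+)", data)
acc_code = first_match(r"IFSC Code\s*:\s*(\S+)", data)
# "." does not match a newline, so this captures the rest of the name line only.
acc_name = first_match(r"(Mr\. .+)", data)

print(acc_name)
print(acc_no)
print(acc_code)
```

Each variable here is a plain string (or None when the label is missing), so printing it shows no brackets or quotes.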