I am working with regex and trying to extract the name, IFSC code and account number from a PDF. I am using the following code to extract the details:
acc_name = " ", '\n'.join([re.sub(r'^[\d \t]+|[\d \t]+:$', '', line) for line in data.splitlines() if 'Mr. ' in line])
acc_no = " ", '\n'.join([re.sub(r'Account Number\s+:', '', line) for line in data.splitlines() if 'Account Number' in line])
acc_code = " ", '\n'.join([re.sub(r'IFSC Code\s+:', '', line) for line in data.splitlines() if 'IFSC Code' in line])
But the data I get back looks like this:
(' ', ' 50439602642')
(' ', 'Mr. MOHD AZFAR ALAM LARI')
(' ', ' ALLA0211993')
I want to remove the commas, brackets and quotes. I am new to regex, so any help would be appreciated.
CodePudding user response:
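You're creating a tuple. The comma after " " separates two expressions, so the space is never used as the join separator:
>>> " ", "\n"
(' ', '\n')
>>>
As you can see, a tuple is created. The commas, brackets and quotes in your output are just how Python prints a tuple. Either you meant to join with a space followed by a newline: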
acc_name = ' \n'.join([re.sub(r'^[\d \t]+|[\d \t]+:$', '', line) for line in data.splitlines() if 'Mr. ' in line])
acc_no = ' \n'.join([re.sub(r'Account Number\s+:', '', line) for line in data.splitlines() if 'Account Number' in line])
acc_code = ' \n'.join([re.sub(r'IFSC Code\s+:', '', line) for line in data.splitlines() if 'IFSC Code' in line])
Or you meant to join with just a space:
acc_name = ' '.join([re.sub(r'^[\d \t]+|[\d \t]+:$', '', line) for line in data.splitlines() if 'Mr. ' in line])
acc_no = ' '.join([re.sub(r'Account Number\s+:', '', line) for line in data.splitlines() if 'Account Number' in line])
acc_code = ' '.join([re.sub(r'IFSC Code\s+:', '', line) for line in data.splitlines() if 'IFSC Code' in line])
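If the leading space in values such as ' 50439602642' is also unwanted, one possible alternative is to capture the value after the colon instead of deleting the label. The following is only a sketch: the sample data string is made up to resemble the output in the question, and extract_field is a hypothetical helper name, not something from the original code.

```python
import re

# Hypothetical sample text standing in for the text extracted from the PDF.
data = """Customer Details
Mr. MOHD AZFAR ALAM LARI
Account Number : 50439602642
IFSC Code : ALLA0211993
"""

def extract_field(label, text):
    """Return the text after 'label :' on the same line, or None if the label is absent."""
    # re.escape guards against regex metacharacters in the label;
    # (.+) stops at the end of the line because '.' does not match a newline.
    match = re.search(rf'{re.escape(label)}\s*:\s*(.+)', text)
    return match.group(1).strip() if match else None

# Keep every line containing 'Mr. ', with surrounding whitespace removed.
acc_name = '\n'.join(line.strip() for line in data.splitlines() if 'Mr. ' in line)
acc_no = extract_field('Account Number', data)
acc_code = extract_field('IFSC Code', data)

# Each variable is a plain string (or None), so printing shows no brackets or quotes.
print(acc_name)
print(acc_no)
print(acc_code)
```

Because each result is a plain string rather than a tuple, print shows only the value. The helper returns None when a label is missing, so check for that before using the result.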