I am reading a PDF file, in one script using PyPDF2 and in this one using tika.
In both, I have a problem with re.finditer.
My code looks like this:
import re

# text holds the string extracted from the PDF
bank_pattern = '^.* (Bank|bank|BANK).*$'
bank = re.finditer(bank_pattern, text)
print('Here should be the bank name:')
print(bank)
print('')
for match in bank:
    print(match)
And I get the following output:
Here should be the bank name:
<callable_iterator object at 0x0000020BA86B4430>
Can someone help me understand why it doesn't show the matches? I am trying to get the whole line (the text before and after the match) wherever BANK, bank or Bank is mentioned.
P.S. Here is the part of the extracted PDF text that mentions the banks:
Intermediary Bank (USD): censored,
New York, USA; SWIFT: censored
Intermediary Bank (EUR): censored,
Frankfurt, Germany; SWIFT: censored
Thanks!
Answer:
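Your pattern does not match because, without the re.M flag, ^ and $ only match at the very start and end of the whole string, and . does not match a newline, so no single line of a multi-line text can satisfy it. (Printing bank itself only shows the iterator object; the matches appear only when you loop over it.)
We can use re.findall here with the pattern ^.*\bbank\b.*$ and the flags re.I (case-insensitive) and re.M (multiline):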
import re

inp = """Intermediary Bank (USD): censored,
New York, USA; SWIFT: censored
Intermediary Bank (EUR): censored,
Frankfurt, Germany; SWIFT: censored"""
lines = re.findall(r'^.*\bbank\b.*$', inp, flags=re.I|re.M)
print(lines)
This prints:
['Intermediary Bank (USD): censored,', 'Intermediary Bank (EUR): censored,']
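Since the question asks about re.finditer specifically, here is a minimal sketch of the same idea using finditer. It assumes the extracted PDF text is already in a string (called text here, as in the question) and reuses the sample lines from the question. It first runs the original pattern without flags to show that it yields no matches on multi-line text, then collects the full line of each match via match.group(0).

```python
import re

text = """Intermediary Bank (USD): censored,
New York, USA; SWIFT: censored
Intermediary Bank (EUR): censored,
Frankfurt, Germany; SWIFT: censored"""

# Original pattern, no flags: ^ only matches at the start of the whole string
# and .* cannot cross a newline, so nothing matches in multi-line text.
original = list(re.finditer(r'^.* (Bank|bank|BANK).*$', text))
print(len(original))  # prints 0

# re.M lets ^ and $ match at every line boundary; re.I ignores case.
bank_lines = []
for match in re.finditer(r'^.*\bbank\b.*$', text, flags=re.I | re.M):
    # finditer yields Match objects; group(0) is the whole matched line.
    bank_lines.append(match.group(0))

print(bank_lines)
```

With finditer you also get each match's position via match.start() and match.end(), which findall does not give you; if you only need the lines, findall is shorter.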