I'm trying to find every invoice number in a document (e.g. INV-12345), but when I paste the results I only get 'INV-' and a lot of blank lines. Any ideas?
import re
import pyperclip
invoiceRegex = re.compile(r'(INV-)?\d{4,6}')
text = pyperclip.paste()
extractedInvoice = invoiceRegex.findall(text)
allInvoices = []
for invoice in extractedInvoice:
    allInvoices.append(invoice)
results = '\n'.join(allInvoices)
pyperclip.copy(results)
CodePudding user response:
re.findall returns only the content of the capturing group when the pattern has exactly one. From the documentation:
The result depends on the number of capturing groups in the pattern. If there are no groups, return a list of strings matching the whole pattern. If there is exactly one group, return a list of strings matching that group. If multiple groups are present, return a list of tuples of strings matching the groups. Non-capturing groups do not affect the form of the result.
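To see the three cases from the documentation side by side, here is a small sketch. The sample string is made up for illustration and does not come from the question.

```python
import re

# Hypothetical sample text, made up for illustration.
sample = "Paid INV-12345 and INV-9876, ref 4321."

# No groups: findall returns the whole matches.
no_groups = re.findall(r'INV-\d{4,6}', sample)

# Exactly one group: findall returns only that group's content,
# which is '' whenever the optional group did not participate.
one_group = re.findall(r'(INV-)?\d{4,6}', sample)

# Two groups: findall returns one tuple per match.
two_groups = re.findall(r'(INV-)?(\d{4,6})', sample)

print(no_groups)
print(one_group)
print(two_groups)
```

The second call reproduces the symptom in the question: the prefix where it is present and an empty string where it is not.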
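So make the prefix group non-capturing and capture the digits instead. Note that findall then returns only the number, without the 'INV-' prefix: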
invoiceRegex = re.compile(r'(?:INV-)?(\d{4,6})')
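If the full invoice text, prefix included, is wanted rather than just the captured digits, one option is to use re.finditer and take group(0), which is always the whole match. This is a minimal sketch: the helper name extract_invoices and the sample input are hypothetical, and the clipboard calls are left out so it runs on its own.

```python
import re

# Non-capturing prefix and no capturing group, so the whole match is kept.
invoice_regex = re.compile(r'(?:INV-)?\d{4,6}')


def extract_invoices(text):
    """Return every full invoice match in text (hypothetical helper)."""
    # finditer yields match objects; group(0) is the entire match,
    # no matter how many groups the pattern contains.
    return [m.group(0) for m in invoice_regex.finditer(text)]


if __name__ == "__main__":
    # Made-up sample input for illustration.
    print('\n'.join(extract_invoices("Paid INV-12345 and INV-9876, ref 4321.")))
```

Because this pattern has no capturing groups, invoice_regex.findall(text) would give the same list; finditer is just more robust if groups are added to the pattern later.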