I have a problem: I am trying to recognize a pattern in a list of words. I need to find a number of 1 to 6 digits, with or without other characters around it.
My input is this image: https://i.stack.imgur.com/RNOdL.png
The OCR returned:
Kundennummer:
21924
The pattern r"(\D|\A)+\d{5}(\D|\Z)+" works, but when I change it to r"(\D|\A)+\d{1,6}(\D|\Z)+" it doesn't.
I tried re.match, re.findall and re.search, and none of them works.
The repr() of the OCR lines:
'Kundennummer:'
'21924'
CodePudding user response:
Assuming you only need the first match:
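import re

ocr_result = """
Kundennummer:
21924
"""

# Take the first run of digits that is 1 to 6 digits long.
for result in re.findall(r'\d+', ocr_result):
    if 1 <= len(result) <= 6:
        break
else:
    # The loop ended without a break: no suitable number was found.
    result = None

print(result)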
Result:
21924
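The length check above can also be folded into the pattern itself. The following is only a sketch, assuming that a run of more than six digits should be rejected as a whole; the helper name `first_number` and the second sample string are made up for illustration. The lookarounds `(?<!\d)` and `(?!\d)` make sure the match is not a slice of a longer digit run.

```python
import re

# 1 to 6 digits that are neither preceded nor followed by another digit.
NUMBER = re.compile(r'(?<!\d)\d{1,6}(?!\d)')

def first_number(text):
    """Return the first run of 1 to 6 digits, or None (hypothetical helper)."""
    match = NUMBER.search(text)
    return match.group() if match else None

print(first_number("Kundennummer:\n21924\n"))  # 21924
print(first_number("ID 1234567, Nr. 42"))      # 42 (the 7-digit run is skipped)
```

Because the pattern has no capturing group, re.findall with the same pattern would return the plain digit strings.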
CodePudding user response:
import re

ocr_result1 = """
Kundennummer:
21924
"""
ocr_result2 = """
Kundennummer:3000
"""

# \w* also picks up letters or digits directly attached to the number.
for e in [ocr_result1, ocr_result2]:
    print(re.findall(r'\w*\d{1,6}\w*', e))
Result:
['21924']
['3000']
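Note that `\w*\d{1,6}\w*` returns the whole word around the digits, so a string such as 'abc1234567' comes back unchanged. If only the digits are wanted, one possible variation is to anchor on the label. This is a sketch under the assumption that the OCR text always contains the literal label "Kundennummer:" before the number; the function name `customer_number` is hypothetical.

```python
import re

# Assumption: the label "Kundennummer:" always precedes the number,
# separated by optional whitespace (including a line break).
LABELLED = re.compile(r'Kundennummer:\s*(\d{1,6})(?!\d)')

def customer_number(ocr_text):
    """Return the 1 to 6 digit number after the label, or None (hypothetical helper)."""
    match = LABELLED.search(ocr_text)
    return match.group(1) if match else None

print(customer_number("\nKundennummer:\n21924\n"))  # 21924
print(customer_number("\nKundennummer:3000\n"))     # 3000
```

The trailing `(?!\d)` makes the search fail when more than six digits follow the label, instead of returning the first six.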