I need a regular expression that extracts a passport number after the specific word паспорт.
Possible options are:
паспорт 5715 424141
паспорт 5715-424141
паспорт 5715 - 424141
I need to extract the first 4 and the last 6 digits after the word паспорт, so the result should be 5715 and 424141.
I tried ^(\d{4})\ (\d{6})$ but it does not match my input.
CodePudding user response:
For starters, the ^ symbol means the start of the string, so your pattern already fails there (the strings start with "паспорт", not with a digit).
It also seems that the - between the number groups is optional and may be surrounded by spaces, which your pattern doesn't support.
To fix all those issues, use:
паспорт (\d{4})\s*-?\s*(\d{6})
паспорт - literal match.
(\d{4}) - a capture group of four digits.
\s* - any amount of whitespace, including none.
-? - an optional dash.
\s* - any amount of whitespace, including none.
(\d{6}) - a capture group of six digits.
And since you tagged with Python:
import re
s = """паспорт 5715 424141
паспорт 5715-424141
паспорт 5715 - 424141"""
for line in s.splitlines():
    # groups() returns the two captured digit runs as a tuple
    print(re.search(r"паспорт (\d{4})\s*-?\s*(\d{6})", line).groups())
# prints ('5715', '424141') for each of the three lines
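Note that re.search returns None when nothing matches, so calling .groups() directly raises AttributeError on input without a passport number. A minimal sketch of a safer wrapper follows; the names PASSPORT_RE and extract_passport are hypothetical and not part of the question, and the pattern is the same one as above.

```python
import re

# Same pattern as in the answer, compiled once for reuse.
# PASSPORT_RE and extract_passport are illustrative names only.
PASSPORT_RE = re.compile(r"паспорт (\d{4})\s*-?\s*(\d{6})")


def extract_passport(text):
    """Return a (series, number) tuple, or None if nothing matches."""
    match = PASSPORT_RE.search(text)
    return match.groups() if match else None


print(extract_passport("паспорт 5715 - 424141"))  # ('5715', '424141')
print(extract_passport("no passport here"))       # None
```

Returning None instead of raising lets the caller decide how to handle lines that contain no passport number.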