Regular expression for extracting Russian passport numbers

Time:06-19

I need a regular expression that extracts the passport number following the word паспорт.

Possible options are:

  • паспорт 5715 424141
  • паспорт 5715-424141
  • паспорт 5715 - 424141

I need to extract the first 4 and the last 6 digits after the word паспорт, so the result should be 5715 and 424141.

I tried ^(\d{4})\ (\d{6})$ but it doesn't match any of these strings.

CodePudding user response:

For starters, the ^ symbol anchors the match to the start of the string, so your pattern already fails there (the strings start with "паспорт", not with a digit).

It also seems that the - between the number groups is optional and may be surrounded by spaces, which your pattern doesn't allow for: it accepts exactly one space and nothing else.
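A quick check of the original pattern makes both problems visible. This is only a sketch using the sample strings from the question; the variable name `original` is mine.

```python
import re

# The pattern from the question: anchored at both ends, exactly one space allowed.
original = r"^(\d{4})\ (\d{6})$"

# Matches only when the whole string is "dddd dddddd" and nothing else.
bare = re.search(original, "5715 424141")

# Fails: the string starts with "паспорт", but ^ requires a digit at position 0.
with_word = re.search(original, "паспорт 5715 424141")

# Fails even without the word: a dash is not the single space the pattern expects.
with_dash = re.search(original, "5715-424141")

print(bare is not None, with_word, with_dash)
```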

To fix all those issues, use:

паспорт (\d{4})\s*-?\s*(\d{6})
  • паспорт - a literal match of the word and the single space that follows it.
  • (\d{4}) - a capture group of four digits.
  • \s* - any amount of whitespace, including none.
  • -? - an optional dash.
  • \s* - any amount of whitespace, including none.
  • (\d{6}) - a capture group of six digits.
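The same breakdown can be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores unescaped whitespace and allows # comments inside the pattern. A minimal sketch, assuming the name `PASSPORT_RE` (not from the question); note that the literal space after паспорт has to be written as [ ] in this mode.

```python
import re

PASSPORT_RE = re.compile(
    r"""
    паспорт[ ]   # the literal word followed by one space
    (\d{4})      # group 1: four digits
    \s*-?\s*     # optional dash with optional whitespace on either side
    (\d{6})      # group 2: six digits
    """,
    re.VERBOSE,
)

match = PASSPORT_RE.search("паспорт 5715 - 424141")
print(match.groups())
```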

And since you tagged with Python:

import re

s = """паспорт 5715 424141
паспорт 5715-424141
паспорт 5715 - 424141"""

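for line in s.splitlines():
    # search() scans the whole line, so no ^ or $ anchors are needed
    print(re.search(r"паспорт (\d{4})\s*-?\s*(\d{6})", line).groups())
# prints ('5715', '424141') for each of the three lines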
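If the passports are embedded in longer text, or some inputs may contain no passport at all, calling .groups() on the result of re.search would raise an AttributeError on None. One possible way around that is re.findall, which returns an empty list when nothing matches. The helper name `extract_passports` and the sample text below are made up for illustration.

```python
import re

PATTERN = re.compile(r"паспорт (\d{4})\s*-?\s*(\d{6})")

def extract_passports(text):
    """Return a list of (first4, last6) tuples, one per match in text."""
    # With two capture groups, findall yields a tuple of both groups per match.
    return PATTERN.findall(text)

# Hypothetical input: two passports and a fragment without any number.
text = "Иванов, паспорт 5715-424141; Петров, паспорт 4510 - 123456; без номера"
found = extract_passports(text)
missing = extract_passports("no passport here")
print(found)
print(missing)
```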
