Parsing data from the squished string

Time:10-15

I need to write a regex pattern which, given the string "PriitPann39712047623 372 5688736402-12-1998Oja 18-2,Pärnumaa,Are", returns a first name, last name, id code, phone number, date of birth and address. There are no hard requirements besides these: both the first and last names always begin with a capital letter followed by at least one lowercase letter, the id code always consists of 11 characters (digits only), the phone number's calling code is 372 and the phone number itself consists of 8 digits, the date of birth has the format dd-mm-yyyy, and the address has no specific pattern.

That is, for the example above, the result should be [("Priit", "Pann", "39712047623", " 372 56887364", "02-12-1998", "Oja 18-2,Pärnumaa,Are")]. I came up with this pattern:

r"([1-9][0-9]{10})(\ \d{3}\s*\d{7,8})(\d{1,2}-\d{1,2}-\d{1,4})"

However, it returns everything except the first name, last name and address. For example, ^[^0-9]* returns the first and last name together, but I don't understand how to make it return them separately. How can the pattern be improved so that it also captures the first name, the last name and the address as separate groups?

CodePudding user response:

The following regex splits each of the fields into a separate group.

r"([A-Z][a-z]+)([A-Z][a-z]+)([0-9]{11})(\ 372 [0-9]{8})([0-9]{2}-[0-9]{2}-[0-9]{4})(.*$)"

You can get each group by calling

import re

m = re.search(regex, search_string)
# Group 0 is the whole match; the captured fields are groups 1..num_fields.
for i in range(1, num_fields + 1):
    group_i = m.group(i)
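To get the list-of-tuples result shown in the question, re.findall can be used instead of looping over the groups. Below is a minimal sketch, not a definitive solution: the names `text`, `record_pattern` and `records` are made up for illustration, and the pattern accepts either a space or a "+" before the calling code, which is an assumption since the sample string only shows a space there.

```python
import re

# The sample string exactly as it appears in the question.
text = "PriitPann39712047623 372 5688736402-12-1998Oja 18-2,Pärnumaa,Are"

# re.VERBOSE ignores unescaped whitespace in the pattern and allows comments.
# Whitespace inside a character class such as [+ ] is still matched literally.
record_pattern = re.compile(
    r"""
    ([A-Z][a-z]+)          # first name: one capital letter, then lowercase letters
    ([A-Z][a-z]+)          # last name: same rule
    (\d{11})               # id code: exactly 11 digits
    ([+ ]372[ ]\d{8})      # calling code 372 after a space or plus (assumption), then 8 digits
    (\d{2}-\d{2}-\d{4})    # date of birth: dd-mm-yyyy
    (.*)                   # address: whatever remains
    """,
    re.VERBOSE,
)

# With several capture groups, findall returns one tuple per match.
records = record_pattern.findall(text)
print(records)
```

Note that [A-Z] and [a-z] only cover ASCII letters, so a name such as "Pärn" would not match; if names can contain letters like ä, õ or ü, the character classes would need to be widened.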