Python -- Regex match pattern OR end of string

Time:08-28

import re
re.findall("(\ ?1?[ -.]?\(?\d{3}\)?[ -.]?\d{3}[ -.]?\d{4})(?:[ <$])", " 1.222.222.2222<")

The above code works fine if the number is followed by a "<" or a space, but not if the number sits at the very end of the string. How do I get 1.222.222.2222 returned in this case:

import re
re.findall("(\ ?1?[ -.]?\(?\d{3}\)?[ -.]?\d{3}[ -.]?\d{4})(?:[ <$])", " 1.222.222.2222")

*I removed the "<" so the string simply ends after the number. re.findall returns an empty list in this case, but I'd like it to return the full number: 1.222.222.2222

POSSIBLE ANSWER:

import re
re.findall("(\ ?1?[ -.]?\(?\d{3}\)?[ -.]?\d{3}[ -.]?\d{4})(?:[ <]|$)", " 1.222.222.2222")
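For context on why the original pattern failed: inside square brackets, $ is an ordinary character, so [ <$] requires a literal space, "<", or dollar sign and never matches the end of the string. Below is a minimal sketch of the difference, using a simplified stand-in pattern (\d{4} instead of the full phone-number pattern). The lookahead variant is a suggestion of mine rather than part of the question.

```python
import re

# Simplified stand-in for the phone-number pattern: just four digits.
in_class = r"(\d{4})(?:[ <$])"    # '$' is a literal dollar sign here
outside = r"(\d{4})(?:[ <]|$)"    # '$' is the end-of-string anchor here
lookahead = r"(\d{4})(?=[ <]|$)"  # same check, without consuming the delimiter

print(re.findall(in_class, "2222"))         # []
print(re.findall(in_class, "2222$"))        # ['2222']
print(re.findall(outside, "2222"))          # ['2222']
print(re.findall(lookahead, "1111 2222"))   # ['1111', '2222']
```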

CodePudding user response:

I think you've solved the end-of-string issue, but there are a couple of other potential issues with the pattern in your question:

  • the - in [ -.] is read as a range from space to ., so it needs to be escaped as \- or placed first or last within the square brackets, e.g. [-. ] or [ .-]; the entry for [] in the Python re module docs has the relevant info:
Ranges of characters can be indicated by giving two characters and separating them 
by a '-', for example [a-z] will match any lowercase ASCII letter, [0-5][0-9] will match
all the two-digits numbers from 00 to 59, and [0-9A-Fa-f] will match any hexadecimal
digit. If - is escaped (e.g. [a\-z]) or if it’s placed as the first or last character
(e.g. [-a] or [a-]), it will match a literal '-'.
  • you may want to require that either matching parentheses or none are present around the first 3 of 10 digits using (?:\(\d{3}\) ?|\d{3}[-. ]?)
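To see the first point concretely: in [ -.] the hyphen forms a range from space (code point 32) to "." (code point 46), so the class also accepts characters such as "!", "(", ")" and "+". A short check, with variable names chosen only for illustration:

```python
import re

range_class = re.compile(r"[ -.]")    # accidental range: space through '.'
literal_class = re.compile(r"[-. ]")  # literal '-', '.', and space only

# Collect every ASCII character the accidental range accepts.
accepted = [chr(c) for c in range(128) if range_class.fullmatch(chr(c))]
print("".join(accepted))

print(bool(range_class.fullmatch("+")))    # True
print(bool(literal_class.fullmatch("+")))  # False
```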

Here's a possible tweak incorporating the above

import re
# raw string, so the regex backslashes are not treated as string escapes
pat = r"^((?:\ 1[-. ]?|1[-. ]?)?(?:\(\d{3}\) ?|\d{3}[-. ]?)\d{3}[-. ]?\d{4})(?:[ <]|$)"
print( re.findall(pat, " 1.222.222.2222") )
print( re.findall(pat, " 1(222)222.2222") )
print( re.findall(pat, " 1(222.222.2222") )

Output:

[' 1.222.222.2222']
[' 1(222)222.2222']
[]

CodePudding user response:

Maybe try:

import re
re.findall("(\ ?1?[ -.]?\(?\d{3}\)?[ -.]?\d{3}[ -.]?\d{4})(?:| |<|$)", " 1.222.222.2222")
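The trailing group (?:| |<|$) has four alternatives:
  • the empty alternative matches at any position: 1.222.222.2222
  • the space alternative matches a space character: 1.222.222.2222 followed by a space
  • < matches a less-than sign: 1.222.222.2222<
  • $ matches at the end of the string: 1.222.222.2222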
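One caveat worth checking before adopting this: the empty first alternative always succeeds, so the trailing group no longer constrains what follows the number. A small sketch with a simplified stand-in pattern (\d{4} in place of the full phone-number pattern):

```python
import re

with_empty = r"(\d{4})(?:| |<|$)"    # the empty alternative matches anywhere
without_empty = r"(\d{4})(?: |<|$)"  # requires space, '<', or end of string

print(re.findall(with_empty, "2222x"))     # ['2222']
print(re.findall(without_empty, "2222x"))  # []
print(re.findall(without_empty, "2222"))   # ['2222']
```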

You can also use regex101 for easier debugging.
