What am I missing here? Python regular expression

Time:03-24

Does anyone know what I am missing? I found a bug: the check still passes if I put two uppercase letters after an apostrophe or a hyphen.

Write a regular expression as a string (including the quotation marks) that matches the pattern of a last name, as follows:

  • must be between 1 and 32 characters total
  • must contain only letters and possibly an apostrophe and possibly a hyphen
  • must start with an uppercase letter
  • must contain all lowercase letters after the first letter except: the letter following an apostrophe or hyphen must be uppercase
  • the name must not end with an apostrophe or hyphen

My code:

import re

string = "Mc'Tiray-JJay"

if re.findall("^[A-Z][a-zA-Z'-]{0,31}", string):
    if re.findall("[\'-][A-Z][a-z]", string):
        print("yes")
    else:
        print("no")

The result is yes even though the string has two consecutive uppercase letters (JJ).

Result should be:

string = "Mc'Tiray-JJay" => no
string = "Mc'Tiray-Jay" => yes
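One way to see why the code prints yes is to look at what each findall call returns on its own. The sketch below only reuses the two patterns from the question; the variable names first and second are just for illustration.

```python
import re

string = "Mc'Tiray-JJay"

# No "$" at the end, so this only checks that the string starts
# with an uppercase letter followed by allowed characters.
first = re.findall("^[A-Z][a-zA-Z'-]{0,31}", string)

# findall scans the whole string, so a single well-formed
# apostrophe or hyphen anywhere is enough to produce a hit.
second = re.findall("[\'-][A-Z][a-z]", string)

print(first)
print(second)
```

The second list contains only 'Ti. The -JJ part never matches that pattern, but nothing in the code rejects it either, so both lists are non-empty and the result is yes.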

CodePudding user response:

You can test the string by attempting to match the following regular expression:

^(?!.*(['-]).*\1)(?!.*['-](?:[a-z'-]|$))(?!.*(?<=[^'-])[A-Z])[A-Z][A-Za-z'-]{0,31}$
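As a minimal sketch, the expression can be tried against the two strings from the question like this. The names LAST_NAME and check are hypothetical helpers, not part of the question.

```python
import re

# Pattern copied unchanged from the line above.
LAST_NAME = re.compile(
    r"^(?!.*(['-]).*\1)(?!.*['-](?:[a-z'-]|$))(?!.*(?<=[^'-])[A-Z])[A-Z][A-Za-z'-]{0,31}$"
)


def check(name):
    """Return "yes" if name matches the pattern, otherwise "no"."""
    return "yes" if LAST_NAME.search(name) else "no"


for name in ("Mc'Tiray-JJay", "Mc'Tiray-Jay"):
    print(name, "=>", check(name))
```

Because the pattern is anchored with ^ and $, a single search over the whole string is enough; no second findall pass is needed.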


The regular expression can be broken down as follows.

^                # match beginning of string
(?!              # begin negative lookahead to allow at most one
                 # apostrophe and at most one hyphen
  .*             # match >= 0 characters
  (['-])         # match a char in char class and save to capture group 1
  .*             # match >= 0 characters
  \1             # match the content of capture group 1
)                # end negative lookahead
(?!              # begin negative lookahead to require an apostrophe or 
                 # hyphen to be followed by an uppercase letter
  .*             # match >= 0 characters
  ['-]           # match a char in char class
  (?:            # begin non-capture group
    [a-z'-]      # match a char in char class
    |            # or
    $            # match end of string 
  )              # end non-capture group
)                # end negative lookahead
(?!              # begin negative lookahead to prevent an uppercase letter
                 # from being preceded by a char other than an apostrophe
                 # or hyphen 
  .*             # match >= 0 characters
  (?<=           # begin positive lookbehind
    [^'-]        # match a char other than an apostrophe or hyphen
  )              # end positive lookbehind
  [A-Z]          # match an uppercase letter
)                # end negative lookahead
[A-Z]            # match an uppercase letter (the first letter of the name)
[A-Za-z'-]{0,31} # match 0-31 chars in char class
$                # match end of string 
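
The breakdown above is close to what Python's re.VERBOSE flag accepts directly, since that flag ignores whitespace in the pattern and treats # as the start of a comment outside character classes. A minimal sketch with shortened comments follows; the name LAST_NAME and the extra test names are made up for illustration and do not come from the question.

```python
import re

LAST_NAME = re.compile(r"""
    ^
    (?!.*(['-]).*\1)          # no second apostrophe and no second hyphen
    (?!.*['-](?:[a-z'-]|$))   # apostrophe/hyphen must be followed by uppercase
    (?!.*(?<=[^'-])[A-Z])     # uppercase only at the start or after ' or -
    [A-Z][A-Za-z'-]{0,31}     # 1 to 32 allowed characters in total
    $
""", re.VERBOSE)

samples = [
    "Smith",           # plain name
    "O'Brien",         # apostrophe followed by uppercase
    "Smith-",          # ends with a hyphen
    "smith",           # does not start with uppercase
    "O''Brien",        # apostrophe used twice
    "A" + "a" * 31,    # exactly 32 characters
    "A" + "a" * 32,    # 33 characters, one too many
]

for name in samples:
    print(repr(name), "=>", bool(LAST_NAME.search(name)))
```

Keeping the comments inside the compiled pattern means the explanation and the expression cannot drift apart when one of them is edited.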