How to add a Zero-or-more-condition (?) to multiple characters via regex without creating a capturin

Time:12-03

The function rearrange_name should be given a name in the format: last name (single or double-barrelled), followed by a comma and a space, then the first name (on its own, or together with a middle initial or a full middle name). The name should then be rearranged and printed as first name followed by last name.

This is the start of the code.


import re
def rearrange_name(name):
    result = re.search(r"^(\w*), (\w*)$", name)
    if result is None:
        return name
    return "{} {}".format(result[2], result[1])

name=rearrange_name("Kennedy, John F.")
print(name)

I know this specific problem has been posted before (Fix the regular expression used in the rearrange_name function so that it can match middle names, middle initials, as well as double surnames), but I have a problem with the solution given there: it also accepts nonsense names like "-, John F." or " , John F.". I would have added a comment, but I don't have any reputation yet. This is my first post on Stack Overflow.

I'd like to change the code so that it is fully correct. The original solution given:

import re
def rearrange_name(name):
  result = re.search(r"^([\w -]+), ([\w. ]+)$", name)
  if result is None:
    return name
  return "{} {}".format(result[2], result[1])


name=rearrange_name("Kennedy, John F.")
print(name)

name=rearrange_name("Kennedy, John Fitzgerald")
print(name)

name=rearrange_name("Kennedy-McJohnson, John Fitzgerald")
print(name)
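
A quick check of why this pattern is too permissive, as a minimal sketch: the character class treats "-" and " " as interchangeable with letters, so a "surname" made only of punctuation or whitespace still matches. The name loose is introduced here for illustration, and the pattern is assumed to use "+" quantifiers.

```python
import re

# The permissive pattern from the earlier solution (assumed "+" quantifiers).
loose = re.compile(r"^([\w -]+), ([\w. ]+)$")

# One valid name and the two nonsense inputs mentioned in the question.
for candidate in ["Kennedy, John F.", "-, John F.", " , John F."]:
    m = loose.search(candidate)
    print(repr(candidate), "->", bool(m))
```

All three inputs produce a match, which is exactly the behaviour the question wants to rule out.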

My own approach, which I tested on regex101.com, matches all the valid names correctly, but the groups aren't captured the way they should be.


I am struggling with it because, as far as I can tell, the optional parts have to be written as groups followed by a "?", i.e. ( )?, and the match then no longer contains the two groups that the format call expects.
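
For reference, Python's re module supports non-capturing groups, written (?:...), which let a quantifier such as "?" apply to several characters without adding a numbered group. A minimal sketch, with a simplified pattern chosen only for illustration:

```python
import re

# (?:...) groups tokens for the "?" quantifier without creating a numbered group.
pattern = re.compile(r"^([A-Za-z]+(?:-[A-Za-z]+)?), ([A-Za-z]+)$")

m = pattern.search("Kennedy-McJohnson, John")
print(pattern.groups)  # number of capturing groups in the pattern
print(m.groups())      # only the two capturing groups appear
```

The optional "-McJohnson" part is matched as part of group 1, so result[1] and result[2] keep their meaning.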

To give some examples, these should all work and everything else shouldn't (varying the letters should obviously be allowed):

  • "Kennedy, John": plain last name, first name. Output: John Kennedy

  • "Kennedy, John F.": last name, first name, middle initial. Output: John F. Kennedy

  • "Kennedy, John Fitzgerald": last name, first name, middle name. Output: John Fitzgerald Kennedy

  • "Kennedy-McJohnson, John Fitzgerald": double-barrelled last name, first name, middle name. Output: John Fitzgerald Kennedy-McJohnson

  • "Kennedy-McJohnson, John F.": double-barrelled last name, first name, middle initial. Output: John F. Kennedy-McJohnson

Any letter may be swapped for another letter. The only characters that should be allowed are letters, plus the spaces between the names, the "." after an initial, and the "-" in a double-barrelled name.

The following should be considered invalid input, so the output is the input unchanged:

input: |||?!**Kennedy, John F#####. output: |||?!**Kennedy, John F#####.

So if it is a valid name, the order is changed and the result is printed. If it is not a valid name, the name is printed exactly as it was given.

CodePudding user response:

Try the pattern:

([A-Z][a-zA-Z]+(?:-[A-Z][a-zA-Z]+)?), ([A-Z][a-zA-Z]+\s*(?:[A-Z][a-zA-Z]+|[A-Z]\.)?)


import re


pat = re.compile(
    r"([A-Z][a-zA-Z]+(?:-[A-Z][a-zA-Z]+)?), ([A-Z][a-zA-Z]+\s*(?:[A-Z][a-zA-Z]+|[A-Z]\.)?)"
)


def rearrange_name(name):
    m = pat.match(name)
    if m:
        return "{} {}".format(m.group(2), m.group(1))

    return name


name = rearrange_name("Kennedy, John F.")
print(name)

name = rearrange_name("Kennedy, John Fitzgerald")
print(name)

name = rearrange_name("Kennedy-McJohnson, John Fitzgerald")
print(name)

Prints:

John F. Kennedy
John Fitzgerald Kennedy
John Fitzgerald Kennedy-McJohnson
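
One caveat: pat.match only anchors at the start of the string, so an input with trailing junk such as "Kennedy, John F#####." can still match on its leading part. A possible stricter variant, sketched under the assumption that the whole string must be a valid name, uses fullmatch and requires exactly one space before the optional middle part. The names STRICT and rearrange_name_strict are hypothetical and not part of the answer above.

```python
import re

# Hypothetical stricter variant: same building blocks as the pattern above,
# but the middle part needs exactly one space and the whole string must match.
STRICT = re.compile(
    r"([A-Z][a-zA-Z]+(?:-[A-Z][a-zA-Z]+)?), "
    r"([A-Z][a-zA-Z]+(?: (?:[A-Z][a-zA-Z]+|[A-Z]\.))?)"
)


def rearrange_name_strict(name):
    # fullmatch anchors at both ends, so leading or trailing junk is rejected.
    m = STRICT.fullmatch(name)
    if m is None:
        return name
    return "{} {}".format(m.group(2), m.group(1))


for raw in [
    "Kennedy, John",
    "Kennedy-McJohnson, John F.",
    "Kennedy, John F#####.",
    "-, John F.",
]:
    print(rearrange_name_strict(raw))
```

With this variant the first two inputs are rearranged and the last two come back unchanged.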