Pandas Full Name Split into First, Middle and Last Names

I have a pandas DataFrame with a fullnames field. I want to change the logic so that the first name is the first word, the last name is the last word, and everything in between goes into the middle name field.

Note: the full name may contain only two words, in which case the middle name should be null. There may also be extra spaces between the names.

Current Logic:

fullnames = "Walter John  Ross Schmidt"
first, middle, *last = fullnames.split()
print("First = {first}".format(first=first))
print("Middle = {middle}".format(middle=middle))
print("Last = {last}".format(last=" ".join(last)))

Output:

First = Walter
Middle = John
Last = Ross Schmidt

Expected Output:

FirstName = Walter
Middle = John Ross
Last = Schmidt

CodePudding user response:

You can use capture groups in the regex passed to str.extract(), which will let you do this in a single operation:

import re
import pandas as pd

df = pd.DataFrame({
    "name": [
        "Walter John  Ross Schmidt",
        "John Quincy Adams"
    ]
})

# first word, lazily matched middle part, last word
rx = re.compile(r'^(\w+)\s+(.*?)\s+(\w+)$')

df[['first', 'middle', 'last']] = df['name'].str.extract(pat=rx, expand=True)

This gives you (the middle group keeps any extra spaces from the original string):

    name                         first   middle       last
0   Walter John  Ross Schmidt    Walter  John  Ross   Schmidt
1   John Quincy Adams            John    Quincy       Adams
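
The question also mentions two-word names and extra spaces, which the pattern above does not cover on its own: it requires two separate runs of whitespace, so a two-word name would not match at all. A possible variant, as a sketch only: collapse the whitespace first and make the middle group optional. The "Jane Doe" row and the `clean` and `pattern` names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": [
        "Walter John  Ross Schmidt",
        "John Quincy Adams",
        "Jane Doe",  # hypothetical two-word name, not from the original post
    ]
})

# Collapse runs of whitespace first so the middle part ends up single-spaced.
clean = df["name"].str.strip().str.replace(r"\s+", " ", regex=True)

# The middle group and its trailing whitespace are optional, so two-word
# names still match; the group that did not participate comes back as NaN.
pattern = r"^(\w+)\s+(?:(.*?)\s+)?(\w+)$"
df[["first", "middle", "last"]] = clean.str.extract(pattern, expand=True)

print(df)
```

Names with only one word, or with characters that `\w` does not match (hyphens, apostrophes), would still come back as NaN in all three columns, so the pattern may need adjusting for real data.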

CodePudding user response:

You can use negative indexing to get the last item in the list for the last name and also use a slice to get all but the first and last for the middle name:

fullnames = "Walter John  Ross Schmidt"
first = fullnames.split()[0]
last = fullnames.split()[-1]
middle = " ".join(fullnames.split()[1:-1])
print("First = {first}".format(first=first))
print("Middle = {middle}".format(middle=middle))
print("Last = {last}".format(last=last))

PS: if you are working with a DataFrame, you can use:

import pandas as pd

df = pd.DataFrame({'fullnames':['Walter John  Ross Schmidt']})
df = df.assign(**{
    'first': df['fullnames'].str.split().str[0],
    'middle': df['fullnames'].str.split().str[1:-1].str.join(' '),
    'last': df['fullnames'].str.split().str[-1]
})

Output:

   fullnames                  first   middle     last
0  Walter John  Ross Schmidt  Walter  John Ross  Schmidt
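
For a two-word name, the `[1:-1]` slice is empty and the join produces an empty string rather than a null. If a null is wanted, as the question asks, one way to get it is sketched below. The "Jane Doe" row and the `parts` and `middle` variables are assumptions added for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "fullnames": [
        "Walter John  Ross Schmidt",
        "Jane Doe",  # hypothetical two-word name, not from the original post
    ]
})

# Split once and reuse the resulting lists for all three columns.
parts = df["fullnames"].str.split()

# An empty slice joins to "", so keep only non-empty values; the rest become NaN.
middle = parts.str[1:-1].str.join(" ")
middle = middle.where(middle != "")

df = df.assign(first=parts.str[0], middle=middle, last=parts.str[-1])
print(df)
```

Splitting once also avoids repeating `str.split()` for each column.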

CodePudding user response:

I would use str.replace and str.extract here:

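# regex=True is needed for str.replace in pandas 2.0+, where the default is a literal match
df["FirstName"] = df["FullName"].str.extract(r'^(\w+)', expand=False)
# strips the first and last word; a two-word name leaves its last word here
df["Middle"] = df["FullName"].str.replace(r'^\w+\s+|\s+\w+$', '', regex=True)
df["Last"] = df["FullName"].str.extract(r'(\w+)$', expand=False)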

CodePudding user response:

You can use the following line instead.

first, *middle, last = fullnames.split()
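
Note that with this unpacking `middle` is a list (empty for a two-word name), so it still has to be joined. A minimal sketch of how that could be wrapped up, where `split_name` is a hypothetical helper name and returning None for a missing middle name is an assumption based on the question:

```python
def split_name(fullname):
    """Return (first, middle, last); middle is None when there are no middle words."""
    # split() with no arguments ignores repeated spaces; at least two words
    # are required, otherwise the unpacking raises ValueError.
    first, *middle, last = fullname.split()
    return first, " ".join(middle) or None, last


print(split_name("Walter John  Ross Schmidt"))  # ('Walter', 'John Ross', 'Schmidt')
print(split_name("Jane Doe"))                   # ('Jane', None, 'Doe')
```

To use it on a DataFrame column, one option is to build the three new columns from `[split_name(n) for n in df['fullnames']]`.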