I have a pandas dataframe with a fullnames field. I want to change the logic so that the first name is the first word, the last name is the last word, and everything in between goes into the middle name field.
Note: The full name may contain only two words, in which case the middle name should be null, and there may also be extra spaces between the names.
Current Logic:
fullnames = "Walter John Ross Schmidt"
first, middle, *last = fullnames.split()
print("First = {first}".format(first=first))
print("Middle = {middle}".format(middle=middle))
print("Last = {last}".format(last=" ".join(last)))
Output :
First = Walter
Middle = John
Last = Ross Schmidt
Expected Output :
FirstName = Walter
Middle = John Ross
Last = Schmidt
CodePudding user response:
You can use capture groups in the regex passed to str.extract(), which will let you do this in a single operation:
import re
import pandas as pd

df = pd.DataFrame({
    "name": [
        "Walter John Ross Schmidt",
        "John Quincy Adams"
    ]
})
# first word, lazily matched middle part, last word
rx = re.compile(r'^(\w+)\s+(.*?)\s+(\w+)$')
df[['first', 'middle', 'last']] = df['name'].str.extract(pat=rx, expand=True)
This gives you:
                       name   first     middle     last
0  Walter John Ross Schmidt  Walter  John Ross  Schmidt
1         John Quincy Adams    John     Quincy    Adams
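Note that this pattern expects whitespace on both sides of the middle group, so a two-word name such as "John Adams" would not match. Below is a minimal sketch, using plain `re` rather than pandas, of a variant where the middle group is optional and leading/trailing spaces are tolerated. The names `NAME_RX` and `parse_name` are made up for illustration, and the pattern assumes each name part consists of `\w` characters only (so hyphenated names would need a wider character class).

```python
import re

# Hypothetical variant: the middle part (and the whitespace after it) is optional.
NAME_RX = re.compile(r'^\s*(\w+)\s+(?:(.*?)\s+)?(\w+)\s*$')

def parse_name(fullname):
    """Return (first, middle, last); middle is None for two-word names."""
    m = NAME_RX.match(fullname)
    if m is None:
        return None  # e.g. a single word, or characters outside \w
    return m.groups()

print(parse_name("Walter John Ross Schmidt"))
print(parse_name("John   Adams"))
```

Extra spaces inside the middle part are kept as they are, so you may still want to normalise whitespace afterwards.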
CodePudding user response:
You can use negative indexing to get the last item in the list for the last name and also use a slice to get all but the first and last for the middle name:
fullnames = "Walter John Ross Schmidt"
first = fullnames.split()[0]
last = fullnames.split()[-1]
middle = " ".join(fullnames.split()[1:-1])
print("First = {first}".format(first=first))
print("Middle = {middle}".format(middle=middle))
print("Last = {last}".format(last=last))
PS: if you are working with a DataFrame, you can use:
import pandas as pd

df = pd.DataFrame({'fullnames': ['Walter John Ross Schmidt']})
df = df.assign(**{
    'first': df['fullnames'].str.split().str[0],
    'middle': df['fullnames'].str.split().str[1:-1].str.join(' '),
    'last': df['fullnames'].str.split().str[-1]
})
Output:
                  fullnames   first     middle     last
0  Walter John Ross Schmidt  Walter  John Ross  Schmidt
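For a two-word name, the slice `[1:-1]` is empty, so the joined middle name comes out as an empty string rather than null. Here is a rough sketch of one way to handle that, splitting only once and masking empty strings. The sample rows and the `parts` and `middle` variable names are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"fullnames": ["Walter John Ross Schmidt", "John   Adams"]})

# Split on any run of whitespace once and reuse the resulting lists.
parts = df["fullnames"].str.split()

middle = parts.str[1:-1].str.join(" ")

df["first"] = parts.str[0]
# Replace empty strings with a missing value (NaN) for two-word names.
df["middle"] = middle.mask(middle == "")
df["last"] = parts.str[-1]

print(df)
```

Because `str.split()` with no arguments splits on runs of whitespace, extra spaces between the names are dropped along the way.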
CodePudding user response:
I would use str.replace and str.extract here:
df["FirstName"] = df["FullName"].str.extract(r'^(\w+)', expand=False)
df["Middle"] = df["FullName"].str.replace(r'^\w+\s+|\s+\w+$', '', regex=True)
df["Last"] = df["FullName"].str.extract(r'(\w+)$', expand=False)
CodePudding user response:
You can use the following line instead.
first, *middle, last = fullnames.split()
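As a small sketch of how that line behaves with the edge cases from the question: the starred name collects everything between the first and last word into a list, which is empty for a two-word name. The helper name `split_name` is hypothetical, and the sketch assumes the name has at least two words (a single word would raise a `ValueError` on unpacking).

```python
def split_name(fullname):
    """Split a full name into (first, middle, last); middle is None if absent."""
    # split() with no arguments collapses runs of whitespace and trims the ends.
    first, *middle, last = fullname.split()
    # An empty list joins to "", which is falsy, so fall back to None.
    return first, " ".join(middle) or None, last

print(split_name("Walter John Ross Schmidt"))
print(split_name("  John   Adams "))
```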