I concatenated the two columns firstname and lastname into one column. Now, how can I split it back into two columns when the first name consists of two, three, or more words? Is there an easy way to do it? I tried using the str.split() method, but it says "columns should be the same length."
CodePudding user response:
We can use str.extract here:
df[["firstname", "lastname"]] = df["fullname"].str.extract(r'^(\w+(?: \w+)*) (\w+)$')
The regex pattern used above assigns as many words as possible, counting from the start, to the first name, leaving only the final word for the last name. Here is a formal explanation of the regex:
^                 from the start of the name
(                 open first capture group \1
    \w+           match the first word
    (?: \w+)*     then match space and another word, together zero or more times
)                 close first capture group
(space)           match the single space that separates the two groups
(\w+)             match and capture the last word as the last name in \2
$                 end of the name
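To see the pattern in action, here is a minimal sketch on a small made-up frame. The sample names are hypothetical and only the column name fullname comes from the answer above. It also shows a possible alternative that is not part of the original answer: splitting once from the right with str.rsplit, which always yields exactly two columns for names of two or more words.

```python
import pandas as pd

# Hypothetical sample data; only the column name "fullname" is taken from the answer.
df = pd.DataFrame(
    {"fullname": ["John Smith", "Mary Ann Jones", "Juan Carlos Maria Lopez"]}
)

# Everything before the last space is captured as the first name,
# and the final word is captured as the last name.
df[["firstname", "lastname"]] = df["fullname"].str.extract(
    r"^(\w+(?: \w+)*) (\w+)$"
)

# Alternative sketch (not from the answer): split only once, starting from
# the right, so a multi-word first name stays in one piece.
alt = df["fullname"].str.rsplit(" ", n=1, expand=True)
alt.columns = ["firstname", "lastname"]

print(df)
print(alt)
```

A plain str.split(expand=True) produces one column per word, so a three-word name gives three columns for two target columns, which is most likely where the length error in the question comes from. Note that with the regex above, a name made of a single word does not match at all, and both new columns are NaN for that row.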