Get first element of tokenized words in a row-CodePudding

Using the existing column name, add a new column first_name to df such that the new column splits the name into multiple words and takes the first word as its first name. For example, if the name is Elon Musk, it is split into two words in the list ['Elon', 'Musk'] and the first word Elon is taken as its first name. If the name has only one word, then the word itself is taken as its first name.

A snippet of the data frame

Name
Alemsah Ozturk
Igor Arinich
Christopher Maloney
DJ Holiday
Brian Tracy
Philip DeFranco
Patrick Collison
Peter Moore
Dr.Darrell Scott
Atul Gawande
Everette Taylor
Elon Musk
Nelly_Mo

This is what I have so far. I am not sure how to extract the name after I tokenize it

import nltk
first = df.name.apply(lambda x: nltk.word_tokenize(x))
df["first_name"] = This is where I'm stuck

CodePudding user response：

Try this snippet:

df["first_name"] = df['Name'].map(lambda x: x.split(' ')[0])
df["last_name"] = df['Name'].map(lambda x: x.split(' ')[1])