Using the existing column Name, add a new column first_name to df that splits each name into words and takes the first word as the first name. For example, the name Elon Musk splits into the list ['Elon', 'Musk'], and the first word, Elon, is taken as the first name. If a name has only one word, that word itself is taken as the first name.
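The rule above can be sketched in plain Python as a small helper. The name first_name is hypothetical, chosen for illustration, and str.split() with no arguments (split on any run of whitespace) is assumed as the way to break a name into words.

```python
def first_name(name: str) -> str:
    """Return the first whitespace-separated word of name.

    A one-word name such as "Nelly_Mo" is returned unchanged.
    """
    words = name.split()  # e.g. "Elon Musk" -> ["Elon", "Musk"]
    return words[0] if words else ""  # blank names give "" instead of IndexError


for n in ["Elon Musk", "Nelly_Mo", "DJ Holiday"]:
    print(n, "->", first_name(n))
```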
A snippet of the data frame:
Name |
---|
Alemsah Ozturk |
Igor Arinich |
Christopher Maloney |
DJ Holiday |
Brian Tracy |
Philip DeFranco |
Patrick Collison |
Peter Moore |
Dr.Darrell Scott |
Atul Gawande |
Everette Taylor |
Elon Musk |
Nelly_Mo |
This is what I have so far. I am not sure how to extract the first name after I tokenize it:
import nltk
# the column is called "Name"; each element of first is a list of tokens
first = df["Name"].apply(nltk.word_tokenize)
# df["first_name"] = ???  <- this is where I'm stuck
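One way to finish that last step, sketched under the assumption that each element of first is a Python list of strings (which is what nltk.word_tokenize returns): the pandas .str accessor can index into lists, so .str[0] picks the first token in every row. The token lists below are written out by hand so the sketch runs without NLTK or its tokenizer data.

```python
import pandas as pd

# Hand-written stand-in for `first`: one list of tokens per name,
# the shape produced by applying a tokenizer to each row.
first = pd.Series([["Elon", "Musk"], ["DJ", "Holiday"], ["Nelly_Mo"]])

# .str[0] takes element 0 of each list, so a one-word name keeps its only token.
first_names = first.str[0]
print(first_names.tolist())  # ['Elon', 'DJ', 'Nelly_Mo']
```

Applied to the question's code, that would be df["first_name"] = first.str[0]; because first is built from df, its index lines up with df's rows on assignment.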
Answer:
Try this snippet:
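# split() with no argument splits on any run of whitespace; [0] is the first word
df["first_name"] = df["Name"].map(lambda x: x.split()[0])
# one-word names such as Nelly_Mo have no second word, so fall back to ""
df["last_name"] = df["Name"].map(lambda x: x.split()[-1] if len(x.split()) > 1 else "")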
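A variant of the same answer using the pandas .str accessor instead of map with a lambda: each name is split once, the first word becomes first_name, and last_name is the last word only for names with at least two words. Treating the last word as the last name is an assumption; the sample rows are taken from the question's snippet.

```python
import pandas as pd

# A few rows from the question's snippet, including a one-word name.
df = pd.DataFrame({"Name": ["Elon Musk", "Dr.Darrell Scott", "Nelly_Mo"]})

# Split each name on whitespace once and reuse the resulting lists.
parts = df["Name"].str.split()

# First word; a one-word name is its own first name.
df["first_name"] = parts.str[0]

# Last word only when a name has at least two words; otherwise an empty string.
df["last_name"] = parts.str[-1].where(parts.str.len() > 1, "")

print(df)
```

Note that Dr.Darrell Scott yields Dr.Darrell as its first name: there is no space after the period, and the stated rule splits on whitespace only.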