I'm trying to clean a dataframe column so that only the first name is left.
My current approach is to split each value into a list and take element [0], unless that element contains a comma, in which case I take element [1].
The code below works perfectly except for a few outlier values that contain only a last name followed by a comma and no first name, as shown in the examples below.
What I used:
msgDFt['From Name'] = msgDFt['From Name'].str.split(' ')
msgDFt['From Name'] = msgDFt['From Name'].apply(lambda row: row[0] if ',' not in row[0] else row[1])
I'm aware that a lambda cannot contain a try-except block, so the next thing I tried was a function:
def firstNameMod(name):
    for n in name:
        if n[0] == None:
            name = 'NOT FOUND'
        elif ',' in n[0]:
            name = name[1]
        elif ',' in n[0] and n[1] == None:
            name = name[0]
        elif n[0] != False:
            name = name[0]
df.apply(firstNameMod(df['Name']))
This did not work because the column I selected is being read as a NoneType.
What I have:
Name
0 Robert Marin
1 Katherine Ortiz
2 Sloth, Herbert
3 Perez,
What I want:
Name
0 Robert
1 Katherine
2 Herbert
3 NaN
Sample dataframe:
import pandas as pd
names = {'Name': ['Robert Marin','Katherine Ortiz', 'Sloth, Herbert','Perez,']}
df = pd.DataFrame(names)
CodePudding user response:
You can use a regex to extract your first name:
df['Name'].str.extract(r'(^\w+(?=[^,]*$)|(?<=, )\w+)')[0]
output:
0 Robert
1 Katherine
2 Herbert
3 NaN
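If you would rather keep a plain function, as attempted in the question, something along these lines might work. Note that df.apply(firstNameMod(df['Name'])) calls the function immediately and passes its return value (None, since the function never returns anything) to apply, which is the likely source of the NoneType error. The helper name first_name below is hypothetical, and this is only a minimal sketch that assumes names are either "First Last" or "Last, First":

```python
# Hypothetical helper (not from the original post): returns the first name,
# or None when the value does not contain a usable first name.
def first_name(value):
    # Guard against missing or non-string values.
    if not isinstance(value, str):
        return None
    if ',' in value:
        # Assumed "Last, First" layout: the first name follows the comma.
        after_comma = value.split(',', 1)[1].split()
        return after_comma[0] if after_comma else None
    # Assumed "First Last" layout: the first name is the first token.
    parts = value.split()
    return parts[0] if parts else None


names = ['Robert Marin', 'Katherine Ortiz', 'Sloth, Herbert', 'Perez,']
result = [first_name(n) for n in names]
print(result)
```

With the sample dataframe, passing the function itself (without calling it) should apply it to each value: df['Name'].apply(first_name). The last row then holds None, which pandas treats as a missing value.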