I want to split Full name in three columns such as first name, Middle name and last name for eg :
1 Moni Avhad
2 Kumar Krishnan Menon
3 Divya Sharma
4 Mimi Atul Sharma
5 Murad Oberoi
I used the below code to split the names:
import pandas as pd
df = pd.read_excel("Sample.xlsx")
df[['first Name','middle Name','Last Name']]=df['Full Name'].str.split(' ',n=2,expand=True)
df
but many a time my full name is not constant for example(Kumar Krishnan Menon , Mini Avinash Sharma) have three seperate names First,middle and last name.But in other cases middle name is not present.
Expected output:
First Name Middle Name Last Name
1 Moni N/A Avhad
2 Kumar Krishnan Menon
3 Divya N/A Sharma
4 Mimi Atul Sharma
5 Murad N/A Oberoi
Also if there are multiple Surname such as Rani Amol Sharma-Warma How do we split that and one more scenario where no middle name is present but two surnames are present eg Disha Sharma Warma
CodePudding user response:
Use your solution without n=2
and space and then replace missing values in Last
column by middle
values and set None
s:
df[['first Name','middle Name','Last Name']] = df['Full Name'].str.split(expand=True)
m = df['Last Name'].isna()
df['Last Name'] = df['Last Name'].fillna(df['middle Name'])
df.loc[m, 'middle Name'] = None
print (df)
Full Name first Name middle Name Last Name
0 Moni Avhad Moni None Avhad
1 Kumar Krishnan Menon Kumar Krishnan Menon
2 Divya Sharma Divya None Sharma
3 Mimi Atul Sharma Mimi Atul Sharma
4 Murad Oberoi Murad None Oberoi
Another idea with splitting to Series
called s
and set None
s by Series.mask
:
s = df['Full Name'].str.split()
df['First_Name'] = s.str[0]
df['Middle_Name'] = s.str[1].mask(s.str.len().ne(3), None)
df['Last_Name'] = s.str[-1]
print (df)
Full Name First_Name Middle_Name Last_Name
0 Moni Avhad Moni None Avhad
1 Kumar Krishnan Menon Kumar Krishnan Menon
2 Divya Sharma Divya None Sharma
3 Mimi Atul Sharma Mimi Atul Sharma
4 Murad Oberoi Murad None Oberoi
CodePudding user response:
Not the most elegant way but you could use apply:
df["FirstName"] = df["FullName"].apply(lambda x : x.split()[0])
df["MiddleName"] = df["FullName"].apply(lambda x : " ".join(x.split()[1:-1]) if len(x.split())>2 else None)
df["LastName"] = df["FullName"].apply(lambda x : x.split()[-1])
The benefit of this being that, if a person has more than one middle name, the code would not fail. Also, it is bad habit to leave spaces and/or special characters in the column names of the data frames. It's best to take care of that.
CodePudding user response:
You can use Series.str.extract() method here:
In [93]: df
Out[93]:
Full Name
0 Moni Avhad
1 Kumar Krishnan Menon
2 Divya Sharma
3 Mimi Atul Sharma
4 Murad Oberoi
In [94]: df[["FirstName", "MiddleName", "LastName"]] = \
df["Full Name"].str.extract(r"(?P<First_Name>\w )\s*(?P<Middle_Name>\w )?\s (?P<Last_Name>\w )")
In [95]: df
Out[95]:
Full Name FirstName MiddleName LastName
0 Moni Avhad Moni NaN Avhad
1 Kumar Krishnan Menon Kumar Krishnan Menon
2 Divya Sharma Divya NaN Sharma
3 Mimi Atul Sharma Mimi Atul Sharma
4 Murad Oberoi Murad NaN Oberoi
CodePudding user response:
You can use pd.str.split
and fill middle name column only when length of string is 3.
splitted = df['Full Name'].str.split()
df['First_Name'] = splitted.str[0]
df['Last_Name'] = splitted.str[-1]
df['Middle_Name'] = df['Full Name'].loc[splitted.str.len() == 3].str.split(expand=True)[1]
which gives us the expected output :
First_Name Middle_Name Last_Name
0 Moni NaN Avhad
1 Kumar Krishnan Menon
2 Divya NaN Sharma
3 Mimi Atul Sharma
4 Murad NaN Oberoi
CodePudding user response:
# Creating a dataframe
df = pd.DataFrame({'FullName':['Moni Avhad',
'Kumar Krishnan Menon',
'Divya Sharma',
'Mimi Atul Sharma',
'Murad Oberoi']})
# Using List Comprehension and checking the len and creating First Name, Mid Name, Last Name
df['FirstName'] = [x.split(' ')[0] for x in df['FullName']]
df['MidName'] = [x.split(' ')[1] if len(x.split(' '))==3 else np.NaN for x in df['FullName']]
df['LastName'] = [x.split(' ')[1] if len(x.split(' '))==2 else x.split(' ')[2] for x in df['FullName']]