Home > Mobile >  how can i split a full name to first name, Middle name and last name in python jupyter notebook?
how can i split a full name to first name, Middle name and last name in python jupyter notebook?

Time:08-16

I want to split Full name in three columns such as first name, Middle name and last name for eg :

1   Moni Avhad
2   Kumar Krishnan Menon
3   Divya Sharma
4   Mimi Atul Sharma
5   Murad Oberoi

I used the below code to split the names:

import pandas as pd
df = pd.read_excel("Sample.xlsx")
df[['first Name','middle Name','Last Name']]=df['Full Name'].str.split(' ',n=2,expand=True)
df

but many a time my full name is not constant for example(Kumar Krishnan Menon , Mini Avinash Sharma) have three seperate names First,middle and last name.But in other cases middle name is not present.

Expected output:

    First Name   Middle Name   Last Name
1   Moni            N/A         Avhad
2   Kumar         Krishnan      Menon
3   Divya           N/A         Sharma
4   Mimi           Atul         Sharma
5   Murad           N/A         Oberoi

Also if there are multiple Surname such as Rani Amol Sharma-Warma How do we split that and one more scenario where no middle name is present but two surnames are present eg Disha Sharma Warma

CodePudding user response:

Use your solution without n=2 and space and then replace missing values in Last column by middle values and set Nones:

df[['first Name','middle Name','Last Name']] = df['Full Name'].str.split(expand=True)

m = df['Last Name'].isna()
df['Last Name'] = df['Last Name'].fillna(df['middle Name'])
df.loc[m, 'middle Name'] = None
print (df)
                  Full Name first Name middle Name Last Name
0                Moni Avhad       Moni        None     Avhad
1      Kumar Krishnan Menon      Kumar    Krishnan     Menon
2              Divya Sharma      Divya        None    Sharma
3          Mimi Atul Sharma       Mimi        Atul    Sharma
4              Murad Oberoi      Murad        None    Oberoi
    

Another idea with splitting to Series called s and set Nones by Series.mask:

s = df['Full Name'].str.split()
df['First_Name'] = s.str[0]
df['Middle_Name'] = s.str[1].mask(s.str.len().ne(3), None)
df['Last_Name'] = s.str[-1]
print (df)
                  Full Name First_Name Middle_Name Last_Name
0                Moni Avhad       Moni        None     Avhad
1      Kumar Krishnan Menon      Kumar    Krishnan     Menon
2              Divya Sharma      Divya        None    Sharma
3          Mimi Atul Sharma       Mimi        Atul    Sharma
4              Murad Oberoi      Murad        None    Oberoi

CodePudding user response:

Not the most elegant way but you could use apply:

df["FirstName"] = df["FullName"].apply(lambda x : x.split()[0])
df["MiddleName"] = df["FullName"].apply(lambda x : " ".join(x.split()[1:-1]) if len(x.split())>2 else None)
df["LastName"] = df["FullName"].apply(lambda x : x.split()[-1])

The benefit of this being that, if a person has more than one middle name, the code would not fail. Also, it is bad habit to leave spaces and/or special characters in the column names of the data frames. It's best to take care of that.

CodePudding user response:

You can use Series.str.extract() method here:

In [93]: df
Out[93]:
              Full Name
0            Moni Avhad
1  Kumar Krishnan Menon
2          Divya Sharma
3      Mimi Atul Sharma
4          Murad Oberoi

In [94]: df[["FirstName", "MiddleName", "LastName"]] = \
    df["Full Name"].str.extract(r"(?P<First_Name>\w )\s*(?P<Middle_Name>\w )?\s (?P<Last_Name>\w )")

In [95]: df
Out[95]:
              Full Name FirstName MiddleName LastName
0            Moni Avhad      Moni        NaN    Avhad
1  Kumar Krishnan Menon     Kumar   Krishnan    Menon
2          Divya Sharma     Divya        NaN   Sharma
3      Mimi Atul Sharma      Mimi       Atul   Sharma
4          Murad Oberoi     Murad        NaN   Oberoi

CodePudding user response:

You can use pd.str.split and fill middle name column only when length of string is 3.

splitted = df['Full Name'].str.split()
df['First_Name'] = splitted.str[0]
df['Last_Name'] = splitted.str[-1]
df['Middle_Name'] = df['Full Name'].loc[splitted.str.len() == 3].str.split(expand=True)[1]

which gives us the expected output :

  First_Name Middle_Name Last_Name
0       Moni         NaN     Avhad
1      Kumar    Krishnan     Menon
2      Divya         NaN    Sharma
3       Mimi        Atul    Sharma
4      Murad         NaN    Oberoi

CodePudding user response:

# Creating a dataframe
df = pd.DataFrame({'FullName':['Moni Avhad',
'Kumar Krishnan Menon',
'Divya Sharma',
'Mimi Atul Sharma',
'Murad Oberoi']})

# Using List Comprehension and checking the len and creating First Name, Mid Name, Last Name
df['FirstName'] = [x.split(' ')[0] for x in df['FullName']]
df['MidName'] = [x.split(' ')[1] if len(x.split(' '))==3 else np.NaN for x in df['FullName']]
df['LastName'] = [x.split(' ')[1] if len(x.split(' '))==2 else x.split(' ')[2] for x in df['FullName']]
  • Related