How to extract the status from a full name in a pd.DataFrame column?

I have a dataset. Here is its 'Name' column:

0                                Braund, Mr. Owen Harris
1      Cumings, Mrs. John Bradley (Florence Briggs Th...
2                                 Heikkinen, Miss. Laina
3           Futrelle, Mrs. Jacques Heath (Lily May Peel)
4                               Allen, Mr. William Henry
                             ...                        
151                    Pears, Mrs. Thomas (Edith Wearne)
152                                     Meo, Mr. Alfonzo
153                      van Billiard, Mr. Austin Blyler
154                                Olsen, Mr. Ole Martin
155                          Williams, Mr. Charles Duane

and I need to extract the first name, status, and second name. When I try this on a simple string, it works:

full_name="Braund, Mr. Owen Harris"
first_name=full_name.split(',')[0]
second_name=full_name.split('.')[1]
print('First name:',first_name) 
print('Second name:',second_name)
status = full_name.replace(first_name, '').replace(',','').split('.')[0]
print('Status:',status)

>First name: Braund
>Second name:  Owen Harris
>Status:  Mr

But when I try to do the same with pandas, I get stuck on the status. Extracting the first name works:

df['first_Name'] = df['Name'].str.split(',').str.get(0)  # this works fine

But with this line:

status= df['Name'].str.replace(df['first_Name'], '').replace(',','').split('.').str.get(0)

I get an error:

>>TypeError: 'Series' objects are mutable, thus they cannot be hashed

What are possible solutions?
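For reference, Series.str.replace expects its pattern to be a single string or compiled regex, not a Series, so pandas ends up trying to treat the whole df['first_Name'] Series as one pattern, which fails with the hashing TypeError (the chained .replace and .split calls would also need the .str accessor). A minimal vectorized sketch that avoids the row-by-row replace entirely, assuming every name follows the "Surname, Title. Given names" layout; the sample rows come from the question and the variable name parts is illustrative:

```python
import pandas as pd

# Small sample in the same "Surname, Title. Given names" format as the question
df = pd.DataFrame({'Name': [
    'Braund, Mr. Owen Harris',
    'Futrelle, Mrs. Jacques Heath (Lily May Peel)',
    'Heikkinen, Miss. Laina',
    'van Billiard, Mr. Austin Blyler',
]})

# Split each name once at the first ", ": [surname, "Title. Given names ..."]
parts = df['Name'].str.split(', ', n=1)
df['first_Name'] = parts.str[0]

# The status is the text after ", " and before the first "."
df['status'] = parts.str[1].str.split('.', n=1).str[0]

print(df[['first_Name', 'status']])
```

Every step here is a column-wide .str operation, so no per-row Python function is needed.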

Edit: Thanks for the answers. To extract the columns, I now do:

def extract_name_data(row):

    row.str.extract('(?P<first_name>[^,]+), (?P<status>\w+.) (?P<second_name>[^(]+\w) ?')
    last_name = row['second_name']
    title = row['status']
    first_name = row['first_name']
    return first_name, second_name, status

and get

AttributeError: 'str' object has no attribute 'str'

What can be done? Here, row is meant to be df['Name'].
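The AttributeError suggests the function is receiving one name at a time (for example via df['Name'].apply(extract_name_data)), so row is a plain Python str, and the .str accessor exists only on a Series. A minimal sketch that keeps a per-name function but uses the standard re module instead, and returns only names it actually defines. The compiled pattern NAME_RE and the None fallback are illustrative assumptions, and the dot is kept outside the status group so the result is "Mr" as in the question:

```python
import re

import pandas as pd

# Assumed layout: "Surname, Title. Given names (optional maiden name)"
NAME_RE = re.compile(r'(?P<first_name>[^,]+), (?P<status>\w+)\. (?P<second_name>[^(]+\w)')


def extract_name_data(name):
    """Split one full-name string into (first_name, status, second_name).

    `name` is a plain str here, so re is used instead of the .str accessor.
    Returns (None, None, None) when the name does not match the assumed layout.
    """
    m = NAME_RE.match(name)
    if m is None:
        return None, None, None
    return m.group('first_name'), m.group('status'), m.group('second_name')


df = pd.DataFrame({'Name': [
    'Braund, Mr. Owen Harris',
    'Futrelle, Mrs. Jacques Heath (Lily May Peel)',
    'Meo, Mr. Alfonzo',
]})

# Series.apply calls the function once per string; the tuples become three columns
parts = pd.DataFrame(df['Name'].apply(extract_name_data).tolist(),
                     columns=['first_name', 'status', 'second_name'],
                     index=df.index)
df = df.join(parts)
print(df)
```

If you pass the whole column instead, a single df['Name'].str.extract(...) call does the same job without a helper function.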

Answer:

You could use str.extract with named capturing groups:

df['Name'].str.extract(r'(?P<first_name>[^,]+), (?P<status>\w+\.) (?P<second_name>[^(]+\w) ?')

Output:

     first_name status    second_name
0        Braund    Mr.    Owen Harris
1       Cumings   Mrs.   John Bradley
2     Heikkinen  Miss.          Laina
3      Futrelle   Mrs.  Jacques Heath
4         Allen    Mr.  William Henry
5         Pears   Mrs.         Thomas
6           Meo    Mr.        Alfonzo
7  van Billiard    Mr.  Austin Blyler
8         Olsen    Mr.     Ole Martin
9      Williams    Mr.  Charles Duane
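A self-contained sketch of the str.extract call above, run on a few sample rows and joined back onto the original frame. The sample data and the join are illustrative assumptions, and the pattern is written as a raw string so Python does not treat \w as a string escape. Each named group becomes one column of the DataFrame that str.extract returns:

```python
import pandas as pd

df = pd.DataFrame({'Name': [
    'Braund, Mr. Owen Harris',
    'Futrelle, Mrs. Jacques Heath (Lily May Peel)',
    'Heikkinen, Miss. Laina',
    'van Billiard, Mr. Austin Blyler',
]})

# Named groups become column names; rows that don't match get NaN in every column
pattern = r'(?P<first_name>[^,]+), (?P<status>\w+\.) (?P<second_name>[^(]+\w) ?'
names = df['Name'].str.extract(pattern)

# The result shares df's index, so it can be joined straight back on
df = df.join(names)
print(df)
```

The [^(]+\w group stops before any parenthesized maiden name and drops the trailing space, which is why "Jacques Heath" comes out without "(Lily May Peel)".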

Answer:

You can also put your original code, with a slight modification, into the pandas .apply() function:

Just replace your Python variable names with the corresponding pandas column names. For example, inside the lambda passed to .apply(), replace full_name with x['Name'] and first_name with x['first_Name']:

df['status'] = df.apply(lambda x: x['Name'].replace(x['first_Name'], '').replace(',','').split('.')[0], axis=1)

This may not be the most efficient approach, since .apply() with axis=1 runs the lambda once per row in Python, but it is an easy way to turn your existing Python code into a working pandas version.

Result:

print(df)


                                                  Name    first_Name status
0                              Braund, Mr. Owen Harris        Braund     Mr
1    Cumings, Mrs. John Bradley (Florence Briggs Th...       Cumings    Mrs
2                               Heikkinen, Miss. Laina     Heikkinen   Miss
3         Futrelle, Mrs. Jacques Heath (Lily May Peel)      Futrelle    Mrs
4                             Allen, Mr. William Henry         Allen     Mr
151                  Pears, Mrs. Thomas (Edith Wearne)         Pears    Mrs
152                                   Meo, Mr. Alfonzo           Meo     Mr
153                    van Billiard, Mr. Austin Blyler  van Billiard     Mr
154                              Olsen, Mr. Ole Martin         Olsen     Mr
155                        Williams, Mr. Charles Duane      Williams     Mr
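One detail worth checking with this approach: after the surname and the comma are removed, the remaining text starts with a space, so the extracted value is ' Mr' rather than 'Mr' (the column padding in the printed table hides this). A minimal sketch that adds .strip(), using a few sample rows as assumed data:

```python
import pandas as pd

df = pd.DataFrame({'Name': [
    'Braund, Mr. Owen Harris',
    'Heikkinen, Miss. Laina',
    'Meo, Mr. Alfonzo',
]})
df['first_Name'] = df['Name'].str.split(',').str.get(0)

# Same row-wise logic as the answer: with axis=1 the lambda receives each row,
# so x['Name'] and x['first_Name'] are plain strings
raw = df.apply(lambda x: x['Name'].replace(x['first_Name'], '').replace(',', '').split('.')[0], axis=1)

# .strip() removes the leading space left over after deleting "Surname,"
df['status'] = raw.str.strip()
print(raw.tolist())
print(df['status'].tolist())
```

The same fix can go directly inside the lambda by appending .strip() after [0].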