I have dataset. Here is the column of 'Name':
0 Braund, Mr. Owen Harris
1 Cumings, Mrs. John Bradley (Florence Briggs Th...
2 Heikkinen, Miss. Laina
3 Futrelle, Mrs. Jacques Heath (Lily May Peel)
4 Allen, Mr. William Henry
...
151 Pears, Mrs. Thomas (Edith Wearne)
152 Meo, Mr. Alfonzo
153 van Billiard, Mr. Austin Blyler
154 Olsen, Mr. Ole Martin
155 Williams, Mr. Charles Duane
and need to extract first name, status, and second name. When I try this on simple string, its ok:
full_name="Braund, Mr. Owen Harris"
first_name=full_name.split(',')[0]
second_name=full_name.split('.')[1]
print('First name:',first_name)
print('Second name:',second_name)
status = full_name.replace(first_name, '').replace(',','').split('.')[0]
print('Status:',status)
>First name: Braund
>Second name: Owen Harris
>Status: Mr
But after trying to do this with pandas, I fail with the status:
df['first_Name'] = df['Name'].str.split(',').str.get(0) #its ok, worsk well
But after this:
status= df['Name'].str.replace(df['first_Name'], '').replace(',','').split('.').str.get(0)
I get a mistake:
>>TypeError: 'Series' objects are mutable, thus they cannot be hashed
What are possible solutions?
Edit:Thanks for the answers and extract columns. I do
def extract_name_data(row):
row.str.extract('(?P<first_name>[^,] ), (?P<status>\w .) (?P<second_name>[^(] \w) ?')
last_name = row['second_name']
title = row['status']
first_name = row['first_name']
return first_name, second_name, status
and get
AttributeError: 'str' object has no attribute 'str'
What can be done? Row is meaned to be df['Name']
CodePudding user response:
You could use str.extract
with named capturing groups:
df['Name'].str.extract('(?P<first_name>[^,] ), (?P<status>\w .) (?P<second_name>[^(] \w) ?')
output:
first_name status second_name
0 Braund Mr. Owen Harris
1 Cumings Mrs. John Bradley
2 Heikkinen Miss. Laina
3 Futrelle Mrs. Jacques Heath
4 Allen Mr. William Henry
5 Pears Mrs. Thomas
6 Meo Mr. Alfonzo
7 van Billiard Mr. Austin Blyler
8 Olsen Mr. Ole Martin
9 Williams Mr. Charles Duane
CodePudding user response:
You can also place your original codes with slight modification into Pandas .apply()
function for it to work, as follows:
Just replace your variable names in Python with the column names in Pandas.
For example, replace full_name
with x['Name']
and first_name
with x['first_Name']
within the lambda function of .apply()
function:
df['status'] = df.apply(lambda x: x['Name'].replace(x['first_Name'], '').replace(',','').split('.')[0], axis=1)
Though may not be the most efficient way of doing it, it's a way to easily modify your existing codes in Python into a workable version in Pandas.
Result:
print(df)
Name first_Name status
0 Braund, Mr. Owen Harris Braund Mr
1 Cumings, Mrs. John Bradley (Florence Briggs Th... Cumings Mrs
2 Heikkinen, Miss. Laina Heikkinen Miss
3 Futrelle, Mrs. Jacques Heath (Lily May Peel) Futrelle Mrs
4 Allen, Mr. William Henry Allen Mr
151 Pears, Mrs. Thomas (Edith Wearne) Pears Mrs
152 Meo, Mr. Alfonzo Meo Mr
153 van Billiard, Mr. Austin Blyler van Billiard Mr
154 Olsen, Mr. Ole Martin Olsen Mr
155 Williams, Mr. Charles Duane Williams Mr