First, I have to write a function GetTitle() that takes a string, splits the name at the comma, then splits the result at the dot, converts the title to lowercase, and returns it.
I wrote the function, but I don't understand how to use it, or rewrite it, so that it works on a Series of strings:
def GetTitle(Name):
    k = Name.split(" ")
    item = k[1]
    title = item.split(".")
    res = title[0]
    lower_res = res.lower()
    return lower_res
This is the column I need to apply the function to:
[Photo of the column]
CodePudding user response:
Here is an approach using pandas.Series.str.extract with a regular expression:
df['Title1'] = df['Name'].str.extract(r',\s(\w+)\.', expand=False)
Add pandas.Series.str.lower if you need to lowercase the title:
df['Title2'] = df['Name'].str.extract(r',\s(\w+)\.', expand=False).str.lower()
# Output :
print(df)
     PassengerId  Survived  Pclass  Name                                             Sex      Age  SibSp  Parch  Ticket               Fare  Cabin  Embarked  Title1  Title2
0              1         0     NaN  Braund, Mr. Owen Harris                          male    22.0      1      0  A/5 21171          7.2500  NaN    S         Mr      mr
1              2         1     NaN  Cumings, Mrs. John Bradley (Florence Briggs Th.  female  38.0      1      0  PC 17599          71.2833  C85    C         Mrs     mrs
2              3         1     NaN  Heikkinen, Miss. Laina                           female  26.0      0      0  STON/O2. 3101282   7.9250  NaN    S         Miss    miss
3              4         1     NaN  Futrelle, Mrs. Jacques Heath (Lily May Peel)     female  35.0      1      0  113803            53.1000  C123   S         Mrs     mrs
4              5         0     3.0  Allen, Mr. William Henry                         male    35.0      0      0  373450             8.0500  NaN    S         Mr      mr
886          887         0     2.0  Montvila, Rev. Juozas                            male    27.0      0      0  211536            13.0000  NaN    S         Rev     rev
887          888         1     1.0  Graham, Miss. Margaret Edith                     female  19.0      0      0  112053            30.0000  B42    S         Miss    miss
888          889         0     3.0  Johnston, Miss. Catherine Helen "Carrie"         female   NaN      1      2  W./C.6607         23.4500  NaN    S         Miss    miss
889          890         1     1.0  Behr, Mr. Karl Howell                            male    26.0      0      0  111369            30.0000  C148   C         Mr      mr
890          891         0     3.0  Dooley, Mr. Patrick                              male    32.0      0      0  370376             7.7500  NaN    Q         Mr      mr
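If you want to try this without loading the full file, here is a minimal sketch on a few rows. The sample frame is made up from names shown in the output, and the pattern assumes the title is a single word followed by a literal dot.

```python
import pandas as pd

# Small illustrative frame; the names are copied from the output shown in this answer.
sample = pd.DataFrame({"Name": ["Braund, Mr. Owen Harris",
                                "Heikkinen, Miss. Laina",
                                "Montvila, Rev. Juozas"]})

# Capture the word between ", " and the next literal dot, then lowercase it.
# expand=False returns a Series instead of a one-column DataFrame.
sample["Title"] = sample["Name"].str.extract(r",\s(\w+)\.", expand=False).str.lower()
print(sample["Title"].tolist())
```

Names whose title is more than one word will not match this pattern and come back as NaN, so it is worth checking the result for missing values on the real data.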
CodePudding user response:
Assuming your data is loaded as df and you want to run your function on the "Name" column, try this:
df["Title"] = df["Name"].apply(lambda x: GetTitle(str(x)))
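Put together as a self-contained sketch, it could look like the code below. The GetTitle here is only one possible version that follows the steps described in the question (split at the comma, then at the dot, then lowercase), and the two-row frame is made-up sample data.

```python
import pandas as pd

def GetTitle(name):
    # Keep the text after the first comma, then the text before the first dot,
    # trim the surrounding whitespace and lowercase the result.
    after_comma = name.split(",", 1)[1]
    return after_comma.split(".", 1)[0].strip().lower()

# Illustrative data only.
df = pd.DataFrame({"Name": ["Braund, Mr. Owen Harris",
                            "Graham, Miss. Margaret Edith"]})

# Series.apply calls GetTitle once for every value in the column.
df["Title"] = df["Name"].apply(GetTitle)
print(df["Title"].tolist())
```

When every value is already a string, the function can be passed to apply directly; the lambda with str(x) is only there to guard against non-string values.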
CodePudding user response:
You can use Series.str.split with alternative delimiters for this:
import pandas as pd
df = pd.DataFrame({'Name': ['Bar, Mr. Foo',
                            'Baz, Miss. Bar']})
df['Title'] = df.Name.str.split(r'\. |, ', expand=True)[1].str.lower()
print(df)
             Name  Title
0    Bar, Mr. Foo     mr
1  Baz, Miss. Bar   miss
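If you would rather mirror the steps from the question one by one (split at the comma, then at the dot, then lowercase), the .str accessor can be chained. This is a sketch under the assumption that every name contains a comma and a dot; the frame is the same made-up example as above.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Bar, Mr. Foo", "Baz, Miss. Bar"]})

title = (
    df["Name"]
    .str.split(",").str[1]   # text after the comma, e.g. " Mr. Foo"
    .str.split(".").str[0]   # text before the dot, e.g. " Mr"
    .str.strip()             # drop the leading space
    .str.lower()
)
print(title.tolist())
```

Each step stays vectorized, so no Python-level loop or apply is needed.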