How to split a Series of strings by multiple symbols?

Time:10-02

First, I have to write a function GetTitle() that takes a string, splits the name at the comma, splits the remainder at the dot, converts the title to lower case, and returns it.
I wrote the function, but I don't understand how to use or rewrite it for a Series of strings.

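def GetTitle(Name):
    # Take the part after the comma, e.g. "Mr. Owen Harris"
    item = Name.split(", ")[1]
    # Keep the text before the first dot, e.g. "Mr"
    res = item.split(".")[0]
    # Lowercase the title before returning it
    return res.lower()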

This is the column I need to apply the function to.

CodePudding user response:

Here is an approach using pandas.Series.str.extract with a regular expression:

df['Title1'] = df['Name'].str.extract(r',\s(\w+)\.', expand=False)

Chain pandas.Series.str.lower if you need the title in lower case:

df['Title2'] = df['Name'].str.extract(r',\s(\w+)\.', expand=False).str.lower()

# Output :

print(df)

     PassengerId  Survived  Pclass                                             Name     Sex   Age  SibSp  Parch            Ticket     Fare Cabin Embarked Title1 Title2
0              1         0     NaN                          Braund, Mr. Owen Harris    male  22.0      1      0         A/5 21171   7.2500   NaN        S     Mr     mr
1              2         1     NaN  Cumings, Mrs. John Bradley (Florence Briggs Th.  female  38.0      1      0          PC 17599  71.2833   C85        C    Mrs    mrs
2              3         1     NaN                           Heikkinen, Miss. Laina  female  26.0      0      0  STON/O2. 3101282   7.9250   NaN        S   Miss   miss
3              4         1     NaN     Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1      0            113803  53.1000  C123        S    Mrs    mrs
4              5         0     3.0                         Allen, Mr. William Henry    male  35.0      0      0            373450   8.0500   NaN        S     Mr     mr
886          887         0     2.0                            Montvila, Rev. Juozas    male  27.0      0      0            211536  13.0000   NaN        S    Rev    rev
887          888         1     1.0                     Graham, Miss. Margaret Edith  female  19.0      0      0            112053  30.0000   B42        S   Miss   miss
888          889         0     3.0         Johnston, Miss. Catherine Helen "Carrie"  female   NaN      1      2         W./C.6607  23.4500   NaN        S   Miss   miss
889          890         1     1.0                            Behr, Mr. Karl Howell    male  26.0      0      0            111369  30.0000  C148        C     Mr     mr
890          891         0     3.0                              Dooley, Mr. Patrick    male  32.0      0      0            370376   7.7500   NaN        Q     Mr     mr
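
If you want to try the extract approach without loading the whole dataset, here is a minimal sketch. The three-row frame is just an assumption for illustration (the names are copied from the output above), and the pattern assumes every name looks like "Surname, Title. Given names".

```python
import pandas as pd

# Hypothetical three-row sample; the names are copied from the output above.
df = pd.DataFrame({"Name": ["Braund, Mr. Owen Harris",
                            "Heikkinen, Miss. Laina",
                            "Montvila, Rev. Juozas"]})

# Capture the word between ", " and the following literal dot,
# then lowercase it.
df["Title"] = df["Name"].str.extract(r",\s(\w+)\.", expand=False).str.lower()

titles = df["Title"].tolist()
print(titles)
```

Rows that don't match the pattern would get NaN instead of a title, so it may be worth checking df["Title"].isna().sum() on the real data.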

CodePudding user response:

Assuming you want to run your own function on the "Name" column and your data is loaded as df, try this:

df["Title"] = df["Name"].apply(lambda x: GetTitle(str(x)))
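
For a self-contained check, here is a rough sketch of the same idea. The helper name get_title and the two-row frame are assumptions for illustration, not the asker's exact code; the helper splits at the comma first, as the question describes, and assumes every name contains ", " followed by a title ending in a dot.

```python
import pandas as pd

def get_title(name):
    """Return the lowercased title from 'Surname, Title. Given names'."""
    # Drop the surname: keep everything after the first ", ".
    after_comma = name.split(", ", 1)[1]
    # Keep the text before the first dot and lowercase it.
    return after_comma.split(".", 1)[0].lower()

# Hypothetical sample; the names are taken from the output shown earlier.
df = pd.DataFrame({"Name": ["Braund, Mr. Owen Harris",
                            "Graham, Miss. Margaret Edith"]})

# apply calls get_title once per element of the Series.
df["Title"] = df["Name"].apply(get_title)

titles = df["Title"].tolist()
print(titles)
```

Passing the function directly to apply does the same thing as wrapping it in a lambda; the str(x) wrapper is only needed if the column may hold non-string values.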

CodePudding user response:

You can use Series.str.split with a regex of alternative delimiters:

import pandas as pd

df = pd.DataFrame({'Name': ['Bar, Mr. Foo',
                            'Baz, Miss. Bar']})

# Split on ". " or ", "; the dot is escaped so it matches a literal "."
df['Title'] = df.Name.str.split(r'\. |, ', expand=True)[1].str.lower()

print(df)

             Name Title
0    Bar, Mr. Foo    mr
1  Baz, Miss. Bar  miss