Python replacing partial matching text based on a list of elements in data frame

Time:03-02

I have built a DataFrame from data collected by a scraper. One of the fields I extracted is the job position, and that column currently looks like this:

                                                Title Research Number  \
1                                                Dean             NaN   
2                                    Professor of Law             NaN   
3   Associate Dean for Information & Technology Se...             NaN   
4                                  Professor of Law\n             NaN   
5   Associate Dean for Faculty Development\nCharle...             NaN   
6   Associate Dean for Faculty Development\nCharle...             NaN   
7   Assistant Professor of Clinical Education & Di...             NaN   
8   Judge George Howard, Jr., Distinguished Profes...             NaN   
9                 Visiting Assistant Professor of Law             NaN   
10  Associate Dean for Academic Affairs\nArkansas ...             NaN   
11      Distinguished Professor in Constitutional Law             NaN   
12                         Assistant Professor of Law             NaN   
13  Instructor of Clinical Education; Supervising ...             NaN   
14                         Associate Professor of Law             NaN   
15                       Assistant Professor of Law\n             NaN   
16  Assistant Professor of Clinical Education; Tax...             NaN   
17         Assistant Professor of Law Librarianship;              NaN   
18  Byron M. Eiseman Distinguished Professor of Ta...             NaN   
19                                 Professor of Law\n             NaN   
20  Associate Professor of Law; Mediation Clinic D...             NaN   
21  Assistant Professor of Clinical Education; Fam...             NaN   
22   Assistant Professor of Clinical Education; Co...             NaN   
23                       Associate Professor of Law\n             NaN   
24  Professor of Law Librarianship; Electronic Res...             NaN   
25                                 Professor of Law\n             NaN   
26                                 Professor of Law\n             NaN   
27  Associate Dean for Experiential Learning & Cli...             NaN   
28                       Associate Professor of Law\n             NaN   
29  Assistant Professor of Clinical Education; Bus...             NaN   
30         Associate Professor of Law Librarianship;              NaN 

I would like to replace these titles with the following titles:

titles=["Adjunct Professor","Professor Emeritus","Associate Professor","Assistant Professor","Professor"]

How can I search for partial text and replace it? A title should be replaced even when it is not a 100% match. For example, 'Visiting Assistant Professor of Law' should be replaced with 'Assistant Professor'.

Thank you!

Answer:

Use str.extract with a pattern that joins all the titles into a single alternation group:

df['Title2'] = df['Title'].str.extract(f'({"|".join(titles)})', expand=False)

output:

                  Title
1                   NaN
2             Professor
3                   NaN
...
29  Assistant Professor
30  Associate Professor
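Here is a self-contained version of the approach above, as a sketch. It uses a few titles from the question as sample rows, which are illustrative only and not the full scraped frame. It passes expand=False so that str.extract returns a Series instead of a one-column DataFrame. It also runs each title through re.escape, which is an addition to the original answer. That only matters if a title ever contains a regex metacharacter such as "." or "(".

```python
import re

import pandas as pd

titles = ["Adjunct Professor", "Professor Emeritus", "Associate Professor",
          "Assistant Professor", "Professor"]

# A few sample rows based on the question's output (illustrative only).
df = pd.DataFrame({"Title": [
    "Dean",
    "Professor of Law",
    "Visiting Assistant Professor of Law",
    "Associate Professor of Law\n",
    "Distinguished Professor in Constitutional Law",
]})

# One capturing group with every title as an alternative: (A|B|C|...)
pattern = "(" + "|".join(map(re.escape, titles)) + ")"

# With a single group and expand=False, str.extract returns a Series;
# rows where no title occurs anywhere in the text become NaN.
df["Title2"] = df["Title"].str.extract(pattern, expand=False)
print(df["Title2"].tolist())
```

Because the regex searches anywhere in the string, "Visiting Assistant Professor of Law" yields "Assistant Professor", and "Dean" yields NaN.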

If you want to keep the original Title in case of no match, use:

df['Title'] = df['Title'].str.extract(f'({"|".join(titles)})', expand=False).fillna(df['Title'])
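The fillna variant can be sketched the same way. The regex engine tries the alternatives in list order at each starting position. A shorter title that is a prefix of a longer one ("Professor" versus "Professor Emeritus") must therefore come later in the pattern, or it will win. The list in the question already has that order. Sorting by length, longest first, is one defensive option if the list is edited later, but it is an assumption of this sketch and not part of the original answer. The sample rows are hypothetical.

```python
import re

import pandas as pd

titles = ["Adjunct Professor", "Professor Emeritus", "Associate Professor",
          "Assistant Professor", "Professor"]

# Hypothetical sample rows.
df = pd.DataFrame({"Title": [
    "Dean",
    "Professor Emeritus of Law",
    "Assistant Professor of Law\n",
]})

# Longest titles first, so "Professor Emeritus" is tried before its prefix "Professor".
ordered = sorted(titles, key=len, reverse=True)
pattern = "(" + "|".join(map(re.escape, ordered)) + ")"

# Rows with no match (NaN after extract) fall back to their original text.
df["Title"] = df["Title"].str.extract(pattern, expand=False).fillna(df["Title"])
print(df["Title"].tolist())
```

Here "Dean" keeps its original value. The other two rows are normalized to "Professor Emeritus" and "Assistant Professor", and the trailing newline is dropped with the rest of the text.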