I have built a dataframe from data extracted by a scraper. One column holds job positions, and it currently looks like this:
Title Research Number \
1 Dean NaN
2 Professor of Law NaN
3 Associate Dean for Information & Technology Se... NaN
4 Professor of Law\n NaN
5 Associate Dean for Faculty Development\nCharle... NaN
6 Associate Dean for Faculty Development\nCharle... NaN
7 Assistant Professor of Clinical Education & Di... NaN
8 Judge George Howard, Jr., Distinguished Profes... NaN
9 Visiting Assistant Professor of Law NaN
10 Associate Dean for Academic Affairs\nArkansas ... NaN
11 Distinguished Professor in Constitutional Law NaN
12 Assistant Professor of Law NaN
13 Instructor of Clinical Education; Supervising ... NaN
14 Associate Professor of Law NaN
15 Assistant Professor of Law\n NaN
16 Assistant Professor of Clinical Education; Tax... NaN
17 Assistant Professor of Law Librarianship; NaN
18 Byron M. Eiseman Distinguished Professor of Ta... NaN
19 Professor of Law\n NaN
20 Associate Professor of Law; Mediation Clinic D... NaN
21 Assistant Professor of Clinical Education; Fam... NaN
22 Assistant Professor of Clinical Education; Co... NaN
23 Associate Professor of Law\n NaN
24 Professor of Law Librarianship; Electronic Res... NaN
25 Professor of Law\n NaN
26 Professor of Law\n NaN
27 Associate Dean for Experiential Learning & Cli... NaN
28 Associate Professor of Law\n NaN
29 Assistant Professor of Clinical Education; Bus... NaN
30 Associate Professor of Law Librarianship; NaN
I would like to replace these titles with the following titles:
titles=["Adjunct Professor","Professor Emeritus","Associate Professor","Assistant Professor","Professor"]
How can I look for a partial match in the text and replace the whole value with it? The text will usually not be a 100% match for one of the titles. For example, 'Visiting Assistant Professor of Law' should be replaced with 'Assistant Professor'.
Thank you!
CodePudding user response:
Use str.extract:
df['Title2'] = df['Title'].str.extract(f'({"|".join(titles)})')
Output:
Title2
1 NaN
2 Professor
3 NaN
...
29 Assistant Professor
30 Associate Professor
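One caveat when joining the list directly: regex alternation tries alternatives left to right, so the bare "Professor" has to come after the longer titles (as it does in the question's list), and a title containing regex metacharacters would need escaping. Here is a minimal sketch of a more defensive pattern, assuming a small subset of the rows from the question. The length sort and `re.escape` are additions here, not part of the snippet above:

```python
import re
import pandas as pd

titles = ["Adjunct Professor", "Professor Emeritus", "Associate Professor",
          "Assistant Professor", "Professor"]

# Longest alternatives first, so "Professor" cannot shadow a longer title
# starting at the same position; re.escape guards against metacharacters.
ordered = sorted(titles, key=len, reverse=True)
pattern = "(" + "|".join(re.escape(t) for t in ordered) + ")"

# Illustrative subset of the rows shown in the question.
df = pd.DataFrame({"Title": ["Dean",
                             "Professor of Law",
                             "Visiting Assistant Professor of Law",
                             "Associate Professor of Law\n"]})

# expand=False returns a Series; rows without a match become NaN.
df["Title2"] = df["Title"].str.extract(pattern, expand=False)
print(df["Title2"].tolist())
```

With the question's list the result is the same as the plain join; the sort only matters if someone later reorders or extends `titles`.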
If you want to keep the original Title in case of no match, use:
df['Title'] = df['Title'].str.extract(f'({"|".join(titles)})', expand=False).fillna(df['Title'])
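To check the fallback behaviour on a few rows before overwriting the real column, something like the following could be used. It is only a sketch: `normalize_titles` is a hypothetical helper name, and the sample values are a hand-picked subset of the question's data.

```python
import pandas as pd

titles = ["Adjunct Professor", "Professor Emeritus", "Associate Professor",
          "Assistant Professor", "Professor"]

def normalize_titles(col, titles):
    """Return the first matching title per row, or the original value if none matches."""
    pattern = f'({"|".join(titles)})'
    # Unmatched rows come back as NaN and are filled from the original column.
    return col.str.extract(pattern, expand=False).fillna(col)

sample = pd.Series(["Dean",
                    "Visiting Assistant Professor of Law",
                    "Professor of Law\n"])
result = normalize_titles(sample, titles)
print(result.tolist())
```

Rows such as 'Dean' pass through unchanged, while matched rows are reduced to the title alone, which also drops trailing text like the '\n'.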