I've used the SerpAPI to pull down some data about jobs in a sector I want to return to.
There is a lot of junk about training and I'd like to remove the results based on the displayed_link column.
position title link displayed_link date snippet snippet_highlighted_words sitelinks about_this_result about_page_link about_page_serpapi_link cached_page_link related_questions rich_snippet related_pages_link thumbnail duration key_moments
0 1 What Does a Data Analyst Do? Your 2022 Career ... https://www.coursera.org/articles/what-does-a-... https://www.coursera.org › Coursera Articles ›... Nov 14, 2022 A data analyst is a person whose job is to gat... [data analyst] {'inline': [{'title': 'Business analyst', 'lin... {'source': {'description': 'Coursera Inc. is a... https://www.google.com/search?q=About https://... https://serpapi.com/search.json?engine=google_... https://webcache.googleusercontent.com/search?... NaN NaN NaN NaN NaN NaN
1 2 What Does a Data Analyst Do? Exploring the Day... https://www.rasmussen.edu/degrees/technology/b... https://www.rasmussen.edu › degrees › technolo... Sep 19, 2022 Generally speaking, a data analyst will retrie... [data analyst, Data analysts] {'inline': [{'title': 'Where Do Data Analysts ... {'source': {'description': 'Rasmussen Universi... https://www.google.com/search?q=About https://... https://serpapi.com/search.json?engine=google_... https://webcache.googleusercontent.com/search?... NaN NaN NaN NaN NaN NaN
2 3 Become a Data Analyst Learning Path - LinkedIn https://www.linkedin.com/learning/paths/become... https://www.linkedin.com › learning › become-a... NaN Data analysts examine information using data a... [Data analysts, data analysis] NaN {'source': {'description': 'LinkedIn is an Ame... https://www.google.com/search?q=About https://... https://serpapi.com/search.json?engine=google_... NaN NaN NaN NaN NaN NaN NaN
3 4 What Does a Data Analyst Do? - SNHU https://www.snhu.edu/about-us/newsroom/stem/wh... https://www.snhu.edu › about-us › newsroom › stem
I tried manually creating a list of the sites I want to exclude:
promotions = ["coursera"
,"rasmussen"
,"snhu"
,"mastersindatascience"
,"northeastern"
,"mygreatlearning"
,"payscale.com"
,"careerfoundry"
,"microsoft.com"
,"codecademy"
,"edx.org"
,"ahima.org"
,"›certification-exams›chda'"]
Tried this:
df['displayed_link'].map(lambda x: "T" if x in promotions else "F")
All it does is return F - I'm guessing because it needs an exact string match.
df['displayed_link'].map(lambda x: "T" if promotions in x else "F")
I tried it the other way, but that was a syntax error.
What is the most efficient way of filtering rows based on a column based on a list of manually curated strings?
CodePudding user response:
Use Series.str.contains with the list joined by | for regex OR:
import numpy as np

df['test1'] = np.where(df['displayed_link'].str.contains('|'.join(promotions)), 'T', 'F')
df['test2'] = (df['displayed_link'].str.contains('|'.join(promotions))
                                   .map({True: 'T', False: 'F'}))
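One caveat with joining the raw strings: entries such as payscale.com contain a dot, which a regex treats as "any character". A minimal sketch, assuming the entries are meant as literal substrings, escapes each one with re.escape before joining. The sample Series and the shortened promotions list here are made up for illustration and are not taken from the real results:

```python
import re
import pandas as pd

promotions = ["coursera", "payscale.com", "edx.org"]

# Hypothetical sample data, not the real SerpAPI output
links = pd.Series([
    "https://www.coursera.org › articles",
    "https://www.payscalexcom.io › research",
    "https://www.linkedin.com › learning",
])

# Unescaped: the "." in "payscale.com" matches any single character
raw_pat = "|".join(promotions)
# Escaped: every entry is matched as a literal substring
safe_pat = "|".join(re.escape(p) for p in promotions)

raw_hits = links.str.contains(raw_pat)
safe_hits = links.str.contains(safe_pat)
```

With the raw pattern the second link is flagged even though it only resembles payscale.com. With the escaped pattern it is not.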
If necessary, use word boundaries \b...\b:
pat = '|'.join(rf"\b{x}\b" for x in promotions)
df['test3'] = np.where(df['displayed_link'].str.contains(pat), 'T', 'F')
df['test4'] = df['displayed_link'].str.contains(pat).map({True: 'T', False: 'F'})
print(df)
                                  displayed_link  test1  test2  test3  test4
0  https://www.coursera.org/articles/what-does-a      T      T      T      T
1  https://www.rasmussen.edu/degrees/technology/      T      T      T      T
2       https://www.linkedin.com/learning/paths/      F      F      F      F
3  https://www.snhu1.edu/about-us/newsroom/stem/      T      T      F      F
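Since the question asks about removing rows rather than labelling them, the same boolean mask can be used to filter directly. This is a sketch under assumptions: the sample frame below is hypothetical, and na=False and case=False are choices made here, not something the question specifies:

```python
import re
import pandas as pd

promotions = ["coursera", "rasmussen", "snhu"]

# Hypothetical sample frame mirroring the displayed_link column
df = pd.DataFrame({
    "position": [1, 2, 3, 4, 5],
    "displayed_link": [
        "https://www.coursera.org › articles",
        "https://www.rasmussen.edu › degrees",
        "https://www.linkedin.com › learning",
        "https://www.snhu.edu › about-us",
        None,
    ],
})

# Escape each entry so it is matched literally, then OR them together
pat = "|".join(re.escape(p) for p in promotions)

# na=False treats missing links as "no match" so the mask stays boolean
is_promo = df["displayed_link"].str.contains(pat, case=False, na=False)

# Keep only the rows that did not match any excluded site
filtered = df[~is_promo].reset_index(drop=True)
```

Rows with a missing displayed_link are kept in this version. Passing na=True instead would drop them along with the matches.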