How can I find a string inside a bs4.ResultSet (list) using Python?

Time:03-10

I'm trying to automate searching for ads in Facebook Ads Library. For that, I've used Selenium and BeautifulSoup to get the page's code.

As I understand it, soup.find_all returns a bs4.ResultSet with the matching elements of the page's HTML, and that ResultSet behaves like a list.

I'm looping over the result of soup.find_all, and for each element that is found, I want to test whether it contains a specific string.

However, my code isn't working as expected: the condition of the if statement inside the for loop always evaluates to False.

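import time

from bs4 import BeautifulSoup
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

# Assumption: the original post does not show how options is built
options = webdriver.ChromeOptions()

# Using chrome driver
driver = webdriver.Chrome(ChromeDriverManager().install(), options=options)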

# Web page url request
driver.get('https://www.facebook.com/ads/library/?active_status=all&ad_type=all&country=BR&q=frete grátis aproveite&sort_data[direction]=desc&sort_data[mode]=relevancy_monthly_grouped&search_type=keyword_unordered&media_type=all')
driver.maximize_window()
time.sleep(10)

# Webscraping with BeautifulSoup
soup = BeautifulSoup(driver.page_source, 'html.parser') 

ads_list = []
for tag in soup.find_all('div', class_='_99s5'):
    # Keep the tag only if its markup contains this exact run of class names
    if 'qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89' in str(tag):
        ads_list.append(tag)

CodePudding user response:

The following statement:

if 'qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89' in str(tag)

evaluates to True if and only if the whole string 'qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89', with the names in exactly this order and separated by single spaces, is a substring of str(tag). I assume that you instead want to check whether str(tag) contains any of the individual names in that string. In that case, split the string on whitespace and test each part:

if any(e in str(tag) for e in 'qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89'.split()):
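
To illustrate the difference between the two tests, here is a minimal sketch on a plain string. The markup in tag_html is made up for the example and only stands in for str(tag); the all() variant is an extra option, in case you need every name to be present rather than just one.

```python
# Hypothetical markup: some of the class names from the question, in a different order.
tag_html = '<div class="_99s5"><span class="j8otv06s qku1pbnj a53abz89">text</span></div>'

wanted = 'qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89'

# Whole-string test: True only if the exact sequence appears verbatim.
exact_match = wanted in tag_html

# Per-name test: True as soon as one of the names appears anywhere.
any_match = any(name in tag_html for name in wanted.split())

# Stricter variant: True only if every name appears, in any order.
all_match = all(name in tag_html for name in wanted.split())

print(exact_match, any_match, all_match)
```

With this input, only the per-name test succeeds, which is why the original condition can stay False even when some of the classes are on the page.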

CodePudding user response:

As mentioned before, relying on class names is not the best strategy, because they can be very dynamic and may change at any time. It is better to stick to id, tag or perhaps text, although sometimes there is no alternative.

To select only the cards with a <span> stating that the creative has been used in ads, you can work with CSS selectors.

The following line searches for the outer <div> elements with class _99s5 that have a descendant (here a <span>) containing your text, and returns a ResultSet of these outer <div> elements:

ads_list = soup.select('div._99s5:has(:-soup-contains("ads use this creative and text"))')

Example

Note: The language of your browser/driver should be English; otherwise you have to change the text you expect to find.

driver.get('https://www.facebook.com/ads/library/?active_status=all&ad_type=all&country=BR&q=frete grátis aproveite&sort_data[direction]=desc&sort_data[mode]=relevancy_monthly_grouped&search_type=keyword_unordered&media_type=all')
driver.maximize_window()
time.sleep(10)

# Webscraping with BeautifulSoup
soup = BeautifulSoup(driver.page_source, 'html.parser') 

ads_list = soup.select('div._99s5:has(:-soup-contains("ads use this creative and text"))')
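
If you want to try the selector without opening a browser, the following is a minimal sketch on a small HTML snippet. The markup and the id values are invented for the example and are much simpler than the real page, and :-soup-contains needs a reasonably recent version of soupsieve (the selector library that ships with BeautifulSoup).

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified markup standing in for the real page.
html = """
<div class="_99s5" id="card-1">
  <div><span>3 ads use this creative and text</span></div>
</div>
<div class="_99s5" id="card-2">
  <div><span>Started running on 10 Mar</span></div>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# Keep only the outer cards that have a descendant containing the text.
ads_list = soup.select('div._99s5:has(:-soup-contains("ads use this creative and text"))')

# Collect the (made-up) ids to see which cards were selected.
matched_ids = [tag['id'] for tag in ads_list]
print(matched_ids)
```

Only the first card is selected here, because the second one has the class _99s5 but no descendant with the expected text.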

An alternative, which I'm not that happy with but which may give you some orientation, is to select the <span> that is a direct child of a <div> and contains your text, and then move up the structure with .parent:

ads_list = []

for tag in soup.select('div > span:-soup-contains("ads use this creative and text")'):
    ads_list.append(tag.parent.parent.parent.parent.parent.parent)
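
The chain of .parent calls depends on the exact nesting depth, so it breaks as soon as the page adds or removes a wrapper. A possibly more robust sketch, assuming the outer card still carries the class _99s5, is to let find_parent walk up to the nearest matching ancestor. The markup below is invented for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the nesting depth between span and card is arbitrary.
html = """
<div class="_99s5" id="card-1">
  <div><div><span>3 ads use this creative and text</span></div></div>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

ads_list = []
for span in soup.select('div > span:-soup-contains("ads use this creative and text")'):
    # Walk up to the closest enclosing <div class="_99s5">, whatever the depth.
    card = span.find_parent('div', class_='_99s5')
    if card is not None:
        ads_list.append(card)

print([card['id'] for card in ads_list])
```

find_parent returns None when no matching ancestor exists, so the check avoids appending None to the list.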