How can I find a string inside a bs4.ResultSet (list) using Python?

I'm trying to automate searching for ads in the Facebook Ads Library. To do that, I use Selenium to load the page and BeautifulSoup to parse its HTML.

BeautifulSoup parses the page's HTML, and soup.find_all returns a bs4.ResultSet, which, as I understand it, is a list.
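To see what find_all actually returns, here is a minimal offline sketch. The two-card HTML string is a hypothetical stand-in for driver.page_source. BeautifulSoup(...) itself returns a BeautifulSoup object representing the parsed document, while find_all returns a ResultSet, which subclasses list and can be iterated like one.

```python
from bs4 import BeautifulSoup
from bs4.element import ResultSet

# Hypothetical markup standing in for driver.page_source
html = """
<div class="_99s5"><span>First card</span></div>
<div class="_99s5"><span>Second card</span></div>
"""

# BeautifulSoup(...) returns the parsed document, not a list
soup = BeautifulSoup(html, "html.parser")

# find_all returns a ResultSet, which is a subclass of list
divs = soup.find_all("div", class_="_99s5")
print(type(soup).__name__)          # BeautifulSoup
print(isinstance(divs, ResultSet))  # True
print(isinstance(divs, list))       # True

# Each item is a Tag, so each one can be tested for a string in its text
contains_second = ["Second" in tag.get_text() for tag in divs]
print(contains_second)              # [False, True]
```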

I'm looping over the results of soup.find_all, and for each element found, I want to check whether it contains a specific string.

However, my code doesn't work as expected: the condition in the if statement inside the for loop is always False.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
import time

options = webdriver.ChromeOptions()

# Using chrome driver (Selenium 4 takes the driver path through a Service object)
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

# Web page url request
driver.get('https://www.facebook.com/ads/library/?active_status=all&ad_type=all&country=BR&q=frete grátis aproveite&sort_data[direction]=desc&sort_data[mode]=relevancy_monthly_grouped&search_type=keyword_unordered&media_type=all')
driver.maximize_window()
time.sleep(10)

# Webscraping with BeautifulSoup
soup = BeautifulSoup(driver.page_source, 'html.parser')

ads_list = []
for tag in soup.find_all('div', class_='_99s5'):
    if 'qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89' in str(tag):
        ads_list.append(tag)

Answer:

The following statement:

if 'qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89' in str(tag)

evaluates to True if and only if the whole string 'qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89', in exactly this order and with single spaces in between, is a substring of str(tag). I assume that you instead want to check whether str(tag) contains any of the individual class names in that string. In that case, it would be:

if any(e in str(tag) for e in 'qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89'.split()):
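The difference between these checks is easy to see offline. The sketch below uses three hypothetical cards: one where an extra class sits between the listed ones, one with only some of the classes, and one where a class name appears only as text. The third check, which compares against each element's parsed class attribute, is an alternative not taken from this answer.

```python
from bs4 import BeautifulSoup

classes = "qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89"
wanted = set(classes.split())

# Hypothetical cards standing in for the real page
html = """
<div class="_99s5"><span class="qku1pbnj j8otv06s EXTRA r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89">interleaved</span></div>
<div class="_99s5"><span class="qku1pbnj other">partial</span></div>
<div class="_99s5"><span>qku1pbnj appears only in the text</span></div>
"""
cards = BeautifulSoup(html, "html.parser").find_all("div", class_="_99s5")

# 1. Whole string as a substring: needs this exact order with nothing in between
exact = [c for c in cards if classes in str(c)]

# 2. Any class name as a substring of the raw HTML (this also matches plain text)
any_text = [c for c in cards if any(e in str(c) for e in classes.split())]

# 3. Compare against the parsed class attribute of each descendant instead
def has_all_classes(card):
    # find_all(class_=True) returns only descendants that have a class attribute
    return any(wanted <= set(el.get("class", [])) for el in card.find_all(class_=True))

by_attr = [c for c in cards if has_all_classes(c)]

print(len(exact), len(any_text), len(by_attr))  # 0 3 1
```

Note that the any(...) check on str(tag) also matches the third card, because it searches the raw HTML, text included.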

Answer:

As mentioned before, selecting elements by class is not the best strategy, because such class names can change frequently. It is better to stick to ids, tag names, or perhaps text, although sometimes there is no alternative.
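A minimal sketch of the id/tag/text approach, assuming hypothetical markup where the results sit in a container with a stable id. The id "results" is made up for illustration; the real Ads Library page may not offer one.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a container with a stable id and cards without useful classes
html = """
<div id="results">
  <div><span>3 ads use this creative and text</span></div>
  <div><span>Started running on 1 Jan 2022</span></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

container = soup.find(id="results")  # select by id
cards = [
    card
    for card in container.find_all("div", recursive=False)  # by tag name, direct children only
    if "ads use this creative and text" in card.get_text()  # filter by text
]
print(len(cards))  # 1
```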

To select only the cards that contain a <span> with the text "ads use this creative and text", you can use CSS selectors.

The following line searches for the outer <div> elements with class _99s5 that contain an element (your <span>) whose text includes that phrase, and returns these outer <div> elements as a ResultSet:

ads_list = soup.select('div._99s5:has(:-soup-contains("ads use this creative and text"))')
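To try the selector without launching a browser, here is a small offline sketch on hypothetical markup. It assumes Soup Sieve 2.1 or newer (the CSS selector engine BeautifulSoup uses for select), since :-soup-contains is a Soup Sieve extension rather than standard CSS.

```python
from bs4 import BeautifulSoup

# Hypothetical cards: only the first one mentions the phrase
html = """
<div class="_99s5"><div><span>2 ads use this creative and text</span></div></div>
<div class="_99s5"><div><span>Started running on 1 Jan 2022</span></div></div>
"""
soup = BeautifulSoup(html, "html.parser")

# :has(...) keeps a div._99s5 if any descendant matches the inner selector;
# :-soup-contains(...) matches elements whose text contains the substring
ads_list = soup.select('div._99s5:has(:-soup-contains("ads use this creative and text"))')
print(len(ads_list))  # 1
```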

Example

Note: The language of your browser/driver should be English; otherwise, you have to change the text you expect to find.

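from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import time

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get('https://www.facebook.com/ads/library/?active_status=all&ad_type=all&country=BR&q=frete grátis aproveite&sort_data[direction]=desc&sort_data[mode]=relevancy_monthly_grouped&search_type=keyword_unordered&media_type=all')
driver.maximize_window()
time.sleep(10)  # crude wait for the JavaScript-rendered results

# Webscraping with BeautifulSoup
soup = BeautifulSoup(driver.page_source, 'html.parser')

ads_list = soup.select('div._99s5:has(:-soup-contains("ads use this creative and text"))')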

An alternative (one I'm not very happy with, but it may give you some orientation) is to select the <span> that is a direct child of a <div> and contains your text, and then move up the tree with .parent:

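ads_list = []

# Select each <span> (direct child of a <div>) whose text contains the phrase,
# then climb six levels up to the card container; the number of levels
# depends on the page's current markup and may need adjusting
for tag in soup.select('div > span:-soup-contains("ads use this creative and text")'):
    ads_list.append(tag.parent.parent.parent.parent.parent.parent)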
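Counting .parent hops breaks as soon as the nesting depth changes. A possible variant, not part of the answer above, is Tag.find_parent, which walks up to the nearest ancestor matching a filter. The sketch below uses hypothetical markup and reuses the _99s5 class, so it inherits the same fragility of class names mentioned earlier.

```python
from bs4 import BeautifulSoup

# Hypothetical nesting; the real page has more levels between the card and the span
html = """
<div class="_99s5"><div><div><span>2 ads use this creative and text</span></div></div></div>
<div class="_99s5"><div><div><span>Started running on 1 Jan 2022</span></div></div></div>
"""
soup = BeautifulSoup(html, "html.parser")

ads_list = []
for span in soup.select('div > span:-soup-contains("ads use this creative and text")'):
    # Walk up to the nearest <div class="_99s5"> instead of counting .parent hops
    card = span.find_parent("div", class_="_99s5")
    if card is not None:
        ads_list.append(card)

print(len(ads_list))  # 1
```

The trade-off: find_parent tolerates changes in depth, but it needs some stable marker on the ancestor, whereas the .parent chain needs none but depends on the exact depth.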