I'm trying to automate searching for ads in the Facebook Ads Library. For that, I've used Selenium to load the page and BeautifulSoup to parse its HTML.
soup.find_all returns a bs4.ResultSet, which as I understand behaves like a list of tags.
I'm looping through that list, and for each element found I want to check whether it contains a specific string.
However, my code isn't working as expected: the condition in the if statement inside the for loop is always False.
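import time
from bs4 import BeautifulSoup
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
# `options` (a ChromeOptions instance) is assumed to be defined earlier
# Using chrome driver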
driver = webdriver.Chrome(ChromeDriverManager().install(), options=options)
# Web page url request
driver.get('https://www.facebook.com/ads/library/?active_status=all&ad_type=all&country=BR&q=frete grátis aproveite&sort_data[direction]=desc&sort_data[mode]=relevancy_monthly_grouped&search_type=keyword_unordered&media_type=all')
driver.maximize_window()
time.sleep(10)
# Webscraping with BeautifulSoup
soup = BeautifulSoup(driver.page_source, 'html.parser')
ads_list = []
for tag in soup.find_all('div', class_='_99s5'):
    if 'qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89' in str(tag):
        ads_list.append(tag)
    else:
        pass
CodePudding user response:
The following condition:
if 'qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89' in str(tag)
is True if and only if the whole string 'qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89' is a substring of str(tag), that is, with the class names in exactly that order and separated by single spaces. I assume you would rather check whether str(tag) contains any of the individual class names in that string. In that case, split the string and test each name:
if any(e in str(tag) for e in 'qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89'.split()):
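To make the difference concrete, here is a minimal sketch that compares the exact substring test with the any(...) test, plus a stricter all(...) variant. The two markup strings and the checks helper are made up for illustration and are not taken from the real page.

```python
wanted = "qku1pbnj j8otv06s r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89"
names = wanted.split()

# Hypothetical markup: all seven classes present, but the first two are swapped.
reordered = '<span class="j8otv06s qku1pbnj r05nras9 a1itoznt te7ihjl9 svz86pwt a53abz89">text</span>'
# Hypothetical markup: only one of the seven classes is present.
partial = '<span class="qku1pbnj other">text</span>'


def checks(markup):
    """Run the three membership tests against one markup string."""
    return {
        "exact": wanted in markup,                     # whole string, same order
        "any": any(n in markup for n in names),        # at least one class name
        "all": all(n in markup for n in names),        # every class name, any order
    }


reordered_result = checks(reordered)
partial_result = checks(partial)
print(reordered_result)
print(partial_result)
```

Note that any(...) also accepts the partial markup, so if every class has to be present, all(...) may be the closer match to what you want.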
CodePudding user response:
As mentioned before, selecting by class is not the best strategy, because the class names can be very dynamic. It is better to rely on id, tag or text where possible, though sometimes there is no alternative.
To select only the cards with a <span> containing the information that the creative has been used in ads, you can work with CSS selectors.
The following line searches for your outer <div> with class _99s5 that has a <span> containing your text, and creates a ResultSet of these outer <div> elements:
ads_list = soup.select('div._99s5:has(:-soup-contains("ads use this creative and text"))')
Example
Note: The language of your browser/driver should be English; otherwise you have to change the text you expect to find.
driver.get('https://www.facebook.com/ads/library/?active_status=all&ad_type=all&country=BR&q=frete grátis aproveite&sort_data[direction]=desc&sort_data[mode]=relevancy_monthly_grouped&search_type=keyword_unordered&media_type=all')
driver.maximize_window()
time.sleep(10)
# Webscraping with BeautifulSoup
soup = BeautifulSoup(driver.page_source, 'html.parser')
ads_list = soup.select('div._99s5:has(:-soup-contains("ads use this creative and text"))')
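The selector can be tried without a browser on a small static snippet. This is only a sketch: the HTML below is invented to mimic the card structure (the real page is far more deeply nested), and it assumes a Beautiful Soup installation whose bundled Soup Sieve supports :-soup-contains.

```python
from bs4 import BeautifulSoup

# Invented stand-in for two ad cards; only the first mentions the target text.
html = """
<div class="_99s5"><div><span>3 ads use this creative and text</span></div></div>
<div class="_99s5"><div><span>Started running on 1 Jan 2022</span></div></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Keep only the outer cards that have some descendant containing the text.
cards = soup.select('div._99s5:has(:-soup-contains("ads use this creative and text"))')

print(len(cards))
print(cards[0].get_text(strip=True))
```

Only the first card is selected, because :has(...) filters the outer <div> by what its descendants contain.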
Alternatively, though I am not that happy with it and offer it only as an orientation, you could select each <span> that is a direct child of a <div> and contains your text, then move up the structure with .parent:
ads_list = []
for tag in soup.select('div > span:-soup-contains("ads use this creative and text")'):
    ads_list.append(tag.parent.parent.parent.parent.parent.parent)
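A fixed chain of .parent calls breaks as soon as the nesting depth changes. One possible variation, sketched here on invented HTML, is to let find_parent walk up to the nearest ancestor <div> with class _99s5; the nesting depth and the extra class name x are assumptions made for the example.

```python
from bs4 import BeautifulSoup

# Invented markup: the span sits a few levels below the card's outer <div>.
html = """
<div class="_99s5"><div class="x"><div><span>2 ads use this creative and text</span></div></div></div>
<div class="_99s5"><div class="x"><div><span>Some other text</span></div></div></div>
"""

soup = BeautifulSoup(html, "html.parser")

ads_list = []
for span in soup.select('div > span:-soup-contains("ads use this creative and text")'):
    # Climb to the closest enclosing card, however deep the span is nested.
    card = span.find_parent("div", class_="_99s5")
    if card is not None:
        ads_list.append(card)

print(len(ads_list))
```

find_parent returns None when no matching ancestor exists, hence the explicit check before appending.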