I've used BeautifulSoup to find a div with a specific class in the page's HTML. I want to check whether this div contains a span with a particular class. If it does, I want to keep the div in the page's source; if it doesn't, I want to delete the div, maybe using Selenium.
For that I have two lists selecting the elements (div and span). I tried checking whether the elements of one list are inside the other, and that kind of worked. But how can I delete a found element from the page's source code?
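# Imports assumed from the calls used below
import time
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()  # assumption: any Selenium WebDriver would do here

# Web page URL request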
driver.get('https://www.facebook.com/ads/library/?active_status=all&ad_type=all&country=BR&q=frete grátis aproveite&sort_data[direction]=desc&sort_data[mode]=relevancy_monthly_grouped&search_type=keyword_unordered&media_type=all')
driver.maximize_window()
time.sleep(10)
# Webscraping with BeautifulSoup
soup = BeautifulSoup(driver.page_source, 'html.parser')
# Option 1
element = driver.find_element_by_class_name('div._99s5:has(:not(:-soup-contains("ads use this creative and text")))')
driver.execute_script("""
var element = arguments[0];
element.parentNode.removeChild(element);
""", element)
Option 1 returns:
selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified
# Option 2
for e in soup.select('div._99s5:has(:not(:-soup-contains("ads use this creative and text")))'):
    driver.execute_script("""
    var e = arguments[0];
    element.parentNode.removeChild(e);
    """, e)
Option 2 returns:
TypeError: Object of type Tag is not JSON serializable
CodePudding user response:
Note: The question and comments read a bit confusingly, so it would be great to improve them. Assuming you want to decompose() some elements, it is not clear why, or what should happen after this action, so this answer only points out an approach.
To decompose() the elements that do not contain "ads use this creative and text", just negate your selection and iterate over the ResultSet:
for e in soup.select('div._99s5:has(:not(:-soup-contains("ads use this creative and text")))'):
    e.decompose()
These elements are now removed from your soup, and you can process it as needed.
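Note that decompose() only changes the parsed copy held in soup; the page open in the browser is untouched. If the divs also have to disappear from the live page, one possible sketch is to let Selenium locate them and hand the resulting WebElement objects to execute_script, which accepts those as arguments (a BeautifulSoup Tag is not accepted, which is what the TypeError in Option 2 reports). The helper name remove_divs_without_text is made up for illustration, the text check is a plain substring test rather than the CSS selector used above, and the matched divs are assumed not to be nested inside each other.

```python
PHRASE = "ads use this creative and text"


def remove_divs_without_text(driver, phrase=PHRASE, selector="div._99s5"):
    """Remove every element matching `selector` whose text lacks `phrase`.

    Hypothetical helper: `driver` is assumed to be a Selenium WebDriver.
    Returns the number of elements removed from the live DOM.
    """
    removed = 0
    # "css selector" is the string value behind Selenium's By.CSS_SELECTOR
    for element in driver.find_elements("css selector", selector):
        if phrase not in element.text:
            # WebElement arguments arrive in JavaScript as DOM nodes
            driver.execute_script("arguments[0].remove();", element)
            removed += 1
    return removed
```

After a call such as remove_divs_without_text(driver), driver.page_source should reflect the trimmed page and can be parsed again with BeautifulSoup.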
Another note: be aware that you have to implement scrolling to get more, or all, of these elements with class _99s5.
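The scrolling mentioned here could look roughly like the following sketch. It assumes the page loads more results when the window is scrolled to the bottom, and that an unchanged document.body.scrollHeight means nothing more will load. The function name, pause length and round limit are illustrative choices, not values taken from the page.

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_rounds=20):
    """Scroll until the page height stops growing or max_rounds is reached.

    Hypothetical helper: `driver` is assumed to be a Selenium WebDriver.
    Returns the number of scrolls after which the page had grown.
    """
    get_height = "return document.body.scrollHeight;"
    last_height = driver.execute_script(get_height)
    grown = 0
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script(get_height)
        if new_height == last_height:
            break  # nothing new was loaded
        last_height = new_height
        grown += 1
    return grown
```

Calling scroll_to_bottom(driver) before building the soup from driver.page_source would let the soup include the additionally loaded elements.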
CodePudding user response:
Since you're deleting them in JavaScript anyway:
driver.execute_script(r"""
for (let div of document.querySelectorAll('div._99s5')) {
    // Read the count from text such as "3 ads use this creative and text"
    let match = div.innerText.match(/(\d+) ads? use this creative and text/)
    let numAds = match ? parseInt(match[1]) : 0
    if (numAds < 10) {
        div.querySelector(".tp-logo")?.remove()
    }
}
""")
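The counting logic in this script can be tried out without a browser. The sketch below mirrors it with Python's re module; it assumes the pattern is meant to capture one or more digits (\d+) directly in front of the phrase, and the name count_ads is made up for illustration.

```python
import re

# A number, then "ad" or "ads", then the rest of the phrase
AD_COUNT = re.compile(r"(\d+) ads? use this creative and text")


def count_ads(text):
    """Return the number found in front of the phrase, or 0 if it is absent."""
    match = AD_COUNT.search(text)
    return int(match.group(1)) if match else 0


print(count_ads("12 ads use this creative and text"))  # 12
print(count_ads("no counter here"))                    # 0
```

As in the script, text without the phrase counts as 0 ads, so such a div falls below the threshold of 10.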