Web page closes even before scraping data using Selenium in Python

Time:10-01

I am trying to scrape the news headlines from https://www.dvm360.com/news. The program starts, but it ends without extracting any headlines. Here is my function to scrape the headline URLs from the website:

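import time

from selenium import webdriver
from selenium.webdriver.common.by import By

# browser: a module-level WebDriver instance (assumed), e.g. browser = webdriver.Chrome()

def dvm():
    print('--dvm news 360--')
    url = 'https://www.dvm360.com/news'
    browser.get(url)
    time.sleep(20)  # fixed 20-second wait for the page to load

    # find_elements_by_class_name() / find_element_by_tag_name() were removed in Selenium 4.3;
    # find_elements(By.CLASS_NAME, ...) / find_element(By.TAG_NAME, ...) do the same lookup
    headlines = browser.find_elements(By.CLASS_NAME, 'title')
    url_headlines = [ele.find_element(By.TAG_NAME, 'a').get_attribute('href') for ele in headlines]
    print(url_headlines)
    browser.close()
    browser.quit()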

And this is what it prints in my terminal:

--dvm news 360--
[15036:4376:0930/121207.716:ERROR:chrome_browser_main_extra_parts_metrics.cc(228)] crbug.com/1216328: Checking Bluetooth availability started. Please report if there is no report that this ends.
[15036:7848:0930/121207.719:ERROR:device_event_log_impl.cc(214)] [12:12:07.719] USB: usb_device_handle_win.cc:1048 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
[15036:4376:0930/121207.720:ERROR:chrome_browser_main_extra_parts_metrics.cc(231)] crbug.com/1216328: Checking Bluetooth availability ended.
[15036:4376:0930/121208.338:ERROR:chrome_browser_main_extra_parts_metrics.cc(234)] crbug.com/1216328: Checking default browser status started. Please report if there is no report that this ends.
[15036:4376:0930/121209.747:ERROR:chrome_browser_main_extra_parts_metrics.cc(238)] crbug.com/1216328: Checking default browser status ended.
[]

What exactly is going wrong here? Please help me understand. Thanks!

EDIT: This is the output I get after using @cruisepandey's answer:

--dvm news 360--
[13608:1192:0930/141223.693:ERROR:chrome_browser_main_extra_parts_metrics.cc(228)] crbug.com/1216328: Checking Bluetooth availability started. Please report if there is no report that this ends.
[13608:1192:0930/141223.694:ERROR:chrome_browser_main_extra_parts_metrics.cc(231)] crbug.com/1216328: Checking Bluetooth availability ended.
[13608:1192:0930/141223.695:ERROR:chrome_browser_main_extra_parts_metrics.cc(234)] crbug.com/1216328: Checking default browser status started. Please report if there is no report that this ends.
[13608:12140:0930/141223.700:ERROR:device_event_log_impl.cc(214)] [14:12:23.700] USB: usb_device_handle_win.cc:1048 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
[13608:1192:0930/141223.827:ERROR:chrome_browser_main_extra_parts_metrics.cc(238)] crbug.com/1216328: Checking default browser status ended.
Morris Animal Foundation announces new equine and animal welfare advisory members
Understanding Cushing syndrome and cortisol: From lab work to rechecks
A call for change: Addressing the lack of diversity in veterinary schools and beyond
Butterfly Network and AVG collaborate to provide breakthrough ultrasound to UrgentVet clinics
3 Methods for battling burnout in veterinary medicine
Dechra acquires veterinary marketing and distributions rights for Equine ProVet APC
something went wrong

Answer:

I scraped the web app using an infinite loop with XPath indexing, and it did not close the browser at all. The website is dynamic: the more you scroll, the more content it loads.

XPath used: //*[contains(@class,'title') and not(@style)]
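One possible reason the question's class-name locator for 'title' returned an empty list while this XPath finds headlines: a class-name locator only matches elements whose class attribute contains title as a whole, whitespace-separated token, whereas contains(@class,'title') matches the substring anywhere (for example article-title), and not(@style) drops any element that carries a style attribute. The sketch below applies both rules with Python's standard html.parser to a made-up HTML snippet; SAMPLE and TitleCollector are hypothetical names, and the markup is not dvm360's real HTML.

```python
from html.parser import HTMLParser

# Hypothetical markup, only for illustration; not dvm360's actual HTML.
SAMPLE = """
<div class="article-title"><a href="/a">A</a></div>
<h3 class="title"><a href="/b">B</a></h3>
<span class="title" style="display:none">hidden</span>
<div class="subtitle-text">C</div>
"""


class TitleCollector(HTMLParser):
    """Records start tags matched by two different 'title' rules."""

    def __init__(self):
        super().__init__()
        self.xpath_like = []   # rule of //*[contains(@class,'title') and not(@style)]
        self.class_token = []  # rule of a class-name locator: 'title' as a whole class token

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        cls = attr.get("class") or ""
        # Substring match on @class, and no style attribute at all
        if "title" in cls and "style" not in attr:
            self.xpath_like.append((tag, cls))
        # Whole-token match, like the CSS selector .title
        if "title" in cls.split():
            self.class_token.append((tag, cls))


collector = TitleCollector()
collector.feed(SAMPLE)
collector.close()
print(collector.xpath_like)   # [('div', 'article-title'), ('h3', 'title'), ('div', 'subtitle-text')]
print(collector.class_token)  # [('h3', 'title'), ('span', 'title')]
```

Note that the substring rule also picks up subtitle-text, so on a real page the XPath can match more than just headline elements.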

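Code:

# driver_path is the path to your chromedriver executable
driver = webdriver.Chrome(service=Service(driver_path))
driver.maximize_window()
#driver.implicitly_wait(30)
wait = WebDriverWait(driver, 30)
driver.get("https://www.dvm360.com/news")

# Close the pop-up if it appears; carry on if it does not
try:
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div[title='Close']"))).click()
except Exception:
    pass

# Walk through the matching titles one by one (XPath positions start at 1).
# Scrolling each title into view makes the page load more articles.
j = 1
try:
    while True:
        element = driver.find_element(By.XPATH, f"(//*[contains(@class,'title') and not(@style)])[{j}]")
        driver.execute_script("arguments[0].scrollIntoView(true);", element)
        print(element.get_attribute('innerText'))
        j = j + 1
except Exception:
    # Reached when there is no element at position j (or any other error occurs)
    print("something went wrong")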

Imports:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
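WebDriverWait is what replaces the question's fixed time.sleep(20): WebDriverWait(driver, 30).until(condition) re-evaluates the condition (every 0.5 s by default, ignoring NoSuchElementException while it polls) and returns as soon as the condition yields a truthy value, raising TimeoutException if 30 s pass first. The pure-Python wait_until below is a hypothetical sketch of that polling loop, not Selenium's actual implementation; it raises the built-in TimeoutError instead.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def wait_until(
    condition: Callable[[], T],
    timeout: float = 30.0,
    poll: float = 0.5,
    ignored: Tuple[Type[BaseException], ...] = (),
) -> T:
    """Poll condition() until it returns something truthy.

    Raises TimeoutError once `timeout` seconds have passed without success.
    Exceptions listed in `ignored` are treated like a falsy result.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
        except ignored:
            value = None  # e.g. "element not there yet": keep polling
        if value:
            return value  # returns as soon as the condition holds
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Usage: a condition that only becomes true on its third check
checks = {"n": 0}


def page_loaded():
    checks["n"] += 1
    return "loaded" if checks["n"] >= 3 else None


print(wait_until(page_loaded, timeout=2, poll=0.01))  # loaded
```

With Selenium itself you would keep using WebDriverWait with an expected condition such as EC.presence_of_all_elements_located((By.XPATH, ...)) rather than a hand-rolled loop; the sketch only shows why an explicit wait can return early instead of always sleeping for a fixed time.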

Output:

Morris Animal Foundation announces new equine and animal welfare advisory members
Understanding Cushing syndrome and cortisol: From lab work to rechecks
A call for change: Addressing the lack of diversity in veterinary schools and beyond
Butterfly Network and AVG collaborate to provide breakthrough ultrasound to UrgentVet clinics
3 Methods for battling burnout in veterinary medicine
Dechra acquires veterinary marketing and distributions rights for Equine ProVet APC
Product comparison in an unregulated industry: Hemp
This week on dvm360: 3 ways to combat burnout, plus more veterinary news
Vet’s Best Friend emphasizes mental well-being with company-wide yoga series
How the 'oxygen mask rule' can help combat stress
News-wrap up: This week's veterinary news, plus how Walt Disney would run a veterinary clinic
The Dilemma: Promoting a secure, comfortable workplace
Episode 64: How laughter can rekindle your love for veterinary medicine
NAVC to host first inaugural veterinary nurse summit
AAFCO urges additional research on animal food hemp products
IVPA praises Michigan Supreme Court ruling pertaining to licensed veterinarians
The importance of dental radiology
Dermatologic cytology 101: Tips for collecting samples
10 Things to consider when selling a veterinary practice
Lions and tigers at Smithsonian National Zoo test presumptive positive for COVID-19
3 must-reads on canine IBD
Curb canine anxiety with Calmer Canine
AVMA unveils new “Language of Veterinary Care” online tools and breakthrough research
Subtle signs indicative of an orthopedic issue
This week on dvm360: Calmer Canine for anxious dogs detailed, plus more veterinary news
3 Must-reads on veterinary practice management
Vets Pets unveiling brand-new hospital in Wendell, North Carolina
FDA’s CVM determines TriviumVet’s feline HCM program eligible for expanded conditional approval pathway
Exploring the role of omega-3 supplementation in cats and dogs
News-wrap up: This week's veterinary headlines, plus an update on Australian wildlife impacted by 2019-2020 wildfires
Episode 63: A day in the life of a veterinary toxicologic pathologist
Why mindfulness matters
Stepping up to the plate: Australia works to restore wildlife impacted by wildfires
Vet’s Best Friend to host in-person mental health discussion
Canine Osteosarcoma: Teaching an Old Dog New Tricks
Chewy launches brand-new marketplace service for veterinarians
Animal Medical Center appoints inaugural chief veterinary technician
dvm360 product report: Canine mast cell tumor treatment, plus a pet food line and more
Honoring National Hispanic Heritage Month
Work-life balance versus work integration: What's the difference?
AVMA supports legislation surrounding xylitol warning label requirements
Pet TravelPass introduced to streamline international travel documentation process
Anifera granted seed funding from the CIEL for research
Veterinary cannabis for pain management: The endocannabinoidome
How children can help inspire a more diverse future for veterinary medicine
Antinol partners with dog rescue advocate Lee Asher
3 Must-reads on equine medicine
No more 'Woe is me'
Steering towards antimicrobial stewardship
This week on dvm360: An update on veterinary cannabis for pain management, plus more veterinary news
Resolving itchy skin: Here's what you need to know
Separation anxiety in dogs may affect quality of life
Plugging the protein faucet in dogs with PLE
Image quiz: Liver troubles in a Lab
Sinoscopy in the standing horse
News wrap up: This week's veterinary headlines, plus new advancements in canine cancer testing
Episode 62: Investing in your team is investing in your business
Your food allergy questions answered
Second Basepaws Cat Behavior Summit will deliver leading veterinary expertise into the homes of cat parents
When raising prices is detrimental
Anivive Lifesciences strives to develop 'valley fever' prevention vaccine
Trupanion partners with MCVMA to support veterinary professionals
Student debt: What’s the problem?
PEMCO teams up with Pets Best Pet Insurance to offer pet coverage
Advancements in testing for cancer in canines
Life as a Black veterinary student

As you can see, there is more output; I can't paste all of it here.

If you want to limit the number of articles you scrape, you can do that with an if condition, as shown below.

j = 1
try:
    while True:
        element = driver.find_element(By.XPATH, f"(//*[contains(@class,'title') and not(@style)])[{j}]")
        driver.execute_script("arguments[0].scrollIntoView(true);", element)
        print(element.get_attribute('innerText'))
        if j == 10:  # stop after the 10th headline
            break
        j = j + 1
except Exception:
    print("something went wrong")
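The counter-plus-break pattern above can be factored into a small helper. collect_indexed is a hypothetical name: it calls find_nth(1), find_nth(2) and so on until find_nth raises one of the stop_on exceptions or limit results have been collected. With Selenium, find_nth could be a small function that finds the j-th element with the XPath above, scrolls it into view with driver.execute_script and returns its innerText, and stop_on would be (NoSuchElementException,) from selenium.common.exceptions; that narrows the broad except so unrelated errors are not all reported as "something went wrong". The demo below uses a plain list so it runs without a browser.

```python
from itertools import count
from typing import Callable, List, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def collect_indexed(
    find_nth: Callable[[int], T],
    limit: Optional[int] = None,
    stop_on: Tuple[Type[BaseException], ...] = (LookupError,),
) -> List[T]:
    """Collect find_nth(1), find_nth(2), ... until it raises stop_on or `limit` items are collected."""
    results: List[T] = []
    for j in count(1):  # XPath positions start at 1
        if limit is not None and len(results) >= limit:
            break  # same effect as the `if j == 10: break` check
        try:
            results.append(find_nth(j))
        except stop_on:
            break  # nothing at position j: every loaded item has been read
    return results


# Stand-in for the page: IndexError (a LookupError) plays the role of NoSuchElementException
headlines = ["First headline", "Second headline", "Third headline"]


def fake_find(j: int) -> str:
    return headlines[j - 1]


print(collect_indexed(fake_find))           # ['First headline', 'Second headline', 'Third headline']
print(collect_indexed(fake_find, limit=2))  # ['First headline', 'Second headline']
```

Keeping the stop condition explicit also documents why the loop ends: on the live site it ends when no element exists yet at position j, which depends on how far the page has been scrolled.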