I am trying to scrape news headlines with Python, Beautiful Soup, and Selenium from a website that has a "show more" button. I can load the page with Selenium, click the button to bring up more headlines, and print the headlines, all without error messages. My problem is that Beautiful Soup is not reading the contents of the driver after the "show more" button is clicked; it only reads the headlines that are on the page before the button is clicked. How do I make sure the headlines are read and printed only after the "show more" button has been clicked a certain number of times? I use a for loop rather than a while loop so that I can click the button n times.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time
from bs4 import BeautifulSoup
s = Service('/Users/comp/Desktop/chromedriver')
driver = webdriver.Chrome(service=s)
url = 'https://www.foxnews.com/politics'
driver.get(url)
for x in range(10):
    try:
        loadMoreButton = driver.find_element(By.XPATH, "/html/body/div[2]/div/div/div/div[2]/div/main/section[4]/footer/div/a")
        time.sleep(3)
        loadMoreButton.click()
        time.sleep(3)
    except Exception as e:
        print(e)
        break
time.sleep(3)
soup = BeautifulSoup(driver.page_source, 'lxml')
headlines = soup.find('body').find_all('h4')
for x in headlines:
    print(x.text.strip())
time.sleep(3)
driver.quit()
CodePudding user response:
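Try the code below; it works for me. It re-parses driver.page_source with Beautiful Soup on every pass through the loop, before each click, so the headlines loaded by earlier clicks are picked up. Note that every pass prints all the headlines currently on the page, so earlier headlines are repeated in the output.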
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time
from bs4 import BeautifulSoup
s = Service('./chromedriver')
driver = webdriver.Chrome(service=s)
url = 'https://www.foxnews.com/politics'
driver.get(url)
time.sleep(3)  # give the initial page time to load
for x in range(10):
    try:
        # Re-parse the live page on every pass so newly loaded headlines are included
        soup = BeautifulSoup(driver.page_source, 'lxml')
        headlines = soup.find('body').find_all('h4')
        for headline in headlines:
            print(headline.text.strip())
        # find_element raises if the button is missing, which ends the loop below
        loadMoreButton = driver.find_element(By.XPATH, "/html/body/div[2]/div/div/div/div[2]/div/main/section[4]/footer/div/a")
        loadMoreButton.click()
        time.sleep(3)  # wait for the next batch of headlines to load
    except Exception as e:
        print(e)
        break
driver.quit()