I'm trying to build a loop with Selenium ActionChains on this website. I was unable to locate the pagination button directly (I got a "JavascriptException" with "Message: javascript error: Cannot read properties of undefined"), so I switched to ActionChains as an alternative.
Here is my code:
for i in range(1, 15):
    root1 = driver.find_element(By.XPATH, '//*[@id="topic-list"]/card-topic[{}]'.format(i))
    shadow_root = expand_shadow_element(root1)
    text = shadow_root.find_element(By.CSS_SELECTOR, 'a').click()
    time.sleep(2)
    qt = tittle_root.find_element(By.CSS_SELECTOR, 'user-topic')
    qt_root = qt.shadow_root
    qt_ele = qt_root.find_element(By.CSS_SELECTOR, 'p')
    qt_text = qt_ele.text
    question_data.append(qt_text)
    time.sleep(2)
    driver.back()
    time.sleep(2)
    paginate = driver.find_element(By.CSS_SELECTOR, 'paginate-button')
    paginate_root = expand_shadow_element(paginate)
    paginate2 = paginate_root.find_element(By.LINK_TEXT, 'Selanjutnya')
    actions = ActionChains(driver)
    actions.move_to_element(paginate2)
    actions.click(paginate2)
    actions.perform()
The problem is that ActionChains does move to the second page, but the scraping goes wrong.
On the first (default) page it scrapes only the first question, on the next page only the second question, and so on. Are there any suggestions on how to fix my code? FYI, the loop is there to open each question and get its full text, so that I can collect all 15 questions.
Any help would be appreciated, thank you.
CodePudding user response:
I think this is what you wanted:
# Needed libs
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
url = "https://www.alodokter.com/komunitas/diskusi/penyakit"
driver.get(url)
driver.maximize_window()

# Loop over every page we want, in this case 10
for page in range(10):
    # Check how many topics this page has
    how_many_topics = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, '//*[@id="topic-list"]/card-topic')))
    # Loop over every topic (XPath positions start at 1, so go up to len + 1 to include the last one)
    for i in range(1, len(how_many_topics) + 1):
        # Click on the topic link (the element is looked up again because the page reloads after going back)
        topic = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, f'(//*[@id="topic-list"]/card-topic)[{i}]')))
        shadow1 = driver.execute_script("return arguments[0].shadowRoot", topic)
        shadow1.find_element(By.CSS_SELECTOR, 'a').click()
        # Get the details we want; in my example only the title, but you can take whatever you want
        detail_topic = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'detail-topic')))
        shadow2 = driver.execute_script("return arguments[0].shadowRoot", detail_topic)
        title = shadow2.find_element(By.CSS_SELECTOR, '.h2').text
        print(title)
        # Go back to the topic list
        driver.back()
    # Once every topic on the page is done, find the next button and click it
    pagination = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//paginate-button')))
    shadow_pagination = driver.execute_script("return arguments[0].shadowRoot", pagination)
    next_button = shadow_pagination.find_element(By.CSS_SELECTOR, 'a.page-next')
    actions = ActionChains(driver)
    actions.move_to_element(next_button).click(next_button).perform()
Your problem is that you have only one loop. You need two: one for the pages and another one for the topics.
- In the page loop, you count the topics and, once they have all been scraped, click on the next button
- In the topic loop, you click on each topic, take the info, and click on the back button
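The two-loop structure described above can be sketched without a browser. This is only a minimal illustration: `crawl` and its callback parameters (`count_topics`, `scrape_topic`, `go_to_next_page`) are hypothetical names that stand in for the Selenium calls, and the fake site is made-up data.

```python
def crawl(page_count, count_topics, scrape_topic, go_to_next_page):
    """Visit every topic on every page and collect the scraped values.

    All three callbacks are hypothetical stand-ins for browser actions:
    count_topics() returns the number of topics on the current page,
    scrape_topic(i) opens topic i (1-based, like XPath positions),
    reads it, and goes back; go_to_next_page() clicks the next button.
    """
    results = []
    for page in range(page_count):
        # Inner loop: every topic on the current page
        for i in range(1, count_topics() + 1):
            results.append(scrape_topic(i))
        # Outer loop: move on only after the whole page is done,
        # and do not click "next" after the last requested page
        if page < page_count - 1:
            go_to_next_page()
    return results


# Fake site with two pages, used here instead of a real browser
fake_site = [["q1", "q2", "q3"], ["q4", "q5"]]
current = {"page": 0}

collected = crawl(
    page_count=len(fake_site),
    count_topics=lambda: len(fake_site[current["page"]]),
    scrape_topic=lambda i: fake_site[current["page"]][i - 1],
    go_to_next_page=lambda: current.update(page=current["page"] + 1),
)
print(collected)
```

With a single loop that scrapes one topic and then clicks next, each page contributes only one question, which matches the behaviour described in the question.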
CodePudding user response:
When I use driver.back(), Python crashes. Use driver.execute_script("window.history.go(-1)") instead.
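If you go back in several places, the JavaScript call can be wrapped in a small helper. This is a sketch only: `go_back` is a hypothetical name, and `driver` is assumed to be a Selenium WebDriver (anything with an `execute_script(script, *args)` method works). Whether it avoids the crash will depend on your environment.

```python
def go_back(driver, steps=1):
    """Go back in the browser history via JavaScript instead of driver.back().

    `driver` is assumed to be a Selenium WebDriver; `steps` is how many
    history entries to go back.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    # Extra arguments to execute_script are exposed to the script as arguments[...]
    driver.execute_script("window.history.go(arguments[0]);", -steps)
```

In the topic loop you would call go_back(driver) where driver.back() is used, and still wait for the topic list to be present before clicking the next topic.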