How to create a loop with Selenium ActionChains?

I'm trying to create a loop with Selenium ActionChains on this website. I can't locate the pagination button directly (I get a "JavascriptException" with "Message: javascript error: Cannot read properties of undefined"), so I turned to ActionChains as an alternative.

Here is my code:

for i in range(1, 15):

    root1 = driver.find_element(By.XPATH, '//*[@id="topic-list"]/card-topic[{}]'.format(i))
    shadow_root = expand_shadow_element(root1)
    text = shadow_root.find_element(By.CSS_SELECTOR, 'a').click() 
    time.sleep(2)

    qt = tittle_root.find_element(By.CSS_SELECTOR, 'user-topic')
    qt_root = qt.shadow_root
    qt_ele = qt_root.find_element(By.CSS_SELECTOR, 'p')
    qt_text = qt_ele.text
    question_data.append(qt_text)

    time.sleep(2)
    driver.back()
    time.sleep(2)

    paginate = driver.find_element(By.CSS_SELECTOR, 'paginate-button')
    paginate_root = expand_shadow_element(paginate)
    paginate2 = paginate_root.find_element(By.LINK_TEXT, 'Selanjutnya')
    actions = ActionChains(driver)
    actions.move_to_element(paginate2)
    actions.click(paginate2)
    actions.perform()

The problem is that ActionChains does move to the second page, but the scraping does not work as expected.

On the first (default) page it scrapes only the first question, on the next page only the second question, and so on. Is there any suggestion on how to fix my code? FYI, the loop I use is meant to get the full question on the website, so that I can get all 15 questions.

Any help would be appreciated. Thank you.

CodePudding user response：

I think this is what you wanted:

# Needed libs
from selenium.webdriver import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium import webdriver

driver = webdriver.Chrome()
url = "https://www.alodokter.com/komunitas/diskusi/penyakit"

driver.get(url)
driver.maximize_window()

# Loop over every page we want, in this case 10
for page in range(0, 10):
    # Count how many topics this page has
    how_many_topics = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, '//*[@id="topic-list"]/card-topic')))
    # Loop over every topic; XPath positions are 1-based, so include the last index
    for i in range(1, len(how_many_topics) + 1):
        # Click on the topic link
        topic = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, f'(//*[@id="topic-list"]/card-topic)[{i}]')))
        shadow1 = driver.execute_script("return arguments[0].shadowRoot", topic)
        shadow1.find_element(By.CSS_SELECTOR, 'a').click()

        # We get the details we want, in my example only the title, but you can take whatever you want
        detail_topic = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'detail-topic')))
        shadow2 = driver.execute_script("return arguments[0].shadowRoot", detail_topic)
        title = shadow2.find_element(By.CSS_SELECTOR, '.h2').text
        print(title)

        # We click on back button
        driver.back()

    # We search for the next button and we click on it
    pagination = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//paginate-button')))
    shadow_pagination = driver.execute_script("return arguments[0].shadowRoot", pagination)
    next_button = shadow_pagination.find_element(By.CSS_SELECTOR, 'a.page-next')
    actions = ActionChains(driver)
    actions.move_to_element(next_button).click(next_button).perform()

Your problem is that you have only one loop. You need two: one for the pages and another for the topics.

In the page loop, you count the topics and click the next button.
In the topic loop, you click each topic, take the info, and click the back button.

CodePudding user response：

When using driver.back(), Python crashes. Use driver.execute_script("window.history.go(-1)") instead.
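
If you go this route, the call can be wrapped in a small helper so the loop body stays readable. This is a sketch, assuming `driver` is a Selenium WebDriver (any object with an `execute_script(script, *args)` method will do); `go_back` is a hypothetical name, not part of Selenium.

```python
def go_back(driver, steps: int = 1) -> None:
    """Navigate back through the browser history via JavaScript.

    `driver` is expected to expose Selenium's execute_script(script, *args).
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    # window.history.go(-n) moves n entries back, like pressing Back n times.
    driver.execute_script("window.history.go(arguments[0])", -steps)
```

In the topic loop you would call `go_back(driver)` where `driver.back()` was used. The script only requests the navigation, so it is probably still worth keeping an explicit wait for the topic list before touching the page again.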