I want to scrape terms, along with their details, from the SAP Glossary website. Right now I can only get 50 terms, because I can't figure out how to click 'Load more' and then keep scrolling down to scrape more terms. I noticed that the 'Load more' button has to turn orange before it becomes clickable.
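from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.implicitly_wait(10)  # the results list is rendered dynamically
page_url = "https://help.sap.com/glossary/?locale=en-US&search=CRM"
driver.get(page_url)
driver.maximize_window()
# Term links in the results list (only the first batch of about 50 is loaded)
element = driver.find_elements(by=By.XPATH, value='//a[@role="menuitem"]')
load_more = driver.find_elements(by=By.CSS_SELECTOR, value='button.motion-button')
detail = []
for i in range(len(element)):  # range(51) overran the 50 loaded links
    element[i].click()
    # Re-query <p> after each click so the text belongs to the selected term
    c = driver.find_elements(by=By.TAG_NAME, value='p')
    detail.append(c[0].text)
    print(i, c[0].text)
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")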
Answer:
I found a video that covers exactly what I needed: https://www.youtube.com/watch?v=qqNufBruvUc. The solution isn't about the 'Load more' button at all; you need to find the JSON file the page loads its data from.
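A minimal sketch of that idea, assuming you have already found the request in your browser's DevTools (Network tab, Fetch/XHR filter) while the glossary search loads. The endpoint URL, the `search`/`offset`/`limit` parameter names, and the assumption that the response is a JSON list of term objects are all hypothetical placeholders; replace them with whatever the real request uses. `fetch_page` and `collect_all` are illustrative names, and the paging loop simply keeps requesting until a short page comes back.

```python
import json
import urllib.parse
import urllib.request

# HYPOTHETICAL: replace with the real request URL copied from DevTools.
API_URL = "https://example.com/REPLACE-WITH-ENDPOINT-FROM-DEVTOOLS"


def fetch_page(query, offset, limit):
    """Fetch one page of results (parameter names are assumptions)."""
    params = urllib.parse.urlencode({"search": query, "offset": offset, "limit": limit})
    with urllib.request.urlopen(f"{API_URL}?{params}", timeout=30) as resp:
        # Assumes the body is a JSON list of term objects; adapt to the real shape.
        return json.load(resp)


def collect_all(fetch, query, page_size=50):
    """Request pages until a short (or empty) page signals the end of the results."""
    items = []
    offset = 0
    while True:
        batch = fetch(query, offset, page_size)
        items.extend(batch)
        if len(batch) < page_size:
            return items
        offset += page_size
```

Once `API_URL` and the parameter names match the real request, `terms = collect_all(fetch_page, "CRM")` would return every term in one go, with no browser or button clicking involved. Passing the fetch function in keeps the paging logic independent of how the request is made.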
Answer:
The following code should meet your requirement. While the 'Load more' button exists, it clicks it to load more data; once all the data has loaded, it uses `find_elements` to get the full collection of term elements.
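from time import sleep
from clicknium import clicknium as cc

# Install or update the Clicknium Chrome extension if it is missing
if not cc.chrome.extension.is_installed():
    cc.chrome.extension.install_or_update()

tab = cc.chrome.open("https://help.sap.com/glossary/?locale=en-US&search=CRM")

# Keep clicking 'Load more' while the button is on the page
while tab.is_existing_by_css_selector('button.motion-button'):
    # Look the button up again each time, in case the list re-renders after a load
    tab.find_element_by_css_selector('button.motion-button').click()
    sleep(1)  # give the next batch time to load

# All terms are loaded; now collect and visit each one
elements = tab.find_elements_by_xpath('//a[@role="menuitem"]')
for element in elements:
    element.click()
    print(element.get_text())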
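The same "click until the button is gone" loop can be written for the question's Selenium `driver`. This is a sketch under assumptions: it treats "orange" (clickable) as `is_displayed()` and `is_enabled()` both being true, which is what Selenium's `element_to_be_clickable` checks but may not match how the page actually greys the button out. The selector `button.motion-button` comes from the question; `click_load_more_until_done` and its `timeout`, `poll` and `max_clicks` parameters are hypothetical. `"css selector"` is the string value of `By.CSS_SELECTOR`, so the function itself needs no Selenium imports.

```python
import time


def click_load_more_until_done(driver, button_css="button.motion-button",
                               timeout=10.0, poll=0.5, max_clicks=200):
    """Click 'Load more' until it disappears; return the number of clicks."""
    clicks = 0
    while clicks < max_clicks:
        deadline = time.monotonic() + timeout
        button = None
        # Poll until the button is visible and enabled (assumed to mean "orange").
        while time.monotonic() < deadline:
            buttons = driver.find_elements("css selector", button_css)
            if not buttons:
                return clicks  # button gone: assume everything is loaded
            if buttons[0].is_displayed() and buttons[0].is_enabled():
                button = buttons[0]
                break
            time.sleep(poll)
        if button is None:
            return clicks  # never became clickable within timeout; assume done
        # Scroll the button into view before clicking it.
        driver.execute_script("arguments[0].scrollIntoView();", button)
        button.click()
        clicks += 1
    return clicks
```

After `driver.get(page_url)`, calling `click_load_more_until_done(driver)` and then running the question's `driver.find_elements(by=By.XPATH, value='//a[@role="menuitem"]')` should return the full list rather than the first 50. The `max_clicks` cap only guards against looping forever if the button never goes away.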