I am using the following code to attempt to keep clicking a "Load More" button until all page results are shown on the website:
from selenium import webdriver
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
def startWebDriver():
global driver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--incognito")
chrome_options.add_argument("--window-size=1920x1080")
driver = webdriver.Chrome(options = chrome_options)
startWebDriver()
driver.get("https://together.bunq.com/all")
time.sleep(4)
while True:
try:
wait = WebDriverWait(driver, 10,10)
element = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "[title='Load More']")))
element.click()
print("Loading more page results")
except:
print("All page results displayed")
break;
However, since the button click does not change the URL, no new data is loaded into chromedriver and the while loop will break on the second iteration.
CodePudding user response:
Selenium is overkill for this. You only need requests
. Logging one's network traffic reveals that at some point JavaScript makes an XHR HTTP GET request to a REST API endpoint, the response of which is JSON and contains all the information you're likely to want to scrape.
One of the query-string parameters for that endpoint URL is page[offset]
, which is used to offset the query results for pagination (in this case the "load more button"). A value of 0
corresponds to no offset, or "start at the beginning". Increment this value to suit your needs - in a loop would probably be a good place to do this.
Simply imitate that XHR HTTP GET request - copy the API endpoint URL and query-string parameters and request headers, then parse the JSON response:
def get_discussions():
import requests
url = "https://together.bunq.com/api/discussions"
params = {
"include": "user,lastPostedUser,tags,firstPost",
"page[offset]": 0
}
headers = {
"user-agent": "Mozilla/5.0"
}
response = requests.get(url, params=params, headers=headers)
response.raise_for_status()
yield from response.json()["data"]
def main():
for discussion in get_discussions():
print(discussion["attributes"]["title"])
return 0
if __name__ == "__main__":
import sys
sys.exit(main())
Output:
⚡️What’s new in App Update 18.8.0
Local Currencies Accounts Fees
Local Currencies