Not all HTML can be accessed in python-requests-html

Time:02-20

I am trying to run a script that simply finds a few numbers on a website, but it doesn't seem to let me past a certain point. This is the script:

import requests
from requests_html import HTMLSession

url = "https://auction.chimpers.xyz/"
try:
    s = HTMLSession()
    r = s.get(url)
except requests.exceptions.RequestException as e:
    # No response means nothing to render, so exit instead of hitting a NameError on r below
    raise SystemExit(e)

# Execute the page's JavaScript in headless Chromium, then sleep 1 second after the initial render
r.html.render(sleep=1)

title = r.html.find("title", first=True).text
print(title)

divs_found = r.html.find("div")
print(divs_found)

meta_desc = r.html.xpath('//*[@id="description-view"]/div', first=True)
print(meta_desc)

price = r.html.find(".m-complete-info div", first=True)
print(price)

This prints:
Chimpers Genesis 100
[<Element 'div' id='app'>, <Element 'div' data-v-1d311e85='' id='m-connection' class=('manifold',)>, <Element 'div' id='description-view'>, <Element 'div' class=('manifold', 'm-complete-view')>, <Element 'div' data-v-cf8dbfe2='' class=('manifold', 'loading-screen')>, <Element 'div' class=('manifold-logo',)>]
<Element 'div' class=('manifold', 'm-complete-view')>
None
[Finished in 3.9s]

Website: https://auction.chimpers.xyz/

and the information I am trying to find is here

Clearly there are more HTML elements beyond the ones printed in the list. However, every time I try to access them, even with r.html.xpath('//*[@id="description-view"]/div/div[2]/div/div[2]/span/span[1]'), it returns None, even though that is the XPath I copied from the Inspect panel in Google Chrome.

Is there a reason for this, and how would I go about getting those elements?

Answer:

I don't actually know if it's even possible to do with requests_html, but it is with Selenium.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options

url = "https://auction.chimpers.xyz/"
class_names = ["m-price-label", "m-price-data"]

driver_options = Options()
driver_options.add_argument("--headless")
driver = webdriver.Chrome(options=driver_options)
driver.get(url)

results = {}

try:
    for class_name in class_names:
        # Wait up to 10 seconds for the element to appear in the DOM
        element = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, class_name))
        )
        # Get the inner text of the HTML tag
        results[class_name] = element.get_attribute("textContent")
finally:
    driver.quit()

print(results)
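
Since the goal is to find a few numbers, note that the values in results are raw text, so the numeric part still has to be pulled out. Here is a minimal sketch of one way to do that with a regular expression. The sample strings and the extract_numbers helper are made up for illustration; I haven't checked what the page actually returns for those classes.

```python
import re
from decimal import Decimal

# Matches plain decimal numbers such as "3" or "1.25" (no signs or exponents)
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def extract_numbers(text):
    """Return every decimal number found in text, ignoring thousands separators."""
    return [Decimal(match) for match in NUMBER.findall(text.replace(",", ""))]


# Hypothetical stand-in for the dict built by the Selenium script
sample_results = {"m-price-label": "Current bid", "m-price-data": " 1.25 ETH "}

numbers = {name: extract_numbers(text) for name, text in sample_results.items()}
print(numbers)
```

Decimal is used instead of float so that prices are not affected by binary rounding.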

Feel free to use a webdriver other than Chrome.
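
If you would rather stay with requests_html, one thing that might be worth trying: the div list in the question still contains a 'loading-screen' element, which suggests the page had not finished loading when it was read after render(sleep=1). Below is a rough sketch that re-renders with progressively longer sleeps until a selector matches. The render_and_find helper and the sleep values are my own assumptions, and I have not tested this against the site, so it may still come back empty.

```python
def render_and_find(html, selector, sleeps=(1, 3, 6)):
    """Re-render with longer and longer sleeps until selector matches.

    html is expected to behave like the r.html object from requests_html,
    i.e. it offers render(sleep=..., timeout=...) and find(selector, first=True).
    Returns the first matching element, or None if nothing ever matched.
    """
    for sleep in sleeps:
        # Each call runs the page's JavaScript again and then waits `sleep` seconds
        html.render(sleep=sleep, timeout=30)
        element = html.find(selector, first=True)
        if element is not None:
            return element
    return None


# Usage (needs requests_html and its Chromium download, so left commented out):
# from requests_html import HTMLSession
# r = HTMLSession().get("https://auction.chimpers.xyz/")
# price = render_and_find(r.html, ".m-price-data")
# print(price.text if price is not None else "not found")
```

The ".m-price-data" selector is taken from the class names used in the Selenium script above.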
