IndexError when using XPath for scraping

Time:08-29

I wrote this code to get real-time market data for Amazon from the Plus500 platform:


import requests
import bs4
from lxml import etree
from Config import username, password

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0",
    "Accept-Encoding": "gzip, deflate",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Connection": "keep-alive"
}

with requests.Session() as session:
    url = 'https://app.plus500.com/trade/amazon'
    # Send the request through the session so headers and cookies persist
    page = session.get(url, auth=(username, password), headers=headers)
    try:
        page.raise_for_status()
    except Exception as exc:
        print(f'Problem: {exc}')
    # Parse with BeautifulSoup, then build an lxml tree so XPath can be used
    html_page = bs4.BeautifulSoup(page.content, 'html.parser')
    print(html_page.prettify())
    dom = etree.HTML(str(html_page))
    print(dom.xpath('//*[@id="_win_plus500_bind818"]')[0].text)

I created a dummy account in case it is needed. Username: [email protected] Password: MyRandomCode87

This is the error I get:


Traceback (most recent call last):
  File "############", line 24, in <module>
    print(dom.xpath('//*[@id="_win_plus500_bind818"]')[0].text)
IndexError: list index out of range

I am trying to scrape the sell and buy prices.

Answer:

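The IndexError comes from the [0] in dom.xpath(...)[0].text: xpath() returns a list of matching elements, and when nothing matches the expression the list is empty, so [0] is out of range.

Make sure the element you are trying to access actually exists in the HTML you downloaded. Also check that the id you are using ("_win_plus500_bind818") does not change when you reload the page.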
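A minimal sketch of that check, assuming lxml is installed. `first_text` is a hypothetical helper and `SAMPLE_HTML` is made-up markup, not Plus500's real page. Instead of indexing blindly, it returns None when the XPath expression matches nothing.

```python
from lxml import etree

# Hypothetical HTML standing in for a downloaded page (not Plus500's markup).
SAMPLE_HTML = """
<html><body>
  <div id="price-sell">123.45</div>
</body></html>
"""


def first_text(dom, xpath):
    """Return the text of the first node matching `xpath`, or None.

    dom.xpath() returns a list; indexing [0] on an empty list raises
    IndexError, so check for a match first.
    """
    matches = dom.xpath(xpath)
    if not matches:
        return None
    return matches[0].text


dom = etree.HTML(SAMPLE_HTML)
print(first_text(dom, '//*[@id="price-sell"]'))            # 123.45
print(first_text(dom, '//*[@id="_win_plus500_bind818"]'))  # None
```

If this returns None for the id from the question, the element is simply not in the HTML that requests received.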

Answer:

The page you are trying to parse is dynamic: its content is loaded by JavaScript after the initial HTML page has loaded, so it is not in the response that requests downloads. To parse a dynamic page you need a tool that runs the page's JavaScript, such as Selenium.
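A minimal Selenium sketch, assuming Selenium 4 and Firefox with geckodriver are available. `read_text_when_present` is a hypothetical helper name: it opens the page in a real browser, lets the JavaScript run, and waits for the element to appear before reading its text.

```python
def read_text_when_present(url, xpath, timeout=20):
    """Open `url` in Firefox, wait until `xpath` matches, return the element's text."""
    # Imported inside the function so the helper can be defined even where
    # Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # The browser executes the page's JavaScript; wait (up to `timeout`
        # seconds) until the dynamically inserted element is in the DOM.
        element = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.XPATH, xpath))
        )
        return element.text
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

For example, `read_text_when_present('https://app.plus500.com/trade/amazon', '//*[@id="_win_plus500_bind818"]')`. This sketch does not log in; the trading page probably requires it, and the login form's fields are not known here. If the element never appears, `WebDriverWait.until` raises `TimeoutException`.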

You can find more details in this answer: How can I parse a dynamic page using Python?
