I wrote this code to get real-time Amazon market data from the Plus500 platform:
import requests
import bs4
from lxml import etree
from Config import username, password

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0",
    "Accept-Encoding": "gzip, deflate",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Connection": "keep-alive"
}

with requests.Session() as session:
    url = 'https://app.plus500.com/trade/amazon'
    # Send the request through the session so cookies persist between requests
    page = session.get(url, auth=(username, password), headers=headers)
    try:
        page.raise_for_status()
    except requests.HTTPError as exc:
        print(f'Problem: {exc}')
    html_page = bs4.BeautifulSoup(page.content, 'html.parser')
    print(html_page.prettify())
    dom = etree.HTML(str(html_page))
    # Fails with IndexError when the XPath query matches nothing
    print(dom.xpath('//*[@id="_win_plus500_bind818"]')[0].text)
I created a dummy account for you in case it is necessary. Username: [email protected] Password: MyRandomCode87
This is the error I get back:
Traceback (most recent call last):
File "############", line 24, in <module>
print(dom.xpath('//*[@id="_win_plus500_bind818"]')[0].text)
IndexError: list index out of range
I am trying to scrape the buy and sell prices.
Answer:
The error you're getting comes from the indexing operation [0].text: dom.xpath(...) returned an empty list, so there is no element at index 0.
Make sure the element you're trying to access actually exists in the HTML you downloaded. Also, make sure the id you're using ("_win_plus500_bind818") doesn't change when you refresh the web page.
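A defensive way to follow this advice is to check the list returned by the query before indexing into it. The sketch below is illustrative only: first_text is a hypothetical helper, and it uses the standard library's xml.etree.ElementTree on a made-up snippet (the id "price" is invented) so it runs without network access or lxml. lxml's dom.xpath(...) also returns a list, so the same helper applies to it unchanged.

```python
import xml.etree.ElementTree as ET


def first_text(nodes, default=None):
    """Return the .text of the first matched node, or `default` if nothing matched."""
    # Query methods return a (possibly empty) list; indexing [0] on an empty
    # list is what raises "IndexError: list index out of range".
    if not nodes:
        return default
    return nodes[0].text


# Hypothetical markup standing in for the downloaded page.
snippet = '<div><span id="price">123.45</span></div>'
root = ET.fromstring(snippet)

found = first_text(root.findall(".//*[@id='price']"))
missing = first_text(root.findall(".//*[@id='_win_plus500_bind818']"))
print(found)    # 123.45
print(missing)  # None
```

Returning a default instead of raising makes the "element not found" case explicit, so you can log it and inspect the HTML you actually received.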
Answer:
The page you are trying to parse is dynamic, meaning its content is loaded by JavaScript after the page itself has loaded, so it is not part of the HTML that requests downloads. To parse a dynamic page, you have to use a tool that drives a real browser, such as Selenium.
You can find more details in the following answer: How can I parse a dynamic page using Python?
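One way to confirm that a page is dynamic is to check whether the id you see in the browser's developer tools is present in the raw HTML at all. The sketch below is a hypothetical diagnostic built on the standard library's html.parser: IdFinder and id_in_static_html are illustrative names, and the "shell" markup is invented rather than Plus500's real HTML. In practice you would pass it page.text from the requests call.

```python
from html.parser import HTMLParser


class IdFinder(HTMLParser):
    """Collect every id attribute present in raw (un-rendered) HTML."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def id_in_static_html(html, element_id):
    """True if an element with this id exists before any JavaScript runs."""
    finder = IdFinder()
    finder.feed(html)
    finder.close()
    return element_id in finder.ids


# Invented example of a JavaScript-driven page shell.
shell = '<html><body><div id="app"></div><script src="bundle.js"></script></body></html>'
print(id_in_static_html(shell, "app"))                   # True
print(id_in_static_html(shell, "_win_plus500_bind818"))  # False
```

If the check returns False for an id that does show up in the browser, the element is created client-side, and a plain requests + BeautifulSoup pipeline will never see it.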