Beautiful Soup Web Scraper IndexError: list index out of range

I am making a web scraper that scrapes Yahoo Finance and tells me the current stock price.

I keep getting this error after running the program:

IndexError: list index out of range

This is the code:

import bs4
import requests


def parsePrice():
    r=requests.get('https://finance.yahoo.com/quote/F?p=F')
    soup=bs4.BeautifulSoup(r.text,'xml')
    #the next line is the supposed problem
    price=soup.find_all('div',{'class': 'My(6px) Pos(r) smartphone_Mt(6px)'})[0].find('span').text
    return price


while True:
    print('the current price is: ' + str(parsePrice()))

I am a beginner in Python, so any help would be appreciated :)

CodePudding user response:

What happens?

Note: Always look at your soup first; therein lies the truth. The content you receive can differ slightly or drastically from what you see in the browser's dev tools.

There is no <div> with the class you are searching for in the soup. That is why the ResultSet returned by find_all is empty, and picking index [0] from an empty ResultSet raises the IndexError.
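
To see why the error appears, here is a minimal sketch that uses a small made-up HTML string instead of the live Yahoo page. The markup and the `first_span_text` helper are assumptions for illustration only. It shows the empty result, the resulting IndexError, and one possible way to guard against it:

```python
import bs4

# Made-up HTML for illustration; it is not the real Yahoo Finance markup.
html = '<div class="quote"><span>20.25</span></div>'
soup = bs4.BeautifulSoup(html, 'html.parser')

# No <div> carries this class, so find_all returns an empty ResultSet.
matches = soup.find_all('div', {'class': 'My(6px) Pos(r) smartphone_Mt(6px)'})
print(len(matches))

# Indexing the empty ResultSet is what raises the error from the question.
try:
    matches[0]
except IndexError as exc:
    print('IndexError:', exc)


def first_span_text(soup, css_class):
    """Hypothetical helper: text of the first matching <div>'s <span>, or None."""
    div = soup.find('div', {'class': css_class})
    if div is None:
        return None
    span = div.find('span')
    return span.text if span is not None else None


print(first_span_text(soup, 'quote'))    # class exists in the sample HTML
print(first_span_text(soup, 'missing'))  # class does not exist
```

Checking for `None` (or for an empty result) before indexing turns a crash into a case you can handle, for example by printing the soup to see what was actually returned.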

How to fix?

Add some headers to your request so that it looks like it comes from a browser:

headers ={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'}
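
As a rough illustration of what this header changes, the sketch below prepares the request without sending it and compares its User-Agent with the default one that requests would otherwise use. The URL is the one from the question, and nothing here contacts the network:

```python
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
}

# Without custom headers, requests identifies itself as "python-requests/<version>".
default_agent = requests.utils.default_user_agent()
print(default_agent)

# Prepare (but do not send) the request to inspect the headers it would carry.
prepared = requests.Request(
    'GET', 'https://finance.yahoo.com/quote/F?p=F', headers=headers
).prepare()
print(prepared.headers['User-Agent'])
```

Some sites serve different content to clients that announce themselves as a script, which may explain why the soup differs from what the browser shows.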

Select your element more specifically. Because you know the symbol from your request, you can select the element by its data-symbol attribute directly:
```
soup.select_one('[data-symbol="F"]')['value']
```
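
A small sketch of how this selector behaves, run against made-up HTML rather than the live page. The `<span>` tags, their values, and the `price_for` helper are assumptions for illustration. `select_one` returns `None` when nothing matches, so it is worth checking before reading the attribute:

```python
import bs4

# Made-up HTML for illustration; the real page markup may differ.
html = '''
<div>
  <span data-symbol="F" value="20.25">20.25</span>
  <span data-symbol="GM" value="55.10">55.10</span>
</div>
'''
soup = bs4.BeautifulSoup(html, 'html.parser')


def price_for(soup, symbol):
    """Hypothetical helper: the 'value' attribute for a symbol, or None."""
    tag = soup.select_one(f'[data-symbol="{symbol}"]')
    if tag is None:
        return None
    return tag.get('value')


print(price_for(soup, 'F'))     # symbol present in the sample HTML
print(price_for(soup, 'TSLA'))  # symbol absent, so the helper returns None
```

Selecting by a data attribute tends to be more stable than matching a long list of generated CSS class names, which can change whenever the site is redeployed.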

Example

Note: First rule of scraping: do not harm the website! The volume and frequency of your queries should not burden the website's servers. So please add some delay between your requests (import time, then time.sleep(60)) or use an official API.
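
One possible way to build that delay into a loop is sketched below. The `poll` function and its parameters are hypothetical names, and `fake_fetch` stands in for a real scraping function so the example runs without touching any website:

```python
import time


def poll(fetch, interval_seconds=60, max_polls=3, sleep=time.sleep):
    """Call fetch() max_polls times, pausing interval_seconds between calls."""
    results = []
    for i in range(max_polls):
        results.append(fetch())
        if i < max_polls - 1:  # no need to wait after the last call
            sleep(interval_seconds)
    return results


def fake_fetch():
    """Stand-in for a real scraping function such as parsePrice()."""
    return '20.25'


# A zero interval keeps the demo instant; use 60 or more against a real site.
print(poll(fake_fetch, interval_seconds=0, max_polls=3))
```

Limiting the number of polls and passing the sleep function as a parameter also makes the loop easy to test without actually waiting.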

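import time

import bs4
import requests

headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
}


def parsePrice():
    r = requests.get('https://finance.yahoo.com/quote/F?p=F', headers=headers)
    # the 'xml' parser requires the lxml package to be installed
    soup = bs4.BeautifulSoup(r.text, 'xml')
    # select the element by its data-symbol attribute and read its value
    price = soup.select_one('[data-symbol="F"]')['value']
    return price


while True:
    print('the current price is: ' + str(parsePrice()))
    time.sleep(60)  # wait between requests so the server is not burdened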

Output

the current price is: 20.25
the current price is: 20.25
the current price is: 20.25
the current price is: 20.25