I can't seem to find multiple values within the same class using BeautifulSoup

Time:03-10

I am trying to learn how to scrape websites. Currently I am scraping: https://finance.yahoo.com/screener/unsaved/f491bcb6-de80-4813-b50e-d6dc8e2f5623?dependentField=sector&dependentValues=Consumer Cyclical&offset=0&count=100

I am trying to get the change, but it shares the same class as stock_price.

So I tried using different classes: C($positiveColor) and C($negativeColor). But when I use these classes, I get the error AttributeError: 'NoneType' object has no attribute 'text'.

This is because some changes are 0 and have neither class, so find() returns None for those rows. How can I get the 0 using BeautifulSoup?

Yes, I know I could just test for None and set the value to 0, but I want to be able to do it using BeautifulSoup.

Thanks :)

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'DNT': '1',  # Do Not Track Request Header
    'Connection': 'close',
}


from bs4 import BeautifulSoup

URL = 'https://finance.yahoo.com/screener/unsaved/f491bcb6-de80-4813-b50e-d6dc8e2f5623?dependentField=sector&dependentValues=Consumer Cyclical&offset=0&count=100'
page = requests.get(URL, headers=headers, timeout=5)

soup = BeautifulSoup(page.content, "html.parser")

results = soup.find(id="screener-results")

stocks = results.find_all("tr", class_="simpTblRow")

for stock_ in stocks:
  stock_symbol = stock_.find('a', class_='Fw(600) C($linkColor)')
  stock_name = stock_.find('td', class_='Va(m) Ta(start) Px(10px) Fz(s)')
  stock_price = stock_.find('td', class_='Va(m) Ta(end) Pstart(20px) Fw(600) Fz(s)')
  stock_change = stock_.find('span', class_='C($positiveColor)')
  if stock_change is None:
    stock_change = stock_.find('span', class_='C($negativeColor)')

  # stock_change is still None here when the change is 0, which raises the AttributeError
  print(stock_symbol.text.strip() + '\n' + stock_name.text.strip() + '\nCurrent Price: $' + stock_price.text.strip() + '\nChange: ' + stock_change.text.strip(), end="\n"*2)

CodePudding user response:

In most cases it is a better strategy not to select your elements by class, because class names are often very dynamic. Focus on more "static" attributes if they are available.

In the case of the change, simply use the data-field attribute of the <fin-streamer> element:

stock_change = stock_.find('fin-streamer', {'data-field':'regularMarketChange'})
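
To see why the attribute lookup also covers a change of 0, here is a minimal offline sketch. The markup in SAMPLE_HTML is a simplified assumption made up for illustration (the company names are hypothetical, and the real Yahoo Finance rows carry many more attributes); only the find() calls mirror the ones discussed above.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: one row with a colored change, one with a plain 0.00.
SAMPLE_HTML = """
<table><tbody>
<tr class="simpTblRow">
  <td aria-label="Name">Up Corp</td>
  <td><fin-streamer data-field="regularMarketChange"><span class="C($positiveColor)">+0.50</span></fin-streamer></td>
</tr>
<tr class="simpTblRow">
  <td aria-label="Name">Flat Corp</td>
  <td><fin-streamer data-field="regularMarketChange"><span>0.00</span></fin-streamer></td>
</tr>
</tbody></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

changes = []
for row in soup.find_all("tr", class_="simpTblRow"):
    # Class-based lookup: finds nothing when the span has no color class.
    by_class = row.find("span", class_="C($positiveColor)")
    # Attribute-based lookup: matches the element regardless of how it is styled.
    by_attr = row.find("fin-streamer", {"data-field": "regularMarketChange"})
    changes.append((by_class is not None, by_attr.text.strip()))

print(changes)  # [(True, '+0.50'), (False, '0.00')]
```

The second row shows the situation from the question: the class lookup comes back empty, while the data-field lookup still returns the element containing 0.00.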

Example

...
URL = 'https://finance.yahoo.com/screener/unsaved/f491bcb6-de80-4813-b50e-d6dc8e2f5623?dependentField=sector&dependentValues=Consumer Cyclical&offset=0&count=100'
page = requests.get(URL, headers=headers, timeout=5)    
soup = BeautifulSoup(page.content, "html.parser")    
results = soup.find(id="screener-results")

stocks = results.find_all("tr", class_="simpTblRow")

for stock_ in stocks:
    # as posted, this also selects the Name cell, so the output shows the name twice
    stock_symbol = stock_.find('td', {'aria-label':'Name'})
    stock_name = stock_.find('td', {'aria-label':'Name'})
    stock_price = stock_.find('fin-streamer', {'data-field':'regularMarketPrice'})
    stock_change = stock_.find('fin-streamer', {'data-field':'regularMarketChange'})
    print(stock_symbol.text.strip() + '\n' + stock_name.text.strip() + '\nCurrent Price: $' + stock_price.text.strip() + '\nChange: ' + stock_change.text.strip(), end="\n"*2)
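
If you prefer CSS selectors, the same attribute-based idea can be written with select() and select_one(). The sketch below is only an illustration under assumptions: parse_changes is a hypothetical helper name, SAMPLE_ROWS is made-up markup that reuses two names and changes from the output of this answer, and converting the change to float is an extra step not present in the code above.

```python
from bs4 import BeautifulSoup


def parse_changes(html):
    """Return (name, change) pairs for each screener row found in html."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for row in soup.select("tr.simpTblRow"):
        name = row.select_one('td[aria-label="Name"]')
        change = row.select_one('[data-field="regularMarketChange"]')
        if name is None or change is None:
            continue  # skip rows that lack the expected cells
        # Strip thousands separators before converting; a leading sign is fine for float().
        value = float(change.get_text(strip=True).replace(",", ""))
        pairs.append((name.get_text(strip=True), value))
    return pairs


# Hypothetical, simplified rows (not the real page markup).
SAMPLE_ROWS = """
<table><tbody>
<tr class="simpTblRow">
  <td aria-label="Name">Great Wall Motor Company Limited</td>
  <td><fin-streamer data-field="regularMarketChange">-0.51</fin-streamer></td>
</tr>
<tr class="simpTblRow">
  <td aria-label="Name">Sime Darby Berhad</td>
  <td><fin-streamer data-field="regularMarketChange">+0.0050</fin-streamer></td>
</tr>
</tbody></table>
"""

print(parse_changes(SAMPLE_ROWS))
```

Returning numbers instead of strings makes it easier to sort or filter the rows later, at the cost of losing the original formatting.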

Output

...
Great Wall Motor Company Limited
Great Wall Motor Company Limited
Current Price: $15.80
Change: -0.51

Mahindra & Mahindra Limited
Mahindra & Mahindra Limited
Current Price: $10.00
Change: 0.00

Sime Darby Berhad
Sime Darby Berhad
Current Price: $0.5690
Change: +0.0050
...