I am trying to learn how to scrape websites. Currently I am scraping: https://finance.yahoo.com/screener/unsaved/f491bcb6-de80-4813-b50e-d6dc8e2f5623?dependentField=sector&dependentValues=Consumer Cyclical&offset=0&count=100
I am trying to get the change, but it shares the same class as stock_price.
So, I tried using different classes: C($positiveColor) and C($negativeColor). But when I use these classes I receive an error: AttributeError: 'NoneType' object has no attribute 'text'.
This is because some changes are 0, and those have no apparent class. How would I be able to get the 0 using BeautifulSoup?
Yes, I know I could just test for None and then set it to 0, but I want to be able to do it using BeautifulSoup.
Thanks :)
import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'DNT': '1',  # Do Not Track request header
    'Connection': 'close',
}

URL = 'https://finance.yahoo.com/screener/unsaved/f491bcb6-de80-4813-b50e-d6dc8e2f5623?dependentField=sector&dependentValues=Consumer Cyclical&offset=0&count=100'
page = requests.get(URL, headers=headers, timeout=5)
soup = BeautifulSoup(page.content, "html.parser")

results = soup.find(id="screener-results")
stocks = results.find_all("tr", class_="simpTblRow")
for stock_ in stocks:
    stock_symbol = stock_.find('a', class_='Fw(600) C($linkColor)')
    stock_name = stock_.find('td', class_='Va(m) Ta(start) Px(10px) Fz(s)')
    stock_price = stock_.find('td', class_='Va(m) Ta(end) Pstart(20px) Fw(600) Fz(s)')
    # The change span is colored green or red; a zero change has neither class,
    # so stock_change stays None for those rows and .text raises the AttributeError.
    stock_change = stock_.find('span', class_='C($positiveColor)')
    if stock_change is None:
        stock_change = stock_.find('span', class_='C($negativeColor)')
    print(stock_symbol.text.strip() + '\n' + stock_name.text.strip()
          + '\nCurrent Price: $' + stock_price.text.strip()
          + '\nChange: ' + stock_change.text.strip(), end="\n"*2)
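The failure can be reproduced offline, without requesting the live page. This is a minimal sketch that assumes a hypothetical, trimmed-down HTML snippet in which only non-zero changes carry a color class (the behavior described above); the real Yahoo markup is much larger. The helper name change_by_color_class is hypothetical and simply repeats the class-based lookup from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down rows: the change <span> has a color class
# only when the change is non-zero.
HTML = """
<table><tbody>
<tr class="simpTblRow"><td><span class="C($negativeColor)">-0.51</span></td></tr>
<tr class="simpTblRow"><td><span>0.00</span></td></tr>
<tr class="simpTblRow"><td><span class="C($positiveColor)">0.0050</span></td></tr>
</tbody></table>
"""


def change_by_color_class(row):
    """Hypothetical helper: the question's lookup, positive class first, then negative."""
    span = row.find('span', class_='C($positiveColor)')
    if span is None:
        span = row.find('span', class_='C($negativeColor)')
    return span  # None when the span has neither color class


soup = BeautifulSoup(HTML, 'html.parser')
rows = soup.find_all('tr', class_='simpTblRow')
found = [change_by_color_class(row) for row in rows]

for span in found:
    # Calling .text directly on the None entry is what raises the AttributeError.
    print(span.text if span is not None else None)
```

The middle row comes back as None, which is exactly the row with a 0.00 change.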
Answer:
In most cases it is a better strategy not to select your elements by class, because classes are often generated dynamically and change. Focus on more "static" attributes where they are available.
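One concrete way class-based selection breaks: in BeautifulSoup, a class_ string that contains spaces is compared with the whole class attribute, so the same classes in a different order no longer match. The sketch below assumes two hypothetical <fin-streamer> elements that share a data-field attribute but list their classes in a different order.

```python
from bs4 import BeautifulSoup

# Hypothetical elements: same classes in a different order, same data-field.
HTML = """
<fin-streamer class="Va(m) Ta(end) Fw(600)" data-field="regularMarketPrice">15.80</fin-streamer>
<fin-streamer class="Fw(600) Va(m) Ta(end)" data-field="regularMarketPrice">10.00</fin-streamer>
"""
soup = BeautifulSoup(HTML, 'html.parser')

# A multi-class string must match the class attribute exactly, so the
# reordered element is missed.
by_class = soup.find_all('fin-streamer', class_='Va(m) Ta(end) Fw(600)')

# An attribute that describes the data is unaffected by styling changes.
by_attr = soup.find_all('fin-streamer', attrs={'data-field': 'regularMarketPrice'})

print(len(by_class), len(by_attr))  # 1 2
```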
For the change, simply use the data-field attribute of the <fin-streamer> element:
stock_change = stock_.find('fin-streamer', {'data-field':'regularMarketChange'})
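A minimal offline check of that lookup, assuming hypothetical simplified rows in which every change is wrapped in a <fin-streamer data-field="regularMarketChange"> element, but only non-zero changes have a color class. The same element can also be selected with a CSS attribute selector through select_one, which relies on soupsieve (installed as a dependency of bs4 4.7 and later).

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified rows: the zero-change span has no color class,
# but its <fin-streamer> wrapper still carries the data-field attribute.
HTML = """
<table><tbody>
<tr class="simpTblRow"><td><fin-streamer data-field="regularMarketChange"><span class="C($negativeColor)">-0.51</span></fin-streamer></td></tr>
<tr class="simpTblRow"><td><fin-streamer data-field="regularMarketChange"><span>0.00</span></fin-streamer></td></tr>
<tr class="simpTblRow"><td><fin-streamer data-field="regularMarketChange"><span class="C($positiveColor)">0.0050</span></fin-streamer></td></tr>
</tbody></table>
"""
soup = BeautifulSoup(HTML, 'html.parser')
rows = soup.find_all('tr', class_='simpTblRow')

# Match on the data-field attribute instead of the color class.
changes = [
    row.find('fin-streamer', {'data-field': 'regularMarketChange'}).text.strip()
    for row in rows
]

# Equivalent lookup with a CSS attribute selector.
changes_css = [
    row.select_one('fin-streamer[data-field="regularMarketChange"]').text.strip()
    for row in rows
]

print(changes)  # ['-0.51', '0.00', '0.0050']
```

Because the attribute does not depend on the sign of the change, the 0.00 row is found like any other, with no None check needed.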
Example
...
URL = 'https://finance.yahoo.com/screener/unsaved/f491bcb6-de80-4813-b50e-d6dc8e2f5623?dependentField=sector&dependentValues=Consumer Cyclical&offset=0&count=100'
page = requests.get(URL, headers=headers, timeout=5)
soup = BeautifulSoup(page.content, "html.parser")
results = soup.find(id="screener-results")
stocks = results.find_all("tr", class_="simpTblRow")
for stock_ in stocks:
    # Note: this reads the same 'Name' cell as stock_name,
    # which is why each name appears twice in the output.
    stock_symbol = stock_.find('td', {'aria-label': 'Name'})
    stock_name = stock_.find('td', {'aria-label': 'Name'})
    # Price and change are selected by their data-field attribute, not by class.
    stock_price = stock_.find('fin-streamer', {'data-field': 'regularMarketPrice'})
    stock_change = stock_.find('fin-streamer', {'data-field': 'regularMarketChange'})
    print(stock_symbol.text.strip() + '\n' + stock_name.text.strip()
          + '\nCurrent Price: $' + stock_price.text.strip()
          + '\nChange: ' + stock_change.text.strip(), end="\n"*2)
Output
...
Great Wall Motor Company Limited
Great Wall Motor Company Limited
Current Price: $15.80
Change: -0.51
Mahindra & Mahindra Limited
Mahindra & Mahindra Limited
Current Price: $10.00
Change: 0.00
Sime Darby Berhad
Sime Darby Berhad
Current Price: $0.5690
Change: 0.0050
...
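With the change read from its data-field attribute, a zero change comes back as the string "0.00" like any other value, so it can be converted to a number directly. The sketch below assumes hypothetical markup that mirrors the answer's selectors (the real page has many more columns), and parse_screener_rows is a hypothetical helper that collects each row into a dict.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the answer's selectors; the real page differs.
HTML = """
<table id="screener-results"><tbody>
<tr class="simpTblRow">
  <td aria-label="Name">Great Wall Motor Company Limited</td>
  <td><fin-streamer data-field="regularMarketPrice">15.80</fin-streamer></td>
  <td><fin-streamer data-field="regularMarketChange"><span class="C($negativeColor)">-0.51</span></fin-streamer></td>
</tr>
<tr class="simpTblRow">
  <td aria-label="Name">Mahindra &amp; Mahindra Limited</td>
  <td><fin-streamer data-field="regularMarketPrice">10.00</fin-streamer></td>
  <td><fin-streamer data-field="regularMarketChange"><span>0.00</span></fin-streamer></td>
</tr>
</tbody></table>
"""


def parse_screener_rows(html):
    """Hypothetical helper: turn each result row into a dict of name, price and change."""
    soup = BeautifulSoup(html, 'html.parser')
    results = soup.find(id='screener-results')
    parsed = []
    for tr in results.find_all('tr', class_='simpTblRow'):
        name = tr.find('td', {'aria-label': 'Name'}).get_text(strip=True)
        price = tr.find('fin-streamer', {'data-field': 'regularMarketPrice'}).get_text(strip=True)
        change = tr.find('fin-streamer', {'data-field': 'regularMarketChange'}).get_text(strip=True)
        # Strip any thousands separators (e.g. "1,234.50") before converting.
        parsed.append({
            'name': name,
            'price': float(price.replace(',', '')),
            'change': float(change.replace(',', '')),
        })
    return parsed


for row in parse_screener_rows(HTML):
    print(row)
```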