So I've made a web scraper and everything seems to run fine, but no values are being returned. I assume something is wrong with the URL, but I can't spot anything.
import pandas as pd
import datetime
import requests
from requests.exceptions import ConnectionError
from bs4 import BeautifulSoup

def web_content_div(web_content, class_path):
    # Find the div with the given class and collect the text of its spans
    web_content_div = web_content.find_all('div', {'class': class_path})
    try:
        spans = web_content_div[0].find_all('span')
        texts = [span.get_text() for span in spans]
    except IndexError:
        # No matching div on the page
        texts = []
    return texts

def real_time_price(stock_code):
    url = 'https://uk.finance.yahoo.com/quote/' + stock_code + '?p=' + stock_code + '&.tsrc=fin-tre-srch'
    try:
        r = requests.get(url)
        web_content = BeautifulSoup(r.text, 'lxml')
        texts = web_content_div(web_content, 'My(6px) Pos(r) smartphone_Mt(6px) W(100%)')
        if texts != []:
            price, change = texts[0], texts[1]
        else:
            price, change = [], []
    except ConnectionError:
        price, change = [], []
    return price, change

Stock = ['BRK-B']
print(real_time_price('BRK-B'))
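The function above turns every failure into an empty result, so it cannot tell you why nothing came back. A minimal sketch of a hypothetical helper (`explain_empty_result` is not part of the original code) that you could call with `r.status_code` and `r.text` right after `requests.get`; the substring check is only a rough heuristic, assuming the class string appears verbatim in the returned HTML:

```python
def explain_empty_result(status_code: int, html: str, class_path: str) -> str:
    """Hypothetical helper: give a rough reason why web_content_div() returned []."""
    if status_code != 200:
        # The server answered, but not with the normal page (e.g. an error or block page)
        return f'HTTP {status_code}: the server did not return the normal page'
    if class_path not in html:
        # 200 OK, but the markup we search for is absent (bot/consent page, or a changed layout)
        return 'page loaded, but the target div class is missing'
    # The class string is in the HTML, so the lookup in web_content_div() is the next suspect
    return 'target class is present; check the parsing step instead'
```

For example: `print(explain_empty_result(r.status_code, r.text, 'My(6px) Pos(r) smartphone_Mt(6px) W(100%)'))`.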
Answer:
Try setting a user-agent in your request headers, and pass your query parameters separately:
...
    url = 'https://uk.finance.yahoo.com/quote/' + stock_code
    # Query parameters are passed separately; requests encodes them into the URL
    params = {
        'p': stock_code,
        '.tsrc': 'fin-tre-srch',
    }
    # Send a non-default User-Agent instead of the python-requests one
    headers = {'user-agent': 'my-app/0.0.1'}
    # alternatively: headers = {'user-agent': 'PostmanRuntime/7.28.4'}
    try:
        r = requests.get(url, params=params, headers=headers)
        ...
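Passing `params=` does not change which URL is requested; it only moves the encoding out of your string concatenation. A small standard-library check (using `urllib.parse.urlencode`, assumed here to encode a simple dict like this one the same way requests does) shows that both forms produce the same URL for `BRK-B`, which suggests the header is the part that actually matters:

```python
from urllib.parse import urlencode

stock_code = 'BRK-B'
base = 'https://uk.finance.yahoo.com/quote/' + stock_code

# URL as the question builds it, by string concatenation
manual = base + '?p=' + stock_code + '&.tsrc=fin-tre-srch'

# URL built from a params dict, similar to what requests does with params=
params = {'p': stock_code, '.tsrc': 'fin-tre-srch'}
encoded = base + '?' + urlencode(params)

print(encoded)            # https://uk.finance.yahoo.com/quote/BRK-B?p=BRK-B&.tsrc=fin-tre-srch
print(encoded == manual)  # True
```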
Answer:
There's nothing wrong with the URL, which you can easily check by running something like this from the command line (install curl for your OS if you don't have it):
curl --output result.txt "https://uk.finance.yahoo.com/quote/BRK-B?p=BRK-B&.tsrc=fin-tre-srch"
That works, and saves the text you're after in result.txt.
So it's not the URL; the usual suspect then is the user agent, and lo and behold, spoofing a normal web browser's User-Agent works just fine:
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"
}
r = requests.get(url, headers=headers)
This is just a random user-agent string, and you could try to find something more generic. The key point is that Yahoo doesn't want to serve your Python script, so you'd have to lie to Yahoo about what you're really doing to get what you want. You do that at your own risk: I'm not saying you should, only how it's possible (don't).