Have made this scraper but the function returns no values? Just empty cells

Time:01-04

I've written a web scraper and it runs without errors, but no values are returned. I assume something is wrong with the URL, but I can't spot anything.

    import pandas as pd
    import datetime
    import requests
    from requests.exceptions import ConnectionError
    from bs4 import BeautifulSoup

    def web_content_div(web_content, class_path):
        web_content_div = web_content.find_all('div', {'class': class_path})
        try:
            spans = web_content_div[0].find_all('span')
            texts = [span.get_text() for span in spans]
        except IndexError:
            texts = []

        return texts

    def real_time_price(stock_code):
        url = 'https://uk.finance.yahoo.com/quote/' + stock_code + '?p=' + stock_code + '&.tsrc=fin-tre-srch'
        try:
            r = requests.get(url)
            web_content = BeautifulSoup(r.text, 'lxml')
            texts = web_content_div(web_content, 'My(6px) Pos(r) smartphone_Mt(6px) W(100%)')
            if texts != []:
                price, change = texts[0], texts[1]
            else:
                price, change = [], []

        except ConnectionError:
            price, change = [], []

        return price, change

    Stock = ['BRK-B']
    print(real_time_price('BRK-B'))

CodePudding user response:

Try setting a User-Agent header and passing the query parameters separately:

    ...
    url = 'https://uk.finance.yahoo.com/quote/' + stock_code
    params = {
        'p': stock_code,
        '.tsrc': 'fin-tre-srch',
    }
    headers = {'user-agent': 'my-app/0.0.1'}
    # alternatively: headers = {'user-agent': 'PostmanRuntime/7.28.4'}
    try:
        r = requests.get(url, params=params, headers=headers)
    ...
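
To see what the `params` dictionary turns into, the query string can be built by hand with the standard library. This is only a sketch: `build_quote_url` and `BASE_URL` are hypothetical names introduced for illustration, and `requests` performs an equivalent encoding itself when given `params=`.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://uk.finance.yahoo.com/quote/"  # base URL from the question


def build_quote_url(stock_code):
    """Return the full quote URL for stock_code, query string included."""
    params = {"p": stock_code, ".tsrc": "fin-tre-srch"}
    # urlencode percent-escapes keys and values and joins the pairs with '&'
    return BASE_URL + quote(stock_code, safe="") + "?" + urlencode(params)


print(build_quote_url("BRK-B"))
# https://uk.finance.yahoo.com/quote/BRK-B?p=BRK-B&.tsrc=fin-tre-srch
```

For 'BRK-B' the result is the same URL the question builds by string concatenation, which suggests the URL itself was never the problem; the separate dictionary mainly helps when a value needs escaping.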

CodePudding user response:

There's nothing wrong with the URL, which you can easily check by running something like this from the command line (get curl for your OS if you don't have it):

    curl --output result.txt "https://uk.finance.yahoo.com/quote/BRK-B?p=BRK-B&.tsrc=fin-tre-srch"

That works, and saves the text you're after in result.txt.

So it's not the URL. The usual suspect then is the user agent, and lo and behold, spoofing a normal web browser User-Agent works just fine:

        headers = {
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"}
        r = requests.get(url, headers=headers)

This is just an arbitrary user agent string; you could try to find something more generic. The key point is that Yahoo doesn't want to serve your Python script, so to get what you want you would have to lie to Yahoo about what your client really is. You do that at your own risk: I'm not saying you should, only showing how it's possible.
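
As a rough sketch of how the header could be wired into a fetch helper: by default `requests` identifies itself as `python-requests/<version>`, and the function below replaces that with the string from this answer. `fetch_html` and `BROWSER_UA` are hypothetical names, and the `raise_for_status()` call is an addition not in the original code. It only helps if the server answers with an error status, which may or may not be what Yahoo does.

```python
import requests

# User agent string copied from the answer above; treat it as an example value.
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"
)


def fetch_html(url, session=None, timeout=10):
    """Fetch url with a browser-like User-Agent and return the response body.

    Raises requests.HTTPError on a 4xx/5xx status, so a refused request
    does not silently turn into an empty scrape result.
    """
    session = session or requests.Session()
    response = session.get(url, headers={"User-Agent": BROWSER_UA}, timeout=timeout)
    response.raise_for_status()
    return response.text


# What requests sends when no User-Agent is set: "python-requests/<version>"
print(requests.utils.default_user_agent())
```

The returned text can then be passed to `BeautifulSoup(..., 'lxml')` exactly as in the question. Accepting an optional `session` keeps the helper easy to exercise without touching the network.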
