I'm trying to make a web-scraping script, but I'm bumping into an error and can't seem to figure out why. I'm using the Spyder IDE, so all the variables are shown in the Variable Explorer. My code is as follows:
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

root = "https://finviz.com/quote.ashx?t="
tickers = ['AMZN', 'GS']
news_tables = {}

for ticker in tickers:
    url = root + ticker
    req = Request(url=url, headers={'user-agent': 'dirty-30'})
    response = urlopen(req)
    # print(response)
    html = BeautifulSoup(response, 'html')
    # print(html)
    news_table = html.find(id='news-table')
    news_tables[ticker] = news_table

amzn_data = news_tables['AMZN']
amzn_rows = amzn_data.findALL('tr')
print(news_tables)
I get back this error:
TypeError: 'NoneType' object is not callable
Exception in comms call get_value:
File "C:\Users\austi\Anaconda3\lib\site-packages\spyder_kernels\comms\commbase.py", line 347, in _handle_remote_call
self._set_call_return_value(msg_dict, return_value)
File "C:\Users\austi\Anaconda3\lib\site-packages\spyder_kernels\comms\commbase.py", line 384, in _set_call_return_value
self._send_message('remote_call_reply', content=content, data=data,
File "C:\Users\austi\Anaconda3\lib\site-packages\spyder_kernels\comms\frontendcomm.py", line 109, in _send_message
return super(FrontendComm, self)._send_message(*args, **kwargs)
File "C:\Users\austi\Anaconda3\lib\site-packages\spyder_kernels\comms\commbase.py", line 247, in _send_message
buffers = [cloudpickle.dumps(
File "C:\Users\austi\Anaconda3\lib\site-packages\cloudpickle\cloudpickle_fast.py", line 73, in dumps
cp.dump(obj)
File "C:\Users\austi\Anaconda3\lib\site-packages\cloudpickle\cloudpickle_fast.py", line 609, in dump
raise pickle.PicklingError(msg) from e
_pickle.PicklingError: Could not pickle object as excessively deep recursion required.
I tried adding
sys.setrecursionlimit(30000000)
When I attempt to open news_tables (which is of type dict) in the Variable Explorer, I get a stack overflow message and the kernel restarts. What am I missing here? I think the NoneType error stems from the stack overflow breaking the variable, so it just deletes the dict, leaving an empty dict, a.k.a. NoneType...
Why am I getting a stack overflow? This shouldn't be that much data; it's one scrape of one page per ticker. If my understanding of stack overflow is correct, then all I can think of is that there is somehow an infinite loop gathering the same data until I hit the pickling error? I have several TB of storage on my system and tons of RAM.
I'm perplexed; any insights? I've restarted Anaconda as a whole, and my Spyder is up to date.
Thanks.
CodePudding user response:
import requests
import pandas as pd

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/107.0'
}


def main(url):
    with requests.Session() as req:
        req.headers.update(headers)
        allin = []
        for t in ['AMZN', 'GS']:
            params = {
                't': t
            }
            r = req.get(url, params=params)
            df = pd.read_html(r.content, attrs={'id': 'news-table'})[0]
            allin.append(df)
        df = pd.concat(allin, ignore_index=True)
        print(df)
        # df.to_csv('data.csv', index=False)


main('https://finviz.com/quote.ashx')
Output:
0 1
0 Dec-10-22 01:30PM Selling Your Home During the Holidays? 4 Moves...
1 12:21PM 15 Most Trusted Companies in the World Insider...
2 10:30AM Target, Amazon and 4 More Retailers That Will ...
3 10:30AM Opinion: These Will Be the 2 Largest Stocks by...
4 08:37AM Better Buy: Microsoft vs. Amazon Motley Fool
.. ... ...
195 11:49AM Goldman Sachs, Eager to Grow Cards Business, C...
196 10:53AM Oil Slips as Swelling China Covid Cases Outwei...
197 08:42AM Oil Dips Near $98 as Swelling China Covid Case...
198 08:41AM Morgan Stanley funds have billions riding on a...
199 06:30AM 3 Goldman Sachs Mutual Funds Worth Betting On ...
[200 rows x 2 columns]
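A couple of optional touches: read_html returns the cells as unlabeled columns 0 and 1, and pd.concat loses track of which ticker each frame came from, so you could label both inside the loop right after the read_html call. The column and ticker names below are purely illustrative:

df.columns = ['time', 'headline']  # illustrative labels for the two unnamed columns
df['ticker'] = t                   # remember which symbol these rows came from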
Or
import requests
from bs4 import BeautifulSoup, SoupStrainer
from pprint import pp

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/107.0'
}


def main(url):
    with requests.Session() as req:
        req.headers.update(headers)
        allin = {}
        for t in ['AMZN', 'GS']:
            params = {
                't': t
            }
            r = req.get(url, params=params)
            soup = BeautifulSoup(r.content, 'lxml', parse_only=SoupStrainer(
                'table', attrs={'id': 'news-table'}))
            allin[t] = soup
        pp(allin)


main('https://finviz.com/quote.ashx')
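If you want the individual rows rather than the pretty-printed soup objects, you could swap the pp(allin) call for a loop over the strained tables. A minimal sketch, assuming each <tr> in news-table still holds a timestamp cell followed by a headline cell:

# in place of pp(allin), inside main():
for ticker, table in allin.items():
    for row in table.find_all('tr'):
        cells = row.find_all('td')
        if len(cells) >= 2:  # skip any rows that don't have both cells
            timestamp = cells[0].get_text(strip=True)
            headline = cells[1].get_text(strip=True)
            print(ticker, timestamp, headline)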
CodePudding user response:
Just replace
[...]
html = BeautifulSoup(response, 'html')
[...]
amzn_rows = amzn_data.findALL('tr')
with
[...]
html = BeautifulSoup(response, 'html.parser')
[...]
amzn_rows = amzn_data.find_all('tr')
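Applying both replacements to the original script (and writing the URL concatenation out as root + ticker), a corrected version would look roughly like this:

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

root = "https://finviz.com/quote.ashx?t="
tickers = ['AMZN', 'GS']
news_tables = {}

for ticker in tickers:
    url = root + ticker
    req = Request(url=url, headers={'user-agent': 'dirty-30'})
    response = urlopen(req)
    # name the parser explicitly instead of passing 'html'
    html = BeautifulSoup(response, 'html.parser')
    news_table = html.find(id='news-table')
    news_tables[ticker] = news_table

amzn_data = news_tables['AMZN']
# find_all is the correct method name; the misspelled findALL is treated by
# BeautifulSoup as a tag lookup that returns None, which is the likely source
# of the "'NoneType' object is not callable" TypeError
amzn_rows = amzn_data.find_all('tr')
print(news_tables)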