How to fix error "'NoneType' object has no attribute 'text'"?

Here is the relevant portion of my code:

    def _get_data(self, html):
        doc = html.find('td', {'class': 'White'})
        doc_list = html.find_all('p', {'class': 'bib'})
        # A decision to refuse registration has been made (last modified: 20.08.2020)
        text = ' '.join(doc.text.split())[28:]
        # ...

The whole code can be found here.

I need to parse a site with documents, and the site has a rate limit: after 4-5 documents you can't see any more and have to wait. So I added a delay, but then I started getting a strange error:

      File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
        self.run()
      File "/usr/lib/python3.8/threading.py", line 870, in run
        self._target(*self._args, **self._kwargs)
      File "/home/yunus/Рабочий стол/RosPatentParser/service/parser.py", line 139, in parse
        self._get_data(soup)
      File "/home/yunus/Рабочий стол/RosPatentParser/service/parser.py", line 80, in _get_data
        text = ' '.join(doc.text.split())[28:]
    AttributeError: 'NoneType' object has no attribute 'text'

Answer:

I would recommend adding the following between lines 76 and 77, inside your _get_data function:

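    # html is a BeautifulSoup object, so convert it to a string before writing it out
    with open('test.html', 'w', encoding='utf-8') as f:
        f.write(str(html))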

so that you can debug this further. The error is telling you that doc is a NoneType object (that is, it is None). At a glance, I would expect

    doc = html.find('td', {'class': 'White'})

to be returning None. BeautifulSoup's find() returns None when it can't find the element you're searching for. Here your variable html appears to be a BeautifulSoup object that contains no td tag with the class White. Looking at the saved HTML should reveal why, and is a good starting point for fixing the problem.
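The behaviour described above can be reproduced in isolation. This is a minimal sketch that assumes the bs4 package is installed and uses made-up markup standing in for a page without the expected cell (for example, a page served while the rate limit is active); it is not the site's real HTML.

```python
from bs4 import BeautifulSoup

# Made-up markup (an assumption): it has <p class="bib"> but no <td class="White">.
page = '<html><body><p class="bib">Reference</p></body></html>'
soup = BeautifulSoup(page, 'html.parser')

# find() returns None because no matching <td> exists in this markup
doc = soup.find('td', {'class': 'White'})
print(doc)

try:
    text = doc.text  # attribute access on None fails here
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)  # 'NoneType' object has no attribute 'text'
```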

Answer:

The simplest solution is to check that the tag was found before using its text:

    def _get_data(self, html):
        doc = html.find('td', {'class': 'White'})
        doc_list = html.find_all('p', {'class': 'bib'})
        # A decision to refuse registration has been made (last modified: 20.08.2020)
        if doc is not None:  # Only do this if the <td> tag with the class 'White' was found.
            text = ' '.join(doc.text.split())[28:]
            ...  # The rest of your code if you find the text
        else:
            # Handle the case where there is no <td> tag with the class 'White'
            pass

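doc will be None if Beautiful Soup could not find a matching tag, so the check has to be on doc itself: evaluating doc.text when doc is None raises the same AttributeError you are seeing.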
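One possible way to fill in the else branch, given the rate limit described in the question, is to wait and fetch the page again before giving up. This is a hypothetical sketch, not the asker's code: get_doc_with_retry, fetch_html, the number of attempts and the delay are all assumptions, and it assumes bs4 is installed.

```python
import time

from bs4 import BeautifulSoup


def get_doc_with_retry(fetch_html, attempts=3, delay=30.0, sleep=time.sleep):
    """Hypothetical helper: return the <td class="White"> tag, retrying if it is missing.

    fetch_html is an assumed zero-argument callable that returns the page HTML
    as a string. delay (in seconds) is an assumed back-off, not the site's real limit.
    """
    for attempt in range(attempts):
        soup = BeautifulSoup(fetch_html(), 'html.parser')
        doc = soup.find('td', {'class': 'White'})
        if doc is not None:
            return doc
        if attempt < attempts - 1:
            # Assume the tag is missing because the rate limit kicked in; wait before retrying.
            sleep(delay)
    return None  # still missing after every attempt; the caller decides what to do
```

Passing sleep as a parameter keeps the function easy to test; in real use the default time.sleep pauses between attempts, and the caller still has to handle a None result.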