Here is the relevant portion of my code:
def _get_data(self, html):
    doc = html.find('td', {'class': 'White'})
    doc_list = html.find_all('p', {'class': 'bib'})
    # Принято решение об отказе в регистрации (последнее изменение: 20.08.2020)
    # (English: "A decision to refuse registration has been made (last changed: 20.08.2020)")
    text = ' '.join(doc.text.split())[28:]
    # ...
The whole code can be found here.
I need to parse a site with documents, and the website has a limit: after 4-5 documents you can't view any more and you have to wait. So I added a time delay, but then I started getting a strange error:
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/yunus/Рабочий стол/RosPatentParser/service/parser.py", line 139, in parse
    self._get_data(soup)
  File "/home/yunus/Рабочий стол/RosPatentParser/service/parser.py", line 80, in _get_data
    text = ' '.join(doc.text.split())[28:]
AttributeError: 'NoneType' object has no attribute 'text'
CodePudding user response:
I would recommend adding the following between lines 76 and 77, inside your _get_data function:
    with open('test.html', 'w', encoding='utf-8') as f:
        f.write(str(html))  # html is a BeautifulSoup object, so convert it to a string first
This lets you inspect the page you actually received and debug further. At this point the error is telling you that doc is a NoneType object (meaning it is set to None). Glancing at your code, I would expect doc = html.find('td', {'class': 'White'}) to be returning None. BeautifulSoup's find returns None when it can't find the element you're searching for. In this case it seems that your variable html is a BeautifulSoup object and that it can't find a td tag with the class White. Looking at the saved HTML should reveal why you're running into this problem and be a good starting point for fixing it.
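To see that behaviour in isolation, here is a minimal sketch. Both HTML strings and the helper name find_status_cell are made up for illustration; I don't know what the site really serves once its limit is reached, so treat the "blocked" page as an assumption.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a page that contains the cell, and one that does not
# (for example, whatever the site might return after the limit is hit).
normal_page = '<table><tr><td class="White">Status text</td></tr></table>'
blocked_page = '<p>Too many requests, try again later.</p>'


def find_status_cell(markup):
    """Return the <td class="White"> tag, or None if the page has no such tag."""
    soup = BeautifulSoup(markup, 'html.parser')
    return soup.find('td', {'class': 'White'})


print(find_status_cell(normal_page))   # the matching Tag
print(find_status_cell(blocked_page))  # None, so calling .text on it would fail
```

If the saved test.html looks like the second case, the parser is fine and the page simply isn't the one you expected.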
CodePudding user response:
The simplest solution is to check that the tag was actually found before using its text:
    def _get_data(self, html):
        doc = html.find('td', {'class': 'White'})
        doc_list = html.find_all('p', {'class': 'bib'})
        # Принято решение об отказе в регистрации (последнее изменение: 20.08.2020)
        if doc is not None:  # Only do this if the <td> tag with the class 'White' was found.
            text = ' '.join(doc.text.split())[28:]
            ...  # The rest of your code if you find the text
        else:
            # Handle the case where there are no <td> tags with the class 'White'
            pass
doc itself (not doc.text) will be None if Beautiful Soup could not find a matching tag, so the check has to be made on doc. Checking doc.text would raise the same AttributeError.
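If the tag is missing because the site's limit kicked in (an assumption; nothing in the traceback confirms it), you may prefer to wait and retry instead of skipping the document. Below is a rough sketch of that idea. The names fetch_until_parsed, fetch, extract, attempts and delay are placeholders of mine, not part of your code, and the 60 second delay is an arbitrary value.

```python
import time


def fetch_until_parsed(fetch, extract, attempts=3, delay=60.0, sleep=time.sleep):
    """Call fetch() then extract(page); wait and retry while extract returns None.

    Returns the first non-None result, or None if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        result = extract(fetch())
        if result is not None:
            return result
        if attempt < attempts:
            sleep(delay)  # give the site's limit time to reset before retrying
    return None


# Usage with stand-ins: the first "page" has no data, the second one does.
pages = iter(["limit page", "real page"])
waits = []
result = fetch_until_parsed(
    fetch=lambda: next(pages),
    extract=lambda page: page.upper() if page == "real page" else None,
    delay=60.0,
    sleep=waits.append,  # record the wait instead of actually sleeping
)
print(result, waits)  # REAL PAGE [60.0]
```

In your parser, fetch would download and parse the page, and extract would be a version of _get_data that returns None when the td tag is not found.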