I am new to web scraping and have started a project that scrapes a website with many pages. Some pages contain the data I am targeting and some do not, so I used an if/else condition to check whether the element is present. This does not work for some elements. Can someone explain what the issue is and how to fix it?
What I want to scrape is:
<div class="spec-list attributes-sexuality">
<h5 class="spec-subcat">Sexuality</h5>
<div class="col-split-xs-1 col-split-md-2">
<ul class="attribute-list copy-small">
<li class="">Bisexual</li>
<li class="">Gay</li>
<li class="">Lesbian</li>
</ul></div></div>
This element is not present on some pages. This is my attempt:
if soup.find('div', class_='spec-list attributes-sexuality'):
    sexuality = ', '.join([x.get_text(strip = True) for x in soup.find('div', class_='spec-list attributes-sexuality').find('ul.attribute-list.copy-small>li')])
else:
    sexuality = ''
The error I get:
'NoneType' object is not iterable
Why does it return None when the loop should only run if the element is present? How can I fix this?
CodePudding user response:
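The outer div is found, so the if branch runs. The problem is the inner call: find() takes a tag name (plus optional attribute filters), not a CSS selector, so find('ul.attribute-list.copy-small>li') looks for a tag literally named that, matches nothing, and returns None. Iterating over that None raises the error. Look up the div once, then use find() and find_all() with plain tag names, like this: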
import bs4 as bs

s = """
<div class="spec-list attributes-sexuality">
<h5 class="spec-subcat">Sexuality</h5>
<div class="col-split-xs-1 col-split-md-2">
<ul class="attribute-list copy-small">
<li class="">Bisexual</li>
<li class="">Gay</li>
<li class="">Lesbian</li>
</ul></div></div>
"""
soup = bs.BeautifulSoup(s, 'lxml')
# Look up the container once and reuse the result.
temp = soup.find('div', class_='spec-list attributes-sexuality')
if temp:
    # find() and find_all() take tag names, not CSS selectors.
    sexuality = ', '.join([x.get_text(strip = True) for x in temp.find('ul', class_='attribute-list copy-small').find_all('li')])
else:
    sexuality = ''
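If you prefer the CSS selector syntax from the question, one possible alternative is select(), which accepts selectors and returns an empty list when nothing matches, so no if/else is needed. The sketch below is only an illustration: the helper name extract_sexuality, the two sample pages, and the choice of the built-in 'html.parser' (instead of 'lxml') are assumptions, not part of the original code.

```python
from bs4 import BeautifulSoup

# Sample page that contains the target block (taken from the question).
PAGE_WITH_DATA = """
<div class="spec-list attributes-sexuality">
<h5 class="spec-subcat">Sexuality</h5>
<div class="col-split-xs-1 col-split-md-2">
<ul class="attribute-list copy-small">
<li class="">Bisexual</li>
<li class="">Gay</li>
<li class="">Lesbian</li>
</ul></div></div>
"""

# Hypothetical page where the block is missing.
PAGE_WITHOUT_DATA = "<div class='other'><p>No attributes here</p></div>"

# CSS selector: <li> children of the list inside the target div.
SELECTOR = "div.spec-list.attributes-sexuality ul.attribute-list.copy-small > li"


def extract_sexuality(html):
    """Return the list items joined by ', ', or '' if the block is absent."""
    soup = BeautifulSoup(html, "html.parser")
    # select() always returns a list; it is empty when nothing matches,
    # so joining it yields '' without any explicit None check.
    items = soup.select(SELECTOR)
    return ", ".join(li.get_text(strip=True) for li in items)


print(repr(extract_sexuality(PAGE_WITH_DATA)))     # 'Bisexual, Gay, Lesbian'
print(repr(extract_sexuality(PAGE_WITHOUT_DATA)))  # ''
```

Because the selector covers the whole path from the div down to the li elements, a page that lacks any part of that structure simply produces an empty string.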